Systematic Review

Explainability and Interpretability in Concept and Data Drift: A Systematic Literature Review

by Daniele Pelosi 1, Diletta Cacciagrano 1 and Marco Piangerelli 1,2,*
1 Computer Science Division, School of Science and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032 Camerino, Italy
2 Vici & C. S.p.A., Via Gutemberg 5, 47822 Santarcangelo di Romagna, Italy
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(7), 443; https://doi.org/10.3390/a18070443
Submission received: 29 May 2025 / Revised: 5 July 2025 / Accepted: 14 July 2025 / Published: 18 July 2025
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (3rd Edition))

Abstract

Explainability and interpretability have emerged as essential considerations in machine learning, particularly as models become more complex and integral to a wide range of applications. In response to increasing concerns over opaque “black-box” solutions, the literature has seen a shift toward two distinct yet often conflated paradigms: explainable AI (XAI), which refers to post hoc techniques that provide external explanations for model predictions, and interpretable AI, which emphasizes models whose internal mechanisms are understandable by design. Meanwhile, the phenomenon of concept and data drift—where models lose relevance due to evolving conditions—demands renewed attention. High-impact events, such as financial crises or natural disasters, have highlighted the need for robust interpretable or explainable models capable of adapting to changing circumstances. Against this backdrop, our systematic review aims to consolidate current research on explainability and interpretability with a focus on concept and data drift. We gather a comprehensive range of proposed models, available datasets, and other technical aspects. By synthesizing these diverse resources into a clear taxonomy, we intend to provide researchers and practitioners with actionable insights and guidance for model selection, implementation, and ongoing evaluation. Ultimately, this work aspires to serve as a practical roadmap for future studies, fostering further advancements in transparent, adaptable machine learning systems that can meet the evolving needs of real-world applications.

1. Introduction

In recent years, with the rapid growth of artificial intelligence (AI) and machine learning (ML), the concepts of explainability and interpretability have taken on an increasingly central role, becoming in effect a hot topic. This growing attention has led to the development of two related but distinct branches of AI research: explainable AI (XAI), which refers to techniques that provide external, often post hoc, explanations for the behavior of complex black-box models, and interpretable AI, which focuses on models that are inherently understandable by human users due to their transparent structure. Although the terms are sometimes used interchangeably in the literature, in this review we follow the more precise distinction formalized in Section 2.1 (Definitions 5 and 6). This clarification is critical, particularly when analyzing how these two approaches intersect with concept and data drift. In this review, our focus is placed specifically on how both explainability and interpretability contribute to understanding, monitoring, and adapting to drift phenomena. Given the rapid changes in areas like IoT, healthcare, etc., researchers are increasingly interested in uncovering the underlying reasons for these drifts, enabling more effective adaptation and more reliable AI systems. Since interest in this field has surged only in the last few years, there is a clear lack of systematic literature reviews (SLRs) addressing these concerns. This review aims to fill that gap with a clear distinction and taxonomy of explainability and interpretability techniques, evaluated in the context of drift. The few existing surveys are not systematic literature reviews, but rather more traditional literature reviews. There are eight such works in total: one is in Chinese [1], which makes it less accessible to a broader audience, and four others focus predominantly on the medical domain [2,3,4,5]. In contrast, Cai and Sun reviewed existing deep learning techniques for anomaly detection in IoT surveillance systems, bridging research gaps in that area [6]. However, their review provides no studies related to explainability or interpretability, explicitly stating that these aspects should be investigated in the future. Similarly, both Hu et al. and Sharief et al. provided detailed discussions of the concept of drift without focusing on explainability or interpretability [7,8]. Therefore, the aim of this SLR is precisely to fill this gap. This SLR seeks to provide a taxonomy that describes and enumerates all current strategies, methodologies, and models in the scientific landscape for addressing this topic. To the best of our knowledge, this is the first review that presents comprehensive tables consolidating the results of the various models and benchmarks found in the highly fragmented literature. Our hope is that this collected and distilled knowledge will serve as a valuable resource and reference for researchers, equipping them with a consultable tool to make the right choices based on their specific circumstances.
In this review, we pose two complementary sets of research questions. The first set, General Questions (GQs), gives a high-level view of the literature, whereas the second set, Specific Questions (SQs), drills down into the technical details of the proposed approaches. The General Questions are
GQ1. 
Trend: How has research on explainability and interpretability for concept and data drift evolved over time?
GQ2. 
Domain: Which application areas most frequently adopt these techniques?
GQ3. 
Publication Venues: Which journals, conferences, or workshops exert the greatest influence in this field?
GQ4. 
Research Groups: Which institutions or teams are the most prolific contributors?
The Specific Questions (SQs) are
SQ1. 
Online vs. Offline Learning: How do online learning approaches differ from offline (batch) learning when coupled with explainability?
SQ2. 
Drift Characteristics: How do concept versus data drift and sudden, gradual, incremental, or recurring drift affect the design of explainability methods?
SQ3. 
Datasets and Benchmarks: Which datasets are most often used for evaluation?
SQ4. 
Models: What model families (e.g., tree-based, neural, GNNs) are paired with explainability in drift scenarios?
SQ5. 
Metrics: Which quantitative metrics capture gains in interpretability, user trust, decision quality, or regulatory compliance?
SQ6. 
Industrial Adoption: In which real-world sectors have explainable drift detection methods been deployed, and what evidence demonstrates benefits such as reliability, maintenance efficiency, or compliance?
SQ7. 
Stakeholder Perception: What types of users, domain experts, data scientists, or end-users benefit most from these techniques, and how do they derive opportunities from them?
SQ8. 
Online Adaptation of XAI: How are traditional XAI methods—designed for static models—adapted to online learning environments to maintain interpretability under concept/data drift?
SQ9. 
Local Drift Granularity: To what extent do XAI methods enable fine-grained analysis of local drift characteristics (instance-level or subgroup shifts) compared to global drift detection?
SQ10. 
Challenges and Limitations: What major obstacles, side effects, or scalability issues arise when integrating drift detection with explainability?
SQ11. 
Future Directions: Which research avenues and interdisciplinary approaches are proposed to overcome current limitations and foster wider adoption?
The responses to the GQs and SQs are integrated into the following discussion to offer a detailed overview of the current state of the field. The structure of the paper is delineated as follows: After the Introduction, Section 2 introduces essential definitions and concepts foundational to drift phenomena and their explainability. Section 3 elaborates on the methodology employed for collecting the papers. Then, Section 4 aims to provide an empirical account of the rank inclusion and exclusion process, while Section 5 presents a comprehensive overview of the papers selected for review. Section 6, which constitutes the core of this review, examines the various technological and methodological approaches utilized for the explainability and interpretability of drift, and presents our taxonomy. A thorough discussion of the findings and responses to the previously mentioned research questions is contained in Section 7. Finally, in Section 8, conclusions are drawn.

2. Background: Drift Phenomena and Explainable/Interpretable AI

This section introduces some basic definitions and notions underlying the study of the topic of this review.

2.1. Definitions and Notations

Definition 1.
(Data Stream): It is an ordered sequence of data points, typically indexed by time and potentially unbounded. Formally, a data stream can be represented as $S = (z_1, z_2, z_3, \ldots)$, where each element $z_i \in \mathcal{Z}$, for some domain $\mathcal{Z}$, arrives at time $t_i$. We assume that each $z_i$ is generated by an underlying probability distribution $D_{t_i}$ that may vary with time. In other words, a data stream is a sequence $((z_i, t_i))_{i \geq 1}$ with $t_1 < t_2 < \ldots$, and it allows only single-pass, incremental processing of its elements due to its continuous, potentially infinite nature [9,10].
Definition 2.
(Drift): It refers to any systematic change in the data stream’s statistical properties over time. Given a data stream $S = (z_i)_{i \geq 1}$ and letting $D_t$ denote the probability distribution of data at time $t$, a drift is said to occur if there exist two time points $t_1 < t_2$ such that $D_{t_1} \neq D_{t_2}$. In other words, the data-generating process is nonstationary, and the probability law governing observations in the stream changes at some point [11].
Definition 3.
(Data Drift): It is a specific type of drift concerning changes in the distribution of the input data over time, without necessarily altering the underlying concept that maps inputs to outputs. It is also known as covariate shift or virtual drift. Formally, consider a supervised data stream of input–output pairs $\{(x_i, y_i)\}$, where $x_i \in \mathcal{X}$ represents a feature vector and $y_i \in \mathcal{Y}$ represents labels [11]. Data drift occurs when the marginal feature distribution changes; i.e., there exist times $t_1, t_2$ such that
$$P_{t_1}(X) \neq P_{t_2}(X)$$
Definition 4.
(Concept Drift): It is a specific type of drift that affects the conditional distribution of the output given the input and thus alters the model’s true mapping from $\mathcal{X}$ to $\mathcal{Y}$ over time [11]. Concept drift can manifest as a shift in class boundaries, changing correlations between features and the target, or a complete change of the task’s true target function [12]. Formally, in a supervised stream with a joint distribution $P_t(X, Y)$ at time $t$, concept drift means there exist times $t_1, t_2$ such that
$$P_{t_1}(Y \mid X) \neq P_{t_2}(Y \mid X)$$
Definition 5.
(Explainability): It is the ability of a predictive model or system to provide intelligible explanations for its behavior or decisions [13]. An explanation is typically an interpretable description of the internal reasoning or the key factors that led to a particular outcome. Formally, consider a model $f: \mathcal{X} \to \mathcal{Y}$. We say that $f$ is explainable if there exists an explanation function $\xi$ mapping the model’s input–output pairs to a human-interpretable domain $\mathcal{H}$ (such as natural language statements, logical rules, or visual indicators),
$$\xi: \mathcal{X} \times \mathcal{Y} \to \mathcal{H},$$
such that for any input $x \in \mathcal{X}$ with a prediction $y = f(x)$, the output $\xi(x, y) \in \mathcal{H}$ articulates why $f$ arrived at that decision. In simpler terms, explainability means the model can communicate the reasons for its predictions in understandable terms [14].
Definition 6.
(Interpretability): It is the degree to which a human can understand the cause of a decision or the workings of a model [15]. It is an intrinsic property of the model that describes how transparently its internal mechanics can be understood in human terms [16]. A model is interpretable if its structure and parameters are such that a user can directly follow the logic of how inputs are transformed into outputs [17]. Formally, one may view interpretability as the existence of a mapping from the model $f$ to a simpler, human-comprehensible representation $g$ that is faithfully equivalent, or close, to $f$ on the relevant data distribution.

2.2. Drift Overview

Given a data stream $S = ((x_t, y_t))_{t=1}^{\infty}$, drift denotes any time-dependent departure of the stream’s data-generating distribution from the one that prevailed when a predictive model was trained. Formally, let $D_t$ be the joint distribution $P_t(X, Y)$ at time $t$. Drift occurs iff $\exists\, t_1 < t_2 : D_{t_1} \neq D_{t_2}$. Such nonstationarity violates the classical i.i.d. assumption and can erode model performance, sometimes catastrophically. A change in $D_t$ can manifest in two principal ways.

2.2.1. Data Drift

As visible from a generated example in Figure 1, the blue histogram (time $t_0$) and the orange histogram (time $t_1$) portray two snapshots of the same input feature taken at different moments in the data stream. The entire distribution has translated to the right, from a mean around 0 to a mean near 2, while the overall spread (variance) remains comparable. This illustrates data drift:
$$P_{t_1}(X) \neq P_{t_0}(X), \quad \text{while } P(Y \mid X) \text{ is assumed unchanged}.$$
Because the learned decision boundary was fitted on the blue distribution, it may extrapolate poorly on the orange one. In practical terms, the model is forced to make predictions in parts of the feature space for which it has no empirical support, leading to increased error, confidence mis-calibration, or even outright failures. Continuous monitoring of the marginal $P(X)$, for example with statistical tests or running histograms like the one in the figure, is therefore essential to trigger retraining or adaptation before performance degrades in production.
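To make this monitoring idea concrete, the following minimal sketch (our own illustration, not drawn from the reviewed studies) compares a reference window and a recent window of a single feature with a two-sample Kolmogorov–Smirnov test from SciPy; the window sizes, significance level, and simulated mean shift are assumptions chosen purely for demonstration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.0, scale=1.0, size=1000)  # feature values at time t0
    recent = rng.normal(loc=2.0, scale=1.0, size=1000)     # same feature at time t1 (shifted mean)

    # Two-sample Kolmogorov-Smirnov test on the marginal distribution P(X)
    result = stats.ks_2samp(reference, recent)
    if result.pvalue < 0.01:  # illustrative significance level
        print(f"Data drift suspected: KS statistic={result.statistic:.3f}, p={result.pvalue:.3g}")
    else:
        print("No significant change in P(X) detected")

In a streaming deployment the same test would typically be re-run on a sliding or growing window, with the reference window refreshed after each confirmed adaptation.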

2.2.2. Concept Drift

As shown in Figure 2, the blue points (time $t_0$) lie roughly on an increasing linear trend: larger input values map to proportionally larger outputs. By time $t_1$ (orange points), the input distribution is essentially unchanged; the cloud spans the same range on the x-axis, but the relationship between $X$ and $Y$ has rotated into a curved, non-linear shape. Formally,
$$P_{t_0}(X) \approx P_{t_1}(X) \quad \text{while} \quad P_{t_0}(Y \mid X) \neq P_{t_1}(Y \mid X).$$
This is the hallmark of concept drift: the true decision surface moves even though the feature space the model sees stays put. A classifier or regressor trained on the blue relationship will project a straight decision boundary or regression line; when confronted with the orange regime, it will systematically mis-predict, for instance by over-estimating outputs around $X \approx 0$ and under-estimating them for larger $|X|$.
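A small toy sketch of this effect (our own assumption-laden example, not taken from the cited works): a linear model fitted on the old input–output relationship is evaluated on data whose conditional relationship has changed while the inputs keep the same distribution; the rising error signals concept drift even though a test on $P(X)$ alone would stay silent.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, size=(500, 1))                    # P(X) is identical in both regimes

    y_old = 2.0 * x.ravel() + rng.normal(0, 0.3, 500)        # old concept: linear
    y_new = 1.5 * x.ravel() ** 2 + rng.normal(0, 0.3, 500)   # new concept: curved

    model = LinearRegression().fit(x, y_old)                 # trained before the drift
    print("error on old concept:", mean_absolute_error(y_old, model.predict(x)))
    print("error on new concept:", mean_absolute_error(y_new, model.predict(x)))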
The concept drift literature distinguishes four temporal profiles, each posing different monitoring and adaptation requirements.
  • Sudden drift: The relationship between inputs and outputs changes instantaneously at a specific point in time, as shown in Figure 3. The data overview panel demonstrates two distinct regimes: an initial positive linear relationship, in blue, abruptly replaced by a negative linear relationship, in orange, after the change point. This discontinuity is quantitatively confirmed in the parameter evolution panel, where the slope coefficient $m$ exhibits a step change from 2.0 to −1.0 at time index 500, with no transitional states.
  • Gradual drift: Both old and new concepts coexist during an extended transition period. Figure 4 shows four phases: an initial positive linear relationship (in blue) progressively gives way to a negative relationship (in orange) through intermediate mixed states (in green). Quantitatively, the slope coefficient linearly decreases from 2.0 to −1.0 between time indices 300 and 700, confirming a smooth transition without abrupt changes.
  • Incremental drift: The relationship between inputs and outputs undergoes continuous, stepwise evolution rather than an abrupt change or extended coexistence. As shown in Figure 5, the initial positive linear relationship (blue points) progresses through intermediate states (green points) to a final negative relationship (orange points), with each step showing a slight rotation of the regression line. This continuous transformation is quantitatively confirmed in the parameter evolution panel, where the slope coefficient linearly decreases from 2.0 to −1.0 across the entire time period without plateaus or abrupt jumps.
  • Recurrent drift: The system alternates periodically between distinct concepts, creating a cyclical pattern of change. Two stable relationships repeatedly alternate, as is visible in Figure 6: Concept A maintains a positive linear relationship (blue points), while Concept B follows a negative relationship (orange points). The parameter evolution panel confirms this periodicity through a square-wave pattern, where the slope coefficient alternates precisely between 2.0 and −1.0 every 250 time steps. This recurrent mechanism is clearly visible in both the full-period visualizations and in the square-wave slope pattern.
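The four profiles above can be reproduced with a compact synthetic-data sketch (ours; the specific slope values, change points, and period lengths mirror the figures described above but are otherwise arbitrary assumptions) that generates the slope schedule of a drifting linear concept $y = m_t x$.

    import numpy as np

    T = 1000
    t = np.arange(T)

    def slope_schedule(profile):
        m = np.full(T, 2.0)
        if profile == "sudden":                      # step change at t = 500
            m[t >= 500] = -1.0
        elif profile == "gradual":                   # old and new concepts mix between t = 300 and 700
            mix = np.clip((t - 300) / 400, 0, 1)
            m = np.where(np.random.default_rng(2).random(T) < mix, -1.0, 2.0)
        elif profile == "incremental":               # continuous interpolation over the whole stream
            m = 2.0 + (t / (T - 1)) * (-1.0 - 2.0)
        elif profile == "recurrent":                 # square wave alternating every 250 steps
            m = np.where((t // 250) % 2 == 0, 2.0, -1.0)
        return m

    for profile in ["sudden", "gradual", "incremental", "recurrent"]:
        m = slope_schedule(profile)
        x = np.random.default_rng(3).normal(size=T)
        y = m * x                                    # drifting concept y = m_t * x
        print(profile, "slope at t = 0, 400, 600, 999:", m[[0, 400, 600, 999]])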

2.3. Explainable AI

Explainable artificial intelligence (XAI), as we have described so far, refers to methods and techniques that make the outputs, decisions, and internal workings of artificial intelligence models understandable and interpretable to human users. In what follows, we first clarify the concept of the level of explainability. With this dimension in place, we then examine the widely adopted techniques.

2.3.1. Explainability Scope

What portion of a model’s behavior are we trying to make transparent? The literature converges on two complementary scopes: global and local [18].
  • A global explanation attempts to convey the broad logic that governs all of a model’s predictions. It compresses the mapping $f: \mathcal{X} \to \mathcal{Y}$ into a form that remains faithful, at least approximately, over most of the input space. Global summaries may disguise important minority patterns because they average behavior across the whole population [19].
  • A local explanation zooms in on a single instance, or at most a small neighborhood of similar points, and answers the question, “Why did the model produce this particular outcome?” Local narratives are precise but narrow; they tell us nothing about how the model behaves elsewhere, and they can fluctuate if the decision boundary is highly non-linear [20].

2.3.2. Classical Approaches

The following presents a selection of the most prevalent explainable AI techniques that can be applied to both local and global contexts. While this overview encompasses widely adopted XAI methods identified in the literature, it is not exhaustive, as comprehensive coverage of all explainability approaches would extend beyond the intended scope of this section. All the example charts here, except for LIME, are taken from Molnar’s book [21]. Two brief illustrative code sketches are provided after the list.
  • SHAP (Shapley Additive Explanations): This method assigns each feature an importance value derived from Shapley values in cooperative game theory. The model’s prediction is treated as a “payout” fairly divided among features, ensuring that attributions sum up precisely to the difference between the actual and baseline outputs. It supports both global and local explanations.
    Figure 7 shows a SHAP summary plot for a random forest classifier trained with 100 trees to predict the sex of penguins. Every dot in the figure is a Shapley value; its vertical position marks the feature, its horizontal position represents the size and direction of that feature’s contribution for one bird, and its color encodes the raw feature value from low (blue) to high (red). The cloud of points therefore conveys two things simultaneously: the importance ranking (features are ordered from top to bottom by the spread of their dots) and the effect pattern (for example, higher body-mass values cluster on the negative side, showing that heavy birds push the model away from “female”). Because overlapping dots are jittered vertically, the reader also gets a sense of the distribution of effects within each feature. In one compact visual, novices can see that body_mass_g dominates the decision process, followed by bill_depth_mm and bill_length_mm, and can infer how low versus high measurements shift the predicted probability [21].
  • LIME (Local Interpretable Model-agnostic Explanations): This method explains individual predictions by training a simple surrogate model locally around the instance. This surrogate approximates the black box locally, highlighting influential features for a specific prediction. LIME has been designed to be applied locally.
    In Figure 8 this procedure is illustrated for a random forest classifier trained on a Diabetes dataset. The forest assigns a 0.73 probability of diabetes to a particular patient. LIME perturbs that patient’s eight clinical attributes (glucose, BMI, age, etc.), reevaluates each variant with the forest, and fits a weighted linear surrogate whose $R^2$ against the forest’s outputs is 0.92—evidence of high local fidelity. The bar chart shows that elevated glucose and body mass index are the dominant positive contributors, while a slightly below-average blood pressure exerts a small negative influence [17].
  • Partial Dependence Plots (PDP): These show the marginal effect of varying one or more features on the model output while averaging over the joint distribution of all other features. The resulting curve (or surface) visualizes whether the relationship is linear, monotonic, or more complex. PDPs assume the features being varied are independent of the others; violations can lead to misleading interpretations. They provide a global view.
    Figure 9 illustrates the idea for a random forest regressor trained to predict the number of bikes rented on a given day. Each panel depicts the partial dependence of the prediction on a single weather variable. The left-hand plot shows temperature: predicted rentals climb sharply as the day warms, level off at around 20 °C, and dip slightly beyond 30 °C, indicating that the forest has learned that very hot days discourage cycling. The middle plot varies humidity and shows a flat response up to roughly 60%, after which predicted demand falls—consistent with would-be cyclists being deterred by muggy conditions. The right-hand plot varies wind speed; rentals decline steadily up to about 25 km h$^{-1}$ and then appear to stabilize, but the rug marks along the x-axis reveal few training examples at higher wind speeds, reminding readers that the apparent plateau may be an artifact of sparse data. By presenting these three curves side by side, the PDP makes the random forest’s global weather-related reasoning transparent without requiring inspection of individual trees [21].
  • Accumulated Local Effects (ALE): This method improves on the PDP when features are correlated. The feature range is partitioned into data-driven intervals, local gradients are computed within each slice using only observed data points, and these gradients are accumulated into a centered curve. ALE therefore avoids the extrapolation bias of PDPs and remains computationally light. It is a global explanation method.
    Figure 10 illustrates the idea with a random forest model that predicts the number of bikes rented each day from weather variables. In the left panel, the ALE curve for wind speed dips steadily below zero, showing that stronger winds depress rental counts; the rug at the bottom reminds the reader that very high wind values are rare, so the slight uptick beyond 30 km h$^{-1}$ is probably noise rather than a true reversal. The right panel reports the categorical main effect for the “weather situation”: relative to the baseline of good weather, misty conditions reduce demand modestly, whereas bad weather slashes expected rentals by more than six hundred bikes—both findings align with common sense. Because ALE can also be extended to two-way interaction surfaces, analysts can inspect, for example, how humidity modifies the temperature effect without re-plotting the main trends; the interaction plot shows only the additional joint influence, leaving the main effects undisturbed and easy to read [21].
  • Permutation Importance (PI): This quantifies how strongly a model depends on each input variable by randomly shuffling that variable in a validation set, keeping the model weights fixed, and measuring how much the chosen error metric worsens. If performance metrics like mean absolute error or accuracy remain stable after permutation, the feature is redundant; substantial drops indicate critical importance. It offers a purely global view of importance.
    Figure 11 makes this idea concrete for a support vector machine regressor that predicts daily bike rental counts from weather and calendar information. After every feature is permuted 100 times, the drop in mean absolute error is averaged (black dot) and a 5th–95th-percentile range is plotted as a horizontal bar. The variable cnt_2d_bfr—the number of bikes rented two days before—causes by far the largest performance loss and therefore tops the ranking, followed by temperature, humidity, and wind speed. Calendar variables such as holidays contribute little: shuffling their values barely nudges the error. Reading the plot is simple: the farther a dot lies to the right and the longer its bar, the more the model would suffer if that feature were unreliable, providing an intuitive yard-stick for novices who have never inspected raw support vector weights [21].
  • Surrogate Models: Training proceeds by fitting an interpretable model—not to the true labels but to the predictions of a black box—so that the surrogate reproduces the original decision surface as closely as possible. The resulting tree offers a global sketch of the black box, while any root-to-leaf path can be read as a local, rule-based explanation.
    Figure 12 shows a CART surrogate that mimics a support vector machine (SVM) trained to forecast daily bike rental counts from weather and calendar features. Using the original training data, the tree is fitted to the SVM’s outputs and attains an $R^2 = 0.76$ on hold-out cases, indicating a good—though not perfect—approximation of the black-box behavior. The first split is on cnt_2d_bfr (rentals two days earlier); when this lagged count exceeds 4570, the model anticipates higher demand, and a subsequent split on temperature (>13 °C) refines the prediction further. The four terminal nodes are summarized by box-plots: moving from the top-left panel (cool day, low prior demand) to the bottom-right panel (warm day, very high prior demand), the median predicted rental count climbs sharply, visualizing in one glance how the two variables interact inside the SVM. By walking each path, a reader can translate the black-box logic into plain IF–THEN statements, while the overall structure of the tree conveys which features the SVM relies on most [21].
  • Anchors: The model learns concise IF–THEN rules (anchors) for a target instance such that, with high empirical precision, any sample satisfying the rule yields the same prediction. Precision and coverage statistics accompany each rule, giving users a quantified guarantee of stability. Anchors supply local explanations centered on the rule’s coverage region.
    Figure 13 makes this concrete for a random forest classifier built from the bike rental dataset. The regression target (daily rental count) has been binarized into above versus below the trend line, and the forest is asked to justify six individual predictions. For every case, the anchor algorithm fixes one or two feature predicates—here mainly temperature buckets such as temp $\in [14, 21)$, occasionally augmented by weather = bad—and then generates synthetic neighbors by sampling the remaining features from the training set. If at least 95% of those neighbors inherit the forest’s original label, the predicate set is accepted as an anchor. In the plot each horizontal bar lies on a precision axis from 0 to 1: the right-hand end of every bar touches or exceeds the 0.95 threshold, signaling that the discovered rule is highly faithful. The bar thickness encodes coverage; the thickest rules apply to over 20% of comparable days, while thinner bars cover only a sliver of the data space. That most anchors rely solely on temperature confirms, in an immediately readable way, that the classifier’s decision boundary is dominated by this single variable, with bad weather entering the rule set only when it is needed to secure high precision [21].
  • Counterfactual Instances: The algorithm searches for the minimum change(s) to the input features that would flip the model’s prediction to a specified target class. The optimization balances prediction flip, proximity to the original instance, and optional feasibility constraints. The resulting counterfactual highlights actionable levers and their associated “cost.” Counterfactuals are inherently local.
    Table 1 and Table 2 illustrate the idea for a radial-basis SVM trained on the German Credit dataset. The customer described in Table 1 receives only a 24.2% probability of good credit. A multi-objective counterfactual search then generates ten nondominated alternatives, reported in Table 2. Every viable solution shortens the loan duration from 48 months to roughly 23 months; many also upgrade the job category from “unskilled” to “skilled,” and several switch the recorded gender from female to male. All ten counterfactuals raise the predicted probability above the 50% threshold, but they differ in how many features they modify and how far they stray from the training manifold, offering a spectrum of trade-offs between realism and efficacy. The repeated gender flip, coupled with large probability jumps, additionally exposes a bias latent in the SVM—an insight that would remain hidden without counterfactual analysis [21].
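As a minimal, hedged illustration of how two of the techniques above are typically invoked in practice, the following sketch computes SHAP attributions and permutation importances for a random forest; the dataset, model, and repeat counts are our own assumptions, and the snippet is not drawn from the reviewed studies or from Molnar’s examples.

    import numpy as np
    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

    # Post hoc attribution with SHAP: mean |SHAP value| per feature gives a global ranking
    shap_values = shap.TreeExplainer(model).shap_values(X_te)
    sv = np.array(shap_values[1] if isinstance(shap_values, list) else shap_values)
    if sv.ndim == 3:                      # some SHAP versions return (samples, features, classes)
        sv = sv[:, :, 1]
    shap_rank = X.columns[np.argsort(np.abs(sv).mean(axis=0))[::-1][:5]]
    print("Top features by SHAP:", list(shap_rank))

    # Permutation importance: how much held-out accuracy drops when a feature is shuffled
    pi = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
    pi_rank = X.columns[np.argsort(pi.importances_mean)[::-1][:5]]
    print("Top features by permutation importance:", list(pi_rank))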
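Similarly, a global surrogate of the kind described above can be obtained, under the same illustrative assumptions, by fitting a shallow decision tree to the black box’s predictions rather than to the true labels; the tree’s fidelity score and printed rules then serve as the explanation.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Train the interpretable surrogate on the black box's outputs, not on the true labels
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, black_box.predict(X))
    print(f"fidelity to the black box: {surrogate.score(X, black_box.predict(X)):.2f}")
    print(export_text(surrogate, feature_names=list(X.columns)))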

2.4. Putting Them Together—Drift Explainability

While the individual concepts of ’drift’ and ’explainability’ are well-established in the machine learning literature, their intersection—specifically, the explainability of drift—lacks a formal, consensus definition. Current research often treats explainability as an auxiliary component of drift detection systems rather than a structured epistemological framework. This gap persists despite a proliferation of methodologies addressing interpretable drift adaptation, as cataloged in our taxonomy (Section 6). Through systematic analysis of these approaches, we observe that the explainability of drift is not merely the application of post hoc XAI techniques to drifted models, nor is it reducible to drift detection metrics. Instead, it constitutes a distinct capability: the dynamic translation of distributional shifts into human-actionable narratives intrinsically generated by the adaptation process itself. We therefore propose the following synthesizing definition, grounded in patterns observed across methodological classes: “Explainability of drift constitutes the integrated capability of adaptive systems to transform detected changes—whether in data distributions (data drift) or input–output mappings (concept drift)—into human-interpretable narratives that intrinsically articulate why, where, and how the drift manifests, while projecting its consequences. Unlike post hoc interpretation of static models, this process embeds rationale generation directly within drift adaptation mechanics, leveraging evolving artifacts—geometric displacements in latent representations, mutations in interpretable rule structures, or domain-anchored semantic tags—to translate statistical nonstationarity into auditable, actionable insight.” For example,
  • When sensor networks detect hydraulic anomalies, explainability moves beyond divergence metrics to reveal—through migrating prototype clusters or vanishing fuzzy clauses—that valve corrosion in Pump P402 triggered the shift;
  • When fraud models lose accuracy, feature attribution timelines expose geolocation mismatches eclipsing transaction amounts as the evolving rationale.
Crucially, these explanations are not ancillary outputs but the substance of adaptation itself: counterfactual recommendations harden against invalidating future drifts; compliance tags chronicle regulatory assimilation; and SHAP vectors crystallize into early-warning dashboards. By tethering abstract change to tangible mechanisms—corroded hardware, emergent cyber-tokens, or typhoon-altered ocean stratification—explainability bridges detection to intervention, enabling stakeholders to interrogate not merely when a model fails, but why its logic reconfigured and how to reconcile it with reality. This seamless fusion of quantification and narrative, grounded in domain semantics, fulfills the dual mandate of XAI in nonstationary environments: making drift intelligible and actionable. Thus, our definition positions explainability not as an add-on to detection but as the mechanism through which drift becomes legible, auditable, and actionable—a paradigm shift evident in techniques ranging from evolving fuzzy rule structures to geometrically anchored attribution timelines.
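To ground the “feature attribution timeline” idea in something executable, the following sketch (our own illustration, with synthetic data and arbitrary window sizes) refreshes a model on each window of a stream, tracks the mean absolute SHAP value of every feature, and flags windows in which the importance ranking changes, which is one simple way an attribution-level early-warning signal can be produced.

    import numpy as np
    import shap
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n, window = 3000, 500
    X = rng.normal(size=(n, 3))
    y = np.where(np.arange(n) < 1500,
                 3.0 * X[:, 0] + 0.1 * X[:, 1],     # before drift: feature 0 dominates
                 0.1 * X[:, 0] + 3.0 * X[:, 1])     # after drift: feature 1 takes over
    y = y + rng.normal(0.0, 0.1, n)

    previous_ranking = None
    for start in range(0, n, window):
        Xw, yw = X[start:start + window], y[start:start + window]
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(Xw, yw)  # periodic refresh
        sv = shap.TreeExplainer(model).shap_values(Xw)
        importance = np.abs(sv).mean(axis=0)         # mean |SHAP| per feature in this window
        ranking = tuple(int(i) for i in np.argsort(importance)[::-1])
        if previous_ranking is not None and ranking != previous_ranking:
            print(f"window starting at t={start}: attribution ranking changed to {ranking}")
        previous_ranking = ranking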

2.5. Interpretable AI

While explainable AI (XAI) encompasses diverse methodologies for generating human-understandable rationales for a model’s behavior, interpretable AI fundamentally differs: interpretability is an intrinsic property of the model itself, not an outcome of applied techniques. Models are interpretable when their internal structure—such as parameters, logic flows, or decision pathways—is inherently transparent and directly comprehensible to humans. For example, Decision Trees or linear models are intrinsically interpretable: their architecture enables users to trace input-to-output reasoning through direct inspection, without relying on external explanatory methods. Thus, interpretability arises from model design choices (e.g., simplicity, modularity, transparency), not from external techniques that act as wrappers around a model. This distinction clarifies why interpretability lacks a “toolbox” akin to XAI: it is a foundational characteristic of the model architecture.
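As a minimal contrast with the post hoc sketches in Section 2.3 (again our own toy example, with an arbitrary dataset choice), an intrinsically interpretable model needs no external explainer: its fitted parameters are the explanation.

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    model = LinearRegression().fit(X, y)

    # The model's own coefficients directly expose how each input moves the prediction
    for name, coef in sorted(zip(X.columns, model.coef_), key=lambda c: abs(c[1]), reverse=True):
        print(f"{name:>6}: {coef:+.1f}")
    print(f"intercept: {model.intercept_:.1f}")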

3. Systematic Literature Review Methodology and Selection Process

This systematic literature review draws on four leading scholarly databases:
  • Scopus;
  • Web of Science;
  • IEEE Xplore;
  • ACM Digital Library.
They were selected for their broad disciplinary coverage and established reputations. To capture the fullest possible set of relevant studies while maintaining academic rigor, we used each platform’s Advanced Search interface and executed the unified search query shown in Listing 1:
Listing 1. Unified search query employed across databases.
    (explainability OR explainable OR "explainable ai" OR interpretability)
        AND
    ("data drift" OR "concept drift" OR "data shift")
Because the review centers on the intersection of explainability/interpretability and drift, the first clause targets at least one of the former terms, while the second clause anchors the search in drift-related phenomena. Syntax was adapted to each database, and searches were confined to the Abstract field to keep the result set manageable; expanding to keywords, full text, or even references would have produced an unmanageably large corpus. No publication-date restrictions were applied. As the next section shows, research in this area has only begun to flourish in recent years; introducing a temporal cutoff would therefore have unduly narrowed the search and yielded too few studies. The multi-stage selection process is summarized in the PRISMA flow diagram in Figure 14 and detailed below:
1. Database Search: The query was executed in all four databases, and the results were imported into Zotero.
2. Duplicate Removal: Duplicate records were identified and deleted.
3. Document-Type Filter: Reviews, books, and book chapters were excluded so that only primary research in the form of journal articles and conference papers remained.
4. Source-Quality Filter: To ensure a baseline of methodological quality, we discarded papers from rank-C conferences, whereas journal articles from all quartiles were retained because a lower quartile does not in itself compromise scientific validity.
5. Title and Abstract Screening: The surviving titles and abstracts were then examined manually, eliminating studies that were clearly out of scope.
6. Language Filter: Only English-language publications were included to guarantee full comprehensibility.
7. Data Extraction: Relevant data were extracted from the included studies and analyzed in order to better discuss the approach and present the most common datasets and models for each identified field.
The initial search retrieved a total of 239 records as follows: 122 from Scopus, 68 from Web of Science, 25 from IEEE Xplore, and 24 from the ACM Digital Library. Removing 103 duplicates left 136 unique studies. Eleven of these were reviews, books, or book chapters, reducing the total to 125. Venue-based filtering then excluded 16 papers from rank-C conferences and 7 from unranked venues, yielding 102 candidate studies. During title and abstract screening, 10 papers were removed because, although they mentioned explainability or interpretability, they did not substantively connect these concepts to mechanisms for explaining data or concept drift, a fundamental inclusion requirement. Full-text retrieval for the remaining 92 studies led to the exclusion of one article written predominantly in Chinese. Each of the 91 English-language papers was subsequently read independently by all three authors. Nine were rejected at this stage because they still failed to meet the drift-explainability criterion. Baniecki et al. [22] describe only a post hoc XAI wrapper and offer no drift-diagnostic technique. The article by De Lemos et al. [23] is a position paper that identifies the need for explainability in self-adaptive systems but proposes neither a drift model nor an explanatory approach. Abid et al. [24] mention interpretability only tangentially in an IoT-monitoring context, without addressing drift. Casado et al. [25] and Hosseinalipour & Duel-Hallen [26] focus on adaptation architectures for federated learning and do not attempt to explain why distributional shifts occur. Lai et al. [27] deal with generative reporting of ML defects; interpretability of drift is peripheral. Kleinert et al. [28] postpone the use of XAI to future work, and Ouatiti et al. [29] show that static explanations degrade under drift but propose no drift-specific explanatory method. After all exclusion steps, the final corpus comprises 82 primary studies, which form the evidential basis of this review.

4. Ranking-Based Criteria for Paper Inclusion

Venue quality played a decisive role in the selection of primary studies, and we treated conference proceedings and journal articles separately to respect the evaluation standards used in each community. For conference papers we relied on the CORE/ICORE ranking system, an internationally recognized classification that groups venues into four tiers: A*, A, B, and C. Rankings were retrieved from the official ICORE (ICORE: https://portal.core.edu.au/conf-ranks/?search=icphm&by=all&source=CORE2023&sort=atitle&page=1, accessed on 29 May 2025) portal and the complementary comrank (comrank: https://comrank.top/, accessed on 29 May 2025) database. To uphold a baseline level of methodological rigor, every paper appearing in a C-rank conference was discarded, whereas all papers from A*, A, and B venues were allowed to proceed to the next screening stage. Journal articles were assessed through the quartile system (Q1–Q4) provided by SCImago Journal Rank (SCImago: https://www.scimagojr.com, accessed on 29 May 2025) (SJR) and cross-checked in Scopus (Scopus: https://www.scopus.com/sources, accessed on 29 May 2025). Unlike the policy for conferences, we opted to keep articles from all four quartiles. Two considerations motivated this choice: first, the topic under review is still emerging and the initial pool of papers is relatively small; second, publication in a Q4 journal does not automatically imply weak methodology, especially in fast-growing research areas where newer journals have yet to accumulate citations.
To maintain full transparency and to preempt any concerns about the scientific soundness of the material analyzed, Table 3 and Table 4 list every study that passed this filter alongside its corresponding conference rank or journal quartile.
The label Not Found marks conference papers whose venue ranking could not be verified in CORE, comrank, or any of the other ranking portals consulted. Such cases usually correspond to very recent conferences that have not yet been evaluated or to events with limited visibility. Because the scientific standing of these venues could not be established at the time this SLR was being written, the associated papers were excluded from the review. Finally, note that the two tables report every record retrieved, not merely the studies that were ultimately analyzed.

5. Bibliometric Overview of Retrieved Publications

This section provides a general overview of the retrieved records, specifically analyzing 136 records obtained after removing duplicates. This broader inclusion criterion was chosen because the available material is somewhat limited, and excluding results based on previously established inclusion/exclusion criteria would not significantly impact the current analysis. The analysis focuses primarily on bibliographic information. The diagram shown in Figure 15 highlights how interest in the explainability and interpretability of drift has increased over time, with a notable growth of approximately 80 % from 2020 to 2024. This upward trend suggests a continued expansion of the topic in the upcoming years.
Figure 16 presents the distribution of corresponding authors by country, providing insights into the global scope of research activities on this topic. The majority of research originates from China, Germany, and the USA, countries known for rapidly expanding industries. This distribution underscores the international relevance of the topic, especially within leading industrial nations, and emphasizes that explainability and interpretability of drift are not niche subjects but globally debated topics. Both Figure 15 and Figure 16 were developed using Matplotlib [147].
Analyzing publication sources and reporting them in Table A1 reveals that articles addressing explainability and interpretability of concept and data drift appeared across 52 different journals. Among these, only four journals contain more than one publication within the collected records. This fragmented distribution, predominantly within computer science journals, likely results from the novelty of the research area, the absence of specialized journals explicitly dedicated to these topics, and the broad relevance of drift explainability across various computer science subfields, including machine learning, data mining, and artificial intelligence. As the field matures, we anticipate greater consolidation around a smaller number of key journals.
Similarly, Table A2 lists all conferences and workshops where the bibliography records were published. Mirroring the journal distribution, we identified 52 distinct conferences or workshops featuring papers addressing the explainability and interpretability of concept and data drift. Among these, 10 conferences stand out for hosting multiple publications within the collected records. Notably, three conferences—BPM, CIKM, and ECML PKDD—each featured three publications, followed by seven additional conferences with two publications each. The overwhelming majority of these conferences focus on computer science topics with broad applicability, with the International Petroleum Conference being the sole exception. Furthermore, the table highlights the global scope of these conferences, spanning Asia, Australia, Europe, and international venues, reinforcing the homogeneous geographic distribution of research on this topic, as previously illustrated in the pie chart in Figure 16.
To assess the extent of researcher collaboration, a co-authorship network was created using VOSviewer software [148]. Figure 17a visualizes authors as nodes and their collaborations as connections, revealing distinct research clusters. Node colors represent authors’ average publication years, enabling an analysis of how research groups have evolved over time. Two key insights emerge from this visualization: first, the numerous clusters point to significant fragmentation within the research community; second, the prevalence of nodes colored yellow suggests that many research groups have recently formed and thus exhibit relatively low average publication frequencies, a finding consistent with the observed upward trend in interest. Nevertheless, there are also clusters showing considerable collaboration, primarily associated with somewhat older publications. Two prominent clusters are expanded and detailed further in Figure 17b.

6. Taxonomy

This section surveys the explainability and interpretability techniques applied to models operating under data or concept drift. To organize the discussion, we classify the primary studies by the technical and methodological foundations underlying each approach. An application-domain taxonomy was considered initially but abandoned after a close reading of the corpus revealed that most papers are intentionally domain-agnostic or rely on synthetic benchmarks; grouping by domain would therefore have produced arbitrary and sparsely populated categories. A technique-oriented taxonomy better suits the scientific focus of this review and enables a clearer, method-oriented comparison. Techniques are further organized into five overarching macro-classes. Statistical and Probabilistic Reasoning includes methods based on statistical measures, divergences, and probabilistic frameworks to identify and explain drifts. Feature Importance and Attribution covers approaches that attribute significance to specific features, examples, or structural elements, explicitly showing how changes in these components contribute to drift. Interactive and Human-Centric Approaches encompasses techniques emphasizing human collaboration and practical decision-making contexts for drift interpretation. Logic and Rule-Based Methods comprises methods leveraging logical, causal, and fuzzy rules to clarify the nature of detected drifts. Finally, Evaluation and Auditing focuses on methods for assessing changes in model performance, fairness, and compliance, underscoring the necessity of transparent and justifiable drift explanations. These macro-classes group methods based on shared conceptual and methodological themes, ranging from statistical measurement of drift phenomena to human-centered interpretative frameworks. A conceptual representation of these macro-classes, highlighting their hierarchical and methodological connections, is illustrated in Figure 18. The representation is realized as a phylogenetic tree, created using a Miro board. While the macro-classes provide conceptual organization, the detailed methodological analysis and comparative evaluation presented in this review are specifically conducted at the level of fifteen micro-classes. Each subsection begins with a detailed analysis of the techniques used for explaining and interpreting the drift within the papers of each class. This is complemented by a tabulated summary of public datasets used in the cited studies. The section culminates in a multidimensional class evaluation that probes the collective utility and limitations of the techniques through six interconnected perspectives. First, the analysis interrogates the type of drift addressed, distinguishing between data drift and concept drift (and its subtypes). Building on this, it examines application contexts where domain-specific imperatives shape methodological priorities. The scope of each approach is then scrutinized: does it pinpoint drift at the granular, instance-level (local) scale, or diagnose systemic shifts across the entire model (global)? This distinction informs the intended audience, as explanations must balance technical precision for data scientists with intuitive accessibility for end-users or domain experts. To ground these considerations, the analysis evaluates quantitative metrics that empirically validate claims. 
Finally, it assesses learning modes, probing whether methods align with real-time adaptation (online), retrospective batch analysis (offline), or hybrid frameworks that straddle both paradigms. Where a subsection contains papers sharing common datasets, direct comparisons of results and methodological trade-offs are highlighted. In the classification that follows, we explicitly indicate whether each methodological group or technique aligns with explainable AI (i.e., post hoc explanations) or interpretable AI (i.e., models with built-in transparency). For instance, classes relying on SHAP, LIME, or surrogate models are categorized as explainable AI, while those based on logical rules, fuzzy systems, or sparse interpretable structures fall under interpretable AI. This distinction is maintained throughout the class-by-class analysis to guide researchers in selecting methods appropriate to their domain, goals, and constraints. The classes are presented without strict precedence; additionally, it is important to keep in mind that a paper may fit into multiple categories.

6.1. Distance/Divergence Geometry

The first class is composed of 14 out of 82 papers. Five of the fourteen are overlap papers; they use distance geometry alongside another primary mechanism.

6.1.1. Technical and Methodological Insights

The Distance/Divergence Geometry class collectively reimagines the concept drift explanation by transforming abstract statistical shifts into intuitive geometric narratives, where distances and divergences in latent or feature space become self-contained stories of change. Rather than treating detection and interpretation as separate processes, these methods embed explainability directly into their core artifacts, allowing analysts to “read” drift through evolving visual or quantitative geometry. At the heart of this approach lies a unifying principle: the same geometric construct that quantifies drift simultaneously explains it. For instance, Li et al.’s STAD framework pairs latent-space distributions with auto-encoders, framing drift as a state-transition graph where symmetrized KL-divergence not only triggers jumps between states but also justifies them—reusing a past state, visualized through overlapping KDEs, instantly communicates seasonal recurrence, while spawning a new state signals novel behavior [130]. This collapses detection, adaptation, and rationale into a single, auditable timeline. Similarly, Wan et al.’s MCD-DD [37] converts Euclidean distances between concept vectors into heatmaps where the pattern of gaps—sudden flares, twin triangles, or fading diagonals—becomes a visual language for drift style. Practitioners recognize these shapes at a glance, trusting their geometric immediacy over traditional significance tests. The class further deepens interpretability by anchoring explanations in tangible, inspectable artifacts. Hu et al.’s [140] DSE leverages the evolving geometry of SVM decision boundaries: widening margins, miscolored points spilling across frontiers, and jumping density counters narrate drift in real time, while side-by-side “before/after” plots of boundary realignment show why a new classifier restores separability. Wood et al. [121] extend this to time series via Empirical Mode Decomposition, where fading amplitudes of interpretable IMFs (e.g., a weakening 24-hour sinusoid) directly link drift to behavioral shifts like flattened night-time consumption. Analysts trace forecast revisions to specific IMFs, turning feature selection logs into diaries of changing demand mechanics. Neufeld and Schmid [63] operationalize this by plotting multivariate hydraulic data as distance vectors against a median trace—spikes in specific channels (e.g., “Temperature 1”) immediately point to valve wear, transforming high-dimensional streams into fault maps. Here, the shape of the deviation is the explanation, requiring no auxiliary statistics. Later papers enhance this foundation by layering attribution on top of divergence metrics, explicitly linking global drift to localized feature-level changes. Xu et al.’s [122] CDDIA exemplifies this: KL-divergence between reconstruction-error histograms first locates drift via optimization masks that bucket samples into “outdated,” “valid,” or “shifted” categories; SHAP analysis then attributes the shift to specific components (e.g., pump P402), turning feature weights into a ranked anatomy of root causes. Bhaskhar et al.’s TRUST-LAPSE [96] similarly fuses Mahalanobis distance and cosine similarity into “mistrust scores” tied to inspectable exemplars—a surge in scores reveals why a sample deviates (e.g., pediatric EEGs entering an adult-trained model) by referencing nearest-neighbor prototypes from a coreset. This evolution—from detecting divergence to explaining its origin—reflects the class’s methodological maturation.
Crucially, several frameworks treat dynamic geometry itself as the explanation. Migenda and Schenck’s [62] adaptive NGPCA reifies drift through jumps in local subspace dimensionality ($m_j$): a cluster expanding from 10 to 13 components signals emergent operational complexity, with the eigenvalue recalibration loop visually justifying the model’s adaptation. Mascha’s [142] reconstruction-error histograms for manufacturing images tie pixel-space gaps directly to physical causes—a right-shifted histogram reveals lens blur or lighting drift, where the size of the gap corresponds to the deviation from the learned manifold. Unlike opaque metrics like softmax entropy, these geometric shifts are intrinsically interpretable: technicians inspect high-error images to see the over-exposed weld or blurred joint, grounding confidence scores in sensory reality. Underpinning all methods is a seamless fusion of quantification and interpretation. Whether through state graphs (STAD), heatmaps (MCD-DD), boundary plots (DSE), IMF stacks (Wood et al.), distance vectors (Neufeld & Schmid), or SHAP bars (CDDIA), each paper delivers a self-contained explanatory loop. Drift is not merely reported but demonstrated through artifacts that operators already monitor, eliminating the need for post hoc analysis. This transforms latent-space dynamics into auditable, domain-sensible narratives—where the magnitude of a divergence, the coordinates of a spike, or the amplitude of a component becomes the language through which drift is understood and acted upon. Therefore, the explanatory process is not inherent to the predictive model itself, which aligns this class with the XAI group.
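As a minimal numeric illustration of the divergence-as-explanation idea (our own sketch; the bin count, smoothing constant, and simulated shift are assumptions), the symmetrized KL divergence between histograms of a monitored quantity, for example per-sample reconstruction errors, quantifies the gap between a reference window and a recent window, while the per-bin contributions indicate where in the value range the drift concentrates.

    import numpy as np

    def symmetrized_kl(p, q, eps=1e-9):
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

    rng = np.random.default_rng(0)
    reference_errors = rng.gamma(shape=2.0, scale=0.5, size=2000)   # errors under normal operation
    recent_errors = rng.gamma(shape=2.0, scale=0.9, size=2000)      # right-shifted after drift

    bins = np.histogram_bin_edges(np.concatenate([reference_errors, recent_errors]), bins=30)
    p, _ = np.histogram(reference_errors, bins=bins)
    q, _ = np.histogram(recent_errors, bins=bins)

    print("symmetrized KL divergence:", round(symmetrized_kl(p.astype(float), q.astype(float)), 3))
    # Per-bin gap shows *where* the distribution moved (the explanation, not just the alarm)
    gap = q / q.sum() - p / p.sum()
    for b in np.argsort(np.abs(gap))[::-1][:3]:
        print(f"bin [{bins[b]:.2f}, {bins[b + 1]:.2f}): mass change {gap[b]:+.3f}")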

6.1.2. Datasets

The analysis of the selected papers shows that, in this group, there are no reference benchmark datasets. The publicly available datasets cited in the papers are listed in Table A3. We can observe that 31 of these public datasets are actually used only once, and there are even 3 papers that do not mention the use of any dataset at all. However, there are three datasets that are cited more frequently than the others, even if only twice. The first of these is CIFAR-10, which consists of 60,000 color images in 10 classes, with 6000 images per class. FashionMNIST, on the other hand, contains grayscale images of clothing items from Zalando. Finally, CoverType is a biological dataset used to classify forest cover types based on cartographic variables. It is worth noting that the first two of these datasets are image datasets.

6.1.3. Multidimensional Class Evaluation

As visible from Table A18, this class predominantly addresses concept drift, with 11 papers covering various forms—sudden, gradual, incremental, recurrent, or general. Only two papers explicitly deal with data drift, suggesting a stronger orientation toward conceptual shifts rather than purely data-driven anomalies. The application contexts are notably diverse, spanning from manufacturing and healthcare to energy, cybersecurity, IoT, and environmental sensors. This diversity underscores the versatility and broad applicability of distance/divergence geometry methods across different domains. Healthcare and cybersecurity emerge as significant application contexts, indicating that these fields particularly benefit from the geometric interpretability offered by this mathematical approach. Regarding the scope, this group is balanced between local (instance-level) explanations and global (model-level) insights, although there is a slight emphasis on local interpretations (nine papers local vs. four global). This suggests that distance-based methods naturally lend themselves to pinpointing precise causes of drift at a granular level, which can be especially valuable for targeted diagnostic or corrective actions. In terms of audience, the primary target users of these methods are clearly data scientists, although a smaller subset is directed toward end-users, indicating that most explanations require technical literacy to fully interpret and leverage. Interestingly, only one paper explicitly specifies quantitative metrics through user-study methodologies. The absence of defined interpretability metrics across the rest of the papers might highlight a potential gap: this technique appears less suited for direct quantitative evaluation of interpretability gains, relying instead on visual or qualitative assessments. Finally, concerning the learning mode, there is a mixed approach with the notable usage of hybrid methods, online learning, and fewer offline methods. This indicates adaptability in different operational contexts and aligns well with real-world scenarios where data streams continuously evolve.

6.2. Change-Point and Statistical-Test Theory

Out of 82 papers, six employ Change-Point and Statistical-Test Theory as their main engine, while another four use it secondarily to set adaptive thresholds.

6.2.1. Technical and Methodological Insights

The “Change-Point and Statistical-Test Theory” class fundamentally reimagines drift detection as an exercise in narrative construction, where statistical rigor serves not merely to signal change but to articulate its context, cause, and consequences in domain-grounded terms. This paradigm emerges from a shared conviction: the mechanism of detection must inherently generate its own explanation. Vieira et al.’s [119] Driftage framework pioneers this ethos by transforming the ADWIN algorithm’s outputs into an auditable dialogue among specialized agents. Each decision—from the Monitor’s data ingestion to the Executor’s alarm dispatch—is stamped with forensic evidence: sensor IDs, adaptive sensitivity settings (like δ), and timestamps. Crucially, the Planner’s requirement for multi-sensor agreement does not just reduce false positives; it explains credibility through corroboration (“two muscles concurred”). This provenance culminates in visual narratives like parallel-coordinate plots, where a spike across specific channels (e.g., hamstring and thigh) instantly maps where behavior diverged. Even δ adjustments become part of the story—a shift from 1.0 to 0.1 transparently narrates rising noise levels. Thus, Driftage establishes the class’s core tenet: statistical parameters are not opaque thresholds but characters in a diagnostic drama. Building on this, Demšar and Bosnić’s ExStream [104] reframes the actual signal of drift as a shift in explanatory semantics. By replacing accuracy metrics with Shapley-style feature contributions, it detects drift when the geometry of influence reconfigures—say, “ODOR = almond” collapses while “GILL-SIZE = crowded” surges. This approach inherently answers why a model fails: the attributes steering predictions have realigned. Visualized as evolving line plots of feature contributions, the framework lets operators trace both macro-level drift patterns and micro-level attribute stories simultaneously, rendering abstract alerts obsolete. Haug et al.’s CDLEEDS [48] extends this logic locally, arguing that drift is most interpretable at the cluster level. When SHAP attributions for a subgroup cease summing to predictions—flagged by a t-test—the system declares: “the story of feature importance here has broken.” Flipped SHAP arrows (e.g., from x₁ to x₂) visually anchor this breakdown, proving that localization and explanation are inseparable. Together, these frameworks demonstrate that explanatory vectors are not just inputs to detectors—they are the detectors, transforming statistical tests into translators of model intent. This evolution toward spatially and contextually enriched narratives continues with Xuan et al.’s BNDM [123], which replaces sliding-window p-values with a cartography of change. By decomposing the data space into a Polya Tree of semantic bins (e.g., “high-speed, low-torque quadrant”), drift manifests as color-flipping histograms where instability is mapped to coordinates. A bin lighting up at coarse resolution (“left of 0.3”) invites zooming to pinpoint the precise band in crisis, while partition names like “voltage range 220–240 V” pre-translate alerts into domain language. Samarajeewa et al. [116] then shift the focus to temporal analogy, treating drift as a call to retrieve historical precedents.
Residual anomalies trigger searches in an embedding database, returning matches like “Building B’s shutdown (May 2022)” or “HVAC failure (July 2021).” The resulting narrative—“today’s pattern resembles last spring’s LED retrofit”—embeds the why within retrievable context, with metadata-enriched warehouses (“FactDrift tables”) cementing links to events like “forecast error +18% during 38 °C heat.” Here, the statistical residual becomes a query for narrative restitution. Finally, Adams et al. [40] weave causality into the fabric of change-point theory, framing drift as a dialogue between log perspectives. A spike in “average service time” is not an isolated alert but the second act of a story begun weeks earlier, when “clerk workload surged”—a link proven by Granger causality. The resulting paired drift, stamped with lags and feature names (“workload explains service time at p < 0.01”), transforms statistical output into accountable diagnosis. Stored in color-coded diaries (gray for unexplained, blue for causal), these sequences let managers trace systemic levers across incidents, turning isolated detections into chapters of an ongoing operational saga. Across this progression—from agent-annotated thresholds to causal pairs—the class reveals its unifying genius: statistical change-point methods are not detectors but narrators. Parameters like δ chronicle environmental noise; Shapley vectors document shifting feature alliances; Polya Trees cartograph instability; embeddings retrieve historical echoes; and causality tests script blame. By binding detection to domain semantics (sensor IDs, feature values, spatial bins, event logs) and prioritizing visuals over metrics, every framework ensures alerts arrive not as cryptic warnings but as storied diagnoses: where change ignited, which elements reconfigured, why it matters, and—in the most advanced cases—how long the fuse burned. This seamless fusion of statistical rigor and narrative intuition distinguishes the class, turning change-point theory into a language of actionable insight. The explanations provided are interpretive aids rather than intrinsic to the predictive mechanism itself, making this class a clear example of explainability over interpretability, fitting it into the XAI cluster.
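As a concrete, hedged illustration of the class (not the Driftage implementation), the sketch below monitors a single sensor stream with a two-window Kolmogorov–Smirnov test and emits an auditable record (window end, sensor identifier, statistic, p-value, threshold) for each detected change, echoing the provenance-style reporting described above. The sensor name and parameter values are assumptions made for the example.

```python
import numpy as np
from scipy.stats import ks_2samp

def monitor_stream(stream, sensor_id="hamstring_emg", win=250, alpha=0.01):
    """Compare successive non-overlapping windows against a reference window."""
    stream = np.asarray(stream)
    reference = stream[:win]
    alerts = []
    for start in range(win, len(stream) - win + 1, win):
        window = stream[start:start + win]
        stat, p_value = ks_2samp(reference, window)
        if p_value < alpha:                    # the two windows are no longer exchangeable
            alerts.append({"window_end": start + win, "sensor": sensor_id,
                           "ks_stat": round(float(stat), 3),
                           "p_value": float(p_value), "alpha": alpha})
            reference = window                 # adopt the new regime as the reference
    return alerts

rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0.0, 1.0, 1500), rng.normal(0.8, 1.0, 1500)])
for alert in monitor_stream(stream):
    print(alert)
```

Each alert is a small, self-describing record rather than a bare alarm, which is the point the papers in this class make about statistical tests doubling as explanations.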

6.2.2. Datasets

Table A4 lists the datasets used in papers belonging to this methodological group, including those overlapping across multiple categories. If we excluded overlapping papers, the number of datasets would be nearly halved, resulting in a significantly smaller collection. Nevertheless, it is evident that most datasets are employed in a single paper each. A notable exception is INSECT, used in three papers, which will later emerge as one of the most frequently utilized datasets in this systematic review. Following INSECT, the most frequently recurring datasets are BPI Challenge 2017, Airlines, and Electricity, all of which will reappear in subsequent groups as widely adopted examples. This underscores their status as de facto benchmark datasets for drift detection, given their prevalence in the literature. The table does not include the synthetic datasets that the authors generated themselves for their experiments; these were produced with the SEA and Agrawal generators.
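For readers unfamiliar with these generators, the following sketch reproduces the commonly cited SEA recipe (three uniform attributes in [0, 10], of which only the first two are relevant, and a labelling threshold that changes between concepts); the exact parameterizations used in the surveyed papers may differ.

```python
import numpy as np
from collections import Counter

def sea_stream(n_per_concept=2000, thetas=(8.0, 9.0, 7.0, 9.5), seed=42):
    """Yield (features, label, theta); each theta defines one concept of the stream."""
    rng = np.random.default_rng(seed)
    for theta in thetas:
        X = rng.uniform(0.0, 10.0, size=(n_per_concept, 3))
        y = (X[:, 0] + X[:, 1] <= theta).astype(int)   # third attribute is irrelevant by design
        for features, label in zip(X, y):
            yield features, label, theta

# Positive-class rate per concept: the jumps mark the induced sudden drifts.
positives = Counter()
for _, label, theta in sea_stream():
    positives[theta] += int(label)
print({theta: round(count / 2000, 3) for theta, count in positives.items()})
```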

6.2.3. Multidimensional Class Evaluation

From Table A19 it is clearly notable that this class predominantly addresses concept drift in general (nine papers), with some explicit references to sudden, incremental, and recurrent concept drift. Only one paper explicitly focuses on data drift, indicating a stronger emphasis on the conceptual aspects of drift rather than purely statistical anomalies. Regarding application contexts, this group presents significant diversity, including process mining, healthcare, energy, IoT, financial systems, environmental sensors, and retail. Among these, process mining and healthcare appear prominently, suggesting that the structured nature and inherent temporal characteristics of these domains align well with statistical and change-point detection methods. Concerning the scope, in this category there is a relatively balanced distribution between global and local explanations. However, the slight predominance of local explanations reflects the capability of statistical methods to highlight precise features or intervals that trigger drift events. In terms of audience, the primary recipients of these methods are data scientists, indicating that the interpretability provided by statistical tests often requires specialized statistical or analytical competencies for effective use. Interestingly, none of the analyzed papers specify explicit quantitative metrics related to interpretability or trustworthiness. This lack of formal metrics implies that these methods primarily offer interpretability through statistical significance and visual indicators rather than standardized quantitative evaluation. Finally, the analysis of learning modes shows a clear preference for online learning approaches, complemented by several hybrid methods, and fewer offline methods. This suggests that the methods within this group are particularly suited for real-time, dynamic data environments, where immediate drift detection is critical.

6.3. Bayesian and Uncertainty Modeling

Five papers use Bayesian reasoning or formal uncertainty estimates as their primary explanatory lens; three other studies borrow Bayesian ideas as a secondary tool.

6.3.1. Technical and Methodological Insights

The profound explanatory power of Bayesian and uncertainty modeling frameworks lies in their ability to transform abstract statistical drift into tangible narratives of eroding confidence, where uncertainty quantification becomes the common language through which models articulate their changing understanding of the world. These approaches share a fundamental philosophy: rather than treating explanation as a post-detection analysis, they embed interpretability directly into the very fabric of drift awareness. At the heart of this paradigm is the treatment of uncertainty metrics as communicative artifacts that simultaneously signal, quantify, and rationalize distributional shifts. When Zhu et al.’s [125] METER framework observes Dirichlet belief distribution flattening—such as when α-vectors evolve from (30, 2) to (6, 5)—this numerical metamorphosis explicitly narrates the model’s loss of evidential certainty about known concepts. Crucially, this uncertainty surge does not merely trigger adaptation; it localizes the impact through hypernetwork-induced layer-specific weight shifts, revealing whether drift primarily affects foundational features or higher abstractions. The resulting shift tables become forensic records, telling engineers precisely where the model needs flexibility to reconcile new realities. This principle of uncertainty-as-narrative permeates the entire class. Cianci et al.’s [58] credit risk system demonstrates how Bayesian odds-scaling (γ) translates macroeconomic tremors into business vernacular: a γ-plunge from 1.2 to 0.4 explicitly declares “default likelihood dropped to one-third of historical norms.” Here, uncertainty quantification performs triple duty—it rescales predictions, explains the global confidence shift, and anchors hierarchical confidence signaling through star ratings that highlight individual unreliable predictions. The framework’s genius lies in weaving γ-rescaling, confidence stars, and SHAP feature attributions into a single causal chain: external shocks alter priors, scores rescale to reflect new evidence, confidence stars redistribute to spotlight uncertainty, and feature bars reveal the drivers. Similarly, Jiang et al.’s [49] energy forecasting transforms variance inflation into operational storytelling. When load-prediction uncertainty doubles, the system does not just detect drift—it documents how scenario simulations forced battery-reserve increases, with dispatch logs explicitly stating “80% of scenarios exceeded baseline.” This explanation-through-action paradigm makes uncertainty the protagonist in an auditable decision-making drama. What distinguishes these approaches is their commitment to domain-aligned storytelling. Li et al.’s [114] environmental monitoring does not announce drift through opaque alarms; it curates “key frames” of anomalous batches and translates distribution shifts into physical narratives using RuleFit clauses that couple thresholds on salinity (23.6), temperature (13), and oxygen. The disappearance of a high-pH rule becomes a chemist’s alert for acidification, while emergent chlorophyll thresholds tell biologists of blooming phytoplankton. Costa et al. [101] achieve similar intuitiveness by encoding uncertainty into evolving cognitive maps: entomologists watch Bayesian belief updates thicken wing-beat frequency branches while thinning harmonic-energy connections—a visual dialect that says “the model now trusts rhythm over timbre” without statistical jargon.
Both frameworks demonstrate how uncertainty can be mapped to domain-specific lexicons, whether through disappearing chemical rules or shifting entomology decision paths. Underpinning this explanatory richness is a sophisticated repurposing of Bayesian machinery. Earlier methods like Xuan et al.’s [123] Polya Tree used posterior beliefs primarily for localization, but contemporary frameworks elevate uncertainty metrics to narrative anchors. The γ -factor in credit scoring evolves from a rescaling tool to a confidence narrator; Dirichlet distributions transform from detection triggers into evidence-loss ledgers; forecast variances become characters in operational decision stories. This evolution reveals a pivotal insight: when uncertainty quantification is intrinsically linked to adaptation mechanisms—whether through hypernetwork shifts, scenario-driven battery adjustments, or RuleFit clause updates—every adaptive action becomes an explanatory event. The model does not just react to drift; it documents its reasoning through weight modifications, reserve changes, and rule alterations. Consequently, the traditional chasm between detection and explanation dissolves. The flattening Dirichlet belief, the widening forecast variance, the sliding γ -factor—these are not mere alerts but the opening lines of a story the model continues to narrate through every adaptive response, creating what might be termed uncertainty’s hermeneutic circle: a continuous interpretive loop where confidence measures both expose drift and rationalize the model’s reconciliation with it. Accordingly, this places this class within the XAI paradigm.
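The flattening-evidence example from METER can be made tangible with a short sketch; the code below is our own illustration (not the authors' implementation) and uses a subjective-logic-style vacuity measure together with the Dirichlet differential entropy to show how the α-vector shift reported above translates into rising uncertainty.

```python
import numpy as np
from scipy.stats import dirichlet

def describe(alpha):
    """Expected class probabilities, subjective-logic vacuity, and Dirichlet entropy."""
    alpha = np.asarray(alpha, dtype=float)
    mean = alpha / alpha.sum()                 # expected class probabilities
    vacuity = len(alpha) / alpha.sum()         # "lack of evidence" grows as alpha flattens
    return mean, vacuity, float(dirichlet.entropy(alpha))

# Alpha values mirror the example quoted in the text.
for label, alpha in [("before drift", (30, 2)), ("after drift", (6, 5))]:
    mean, vacuity, ent = describe(alpha)
    print(f"{label}: mean={np.round(mean, 2)}, vacuity={vacuity:.2f}, entropy={ent:.2f}")
```

Both the vacuity and the entropy rise as the evidence vector flattens, which is the numerical form of the "loss of evidential certainty" narrative these frameworks expose to their users.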

6.3.2. Datasets

The datasets listed in Table A5 reflect the interdisciplinary applicability of Bayesian and uncertainty-aware methods for drift interpretability. Notably, the INSECT dataset emerges as a versatile benchmark, employed across three studies to evaluate drift in dynamic environments, likely due to its temporal granularity and natural concept shifts. Domain-specific datasets like NDBC (marine dissolved oxygen) and City Learn Challenge 2022 (energy management) highlight how Bayesian frameworks excel in uncertainty-rich, real-time domains—typhoon disruptions and grid load variability, respectively— where posterior distributions naturally encode actionable explanations. However, the predominance of structured time-series data (e.g., Bike Sharing, Keystroke) may underrepresent unstructured data challenges, such as image or text drift, limiting insights into modern multimodal contexts. Additionally, industrial datasets like WWTP (wastewater treatment) and BFIP (biochemical processes) underscore the value of Bayesian methods in safety-critical systems, where uncertainty quantification directly informs operational trust.

6.3.3. Multidimensional Class Evaluation

According to Table A20, the papers in this class explicitly cover various forms of drift, particularly sudden and gradual concept drift, as well as general concept drift and data drift scenarios. This broad coverage demonstrates how Bayesian methods and uncertainty quantification naturally accommodate different drift types by updating prior knowledge based on observed evidence. The application contexts highlighted are predominantly specialized, spanning IoT, marine science, energy, finance, anomaly detection, process mining, video surveillance, and industrial processes. Notably, marine science and energy contexts indicate an affinity for Bayesian methods in environments characterized by significant uncertainty and dynamic conditions, reflecting the strength of probabilistic frameworks in capturing real-world variability. Regarding the scope, there is a slightly stronger emphasis on global explanations compared to local explanations. This suggests Bayesian approaches predominantly provide high-level insights, possibly due to their ability to summarize uncertainty and systemic shifts comprehensively, rather than purely granular, instance-level explanations. The target audience is consistently data scientists across all papers, indicating that the interpretability provided by Bayesian frameworks generally demands a strong background in probability theory and statistical inference. No quantitative metrics for interpretability or compliance are explicitly mentioned. Thus, interpretability is implicitly assumed through posterior distributions, uncertainty intervals, or scenario analyses rather than explicitly quantified metrics. The learning mode is primarily online, reflecting the adaptability of Bayesian methods to continuous data streams. The presence of hybrid and offline learning, albeit limited, demonstrates the flexibility of Bayesian methods across varied data-processing scenarios.

6.4. Causal and Temporal Dependency Tests

This section contains three articles in total that use methods based on Causal and Temporal Dependency Tests. Only one paper adopts this as its principal mechanism, while the other two employ the same statistical lens secondarily.

6.4.1. Technical and Methodological Insights

The pursuit of explainability in drift binds the Causal and Temporal Dependency Tests class, where Adams et al. [40,94] and Li et al. [114] collectively shift focus from detecting drift to interpreting it through temporally grounded causal chains. At its core, this class contends that meaningful understanding emerges not from identifying when a drift occurred, but from uncovering why it happened by rigorously mapping directional dependencies between events. Adams et al. laid this foundation by integrating Granger-causality testing within a change-point detection framework, explicitly generating “edge tables” that translate statistical drift alerts into interpretable claims about which metric drives which. This transformed isolated drift points into relational explanations—yet remained constrained by its reliance on linear models, leaving potentially critical non-linear interactions unexplored and limiting explanatory depth in complex systems. Building directly on this limitation, Adams et al. elevated explanatory power by introducing kernel-based causality testing, which reveals non-linear dependencies invisible to traditional methods. Crucially, they embedded these insights into an automated visual narrative—the “causal ribbon” matrix—that dynamically links validated cause–effect pairs across time. Color-coded paths (e.g., red to green) transform statistical abstractions (p-values, lags) into an intuitive story: for instance, a workload spike in week 20 non-linearly triggering a collapse in loan-validation service times five weeks later. This visual scaffolding allows domain experts like credit officers to immediately grasp why retraining for a sub-process is justified, bypassing statistical complexity. Furthermore, by leveraging object-centric event logs, the method preserves contextual interactions between entities, ensuring explanations reflect operational reality rather than flattening artifacts—a significant stride toward trustworthy interpretability. Demonstrating the versatility of this causality-first paradigm, Li et al. extended it beyond process mining to environmental drift. Applying kernel causality to exogenous variables, they attributed dissolved-oxygen drift to typhoon-induced wind patterns, with detected one-week lags providing mechanistic explanations for stratification changes. While sharing Adams et al.’s commitment to causal attribution as the bedrock of interpretability, their approach prioritized statistical rigor over integrated visualization, illustrating how the explanatory framework adapts to diverse domains while emphasizing different facets of delivery. Across these works, a unified philosophy emerges: explainability is synonymous with empirically verified causal pathways anchored in time. All methods detect drift points and temporal lags, apply causality tests (Granger or kernel) as explanatory engines, and output actionable attributions (e.g., “Metric A at t₁ → Metric B at t₂”). Quantifiable confidence metrics (p-values, Bayes factors) underpin claims, but the frameworks consciously translate them into accessible formats—whether through Adams et al.’s visual ribbons or Li et al.’s lag-based narratives. This evolution reveals a trajectory toward deeper, more actionable interpretability: Adams et al. subsume their earlier work by unifying non-linear discovery with visual storytelling, while Li et al. validate the paradigm’s adaptability.
Nevertheless, the class collectively focuses on metric-level drivers (e.g., workload spikes or wind speeds) rather than concept-level reasoning (e.g., why typhoons alter system behavior), leaving room for future work. Ultimately, their strength lies in rigorously answering “why” through causal chains, directly aligning drift explanations with domain-actionable insights. As such, the explanatory process is diagnostic and applied after observation, fitting this class under Explainable AI.
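For reference, the sketch below runs the classical linear Granger test that underlies the “edge tables” of the earlier framework (the kernel-based extension is not reproduced here), using statsmodels on synthetic data in which a workload series drives a service-time series at a lag of two steps; all variable names and values are illustrative.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(7)
n = 300
workload = rng.normal(size=n).cumsum()                 # hypothetical process metric
noise = rng.normal(scale=0.5, size=n)
service_time = 0.6 * np.roll(workload, 2) + noise      # workload drives service time at lag 2 (roll wraps the first values)

# Column order matters: the test asks whether the second column Granger-causes the first.
data = np.column_stack([service_time, workload])
results = grangercausalitytests(data, maxlag=4)        # prints per-lag test summaries

for lag, (tests, _) in results.items():
    f_stat, p_value = tests["ssr_ftest"][:2]
    print(f"lag {lag}: F = {f_stat:.1f}, p = {p_value:.4g}")
```

A small p-value at the true lag supports a directed edge such as "workload → service time", which is then reported together with its lag instead of an anonymous drift alert.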

6.4.2. Datasets

As visible from Table A6, this section involves only three datasets of interest, mainly because only three papers belong to this class. One dataset nevertheless stands out: the BPI Challenge 2017, a process-mining benchmark, serves as the primary validation ground for two studies, reflecting its usefulness in modeling business workflows with inherent time dependencies. Its reuse highlights its suitability for detecting drift in structured, event-log-driven systems, but also points to a potential over-reliance on a single dataset for generalizing the method.

6.4.3. Multidimensional Class Evaluation

Table A21 shows that this class focuses mainly on general concept drift, alongside occasional references to sudden and gradual concept drift. The emphasis on general concept drift underscores causal methods’ capacity to broadly characterize shifts in dependency structures rather than narrowly targeting specific drift types. Application contexts here are significantly narrower, concentrated mainly around process mining and marine science. This specificity likely reflects the suitability of causal dependency techniques for domains characterized by structured temporal or sequential relationships between events. This group exhibits a uniformly global scope, implying that causal dependency analyses naturally provide system-wide explanations. This aligns well with the inherent goal of causal analyses, which aim to uncover overall changes in relationships or dependencies within the studied processes. The identified audience remains exclusively data scientists, highlighting that interpreting causal dependency shifts and their implications typically necessitates advanced analytical skills and domain expertise. No explicit interpretability metrics are reported, suggesting that explanations emerge primarily from visual or graphical representations of causal structures rather than standardized, quantitative measures of interpretability. The learning mode is predominantly offline, with limited hybrid approaches. This choice likely reflects the methodological complexities and computational requirements associated with identifying causal relationships or dependencies, which tend to favor retrospective, batch-oriented analyses.

6.5. Rule/Logical Pattern Mining

This Section includes 9 out of 82 papers that adopt rule mining, fuzzy sets, or logical pattern extraction as key components of their methodology.

6.5.1. Technical and Methodological Insights

The Rule/Logical Pattern Mining class represents a cohesive paradigm where the interpretability of concept drift is intrinsically woven into the very fabric of the model’s evolution. Across all approaches in this category, drift is not merely detected but explained through dynamic, human-readable logical structures—primarily IF–THEN rules, feature sets, or temporal clauses—that visibly morph in response to changing data distributions. This transforms adaptation into a self-documenting narrative: practitioners observe drift not through abstract error metrics or detached alarms, but through direct, tangible changes in the interpretable components that constitute the model’s reasoning. The unifying thread is that rules and features serve dual purposes: they are both predictive instruments and the primary artifacts through which drift is communicated. When the underlying concept shifts, these elements adapt—spawning new clauses, pruning obsolete conditions, sliding temporal boundaries, or reweighting inputs—and these adaptations are the explanation. Operators thus grasp how, where, and why the concept has changed by inspecting the evolving logic itself, bypassing the need for statistical intermediaries. Common to all papers is the seamless integration of drift detection and explanation. Drift manifests as a visible evolution in the model’s interpretable constructs, ensuring explanation occurs at the moment of adaptation rather than as a post hoc analysis. For instance, in X-Fuzz [106], rising predictive error directly triggers neuro-fuzzy rule updates, and each reshaped IF–THEN hyperplane is immediately exposed to the user, explicitly highlighting which variables survived the change. Simultaneously, LIME-based bar charts—validated for faithfulness via feature-masking correlation—localize the impact of input shifts at the instance level. Similarly, Nesvijevskaia et al.’s [132] fraud detection system makes drift intelligible through a shifting balance of transparent business rules and surrogate “scoring rules”: a rising proportion of alerts lacking rule support directly signals that fraud tactics have outpaced the current vocabulary, while each new scoring rule flags emerging patterns for investigation. This tight coupling ensures that the signal of drift (e.g., rising error) and its explanation (e.g., new clauses, vanishing features) are unified, eliminating interpretive latency. Human-centric presentation further defines this class. Rather than relying on technical scores, explanations prioritize actionable insights grounded in domain semantics. OTARL [110], for example, encodes drift in temporal association rules built from multivariate shapelets. As distributions shift, operators literally see shapelets slide along a timeline or temporal relations flip (e.g., from FEN to NOV), enabling them to articulate shifts like “the second variable now lags the first” without consulting error curves. In Bertini et al.’s [95] Attribute-based Decision Graphs (AbDGs), decaying vertex weights cause attributes to drop in rule-clause rankings or vanish entirely, creating an immediate visual narrative of feature obsolescence. Agarwal et al. [30] distill drift into plain-language sentences (e.g., “Decision switched because Cost = High”) derived from top-k features, while FGRI [134] uses the appearance or disappearance of IF–THEN conjunctions (e.g., pressure threshold clauses) that mirror plant-floor terminology. 
Visual anchors—color-coded SHAP bars in DMI-LS, tagged N-gram lists in DAEMON [113], or evolving rule timelines in ERulesD2S [99]—replace opaque metrics as the primary interface for operators. Studies report practitioners relying overwhelmingly on these artifacts (e.g., 42% of decisions in X-Fuzz; retraining approvals via SHAP rankings in DMI-LS), underscoring their efficacy in conveying drift narratives. Within this shared foundation, key innovations refine how interpretability is achieved. OTARL uniquely embeds temporal granularity, allowing analysts to read drift not just in which variables change, but in how their timing relationships evolve. X-Fuzz advances explanation fidelity by rigorously verifying LIME’s local interpretations through monotonicity and correlation checks—ensuring explanations reflect actual model behavior rather than approximations. DAEMON introduces platform agnosticism, using raw-byte N-grams tagged with distinguishing family pairs to explain cross-platform drift (e.g., Windows-to-Android malware shifts) as the emergence or disappearance of human-readable code signatures. Later papers like ERulesD2S and FGRI address scalability: by capping rule-set size (e.g., five rules per class) and employing stratified sampling or GPU parallelism, they ensure that explanations remain concise and traceable even under high-frequency drift, preventing the “rule explosion” that plagues less constrained systems. The progression within the class reveals a maturation toward sustainable, granular, and integrated explainability. While early works (e.g., Nesvijevskaia et al.) established hybrid rule-scoring systems to flag novel patterns, later frameworks like OTARL and DMI-LS enhance temporal or feature-hierarchy dynamics for finer-grained insights. DMI-LS further streamlines the workflow by using SHAP-driven active learning to focus annotation efforts precisely where explanatory uncertainty is highest. Collectively, these approaches demonstrate that drift interpretability is most potent when explanation is not an add-on, but the mechanism of adaptation itself. By maintaining a rule-based, feature-centric, or clause-oriented interface—free of latent embeddings or opaque thresholds—these models ensure that every adjustment to the concept is simultaneously a step in its explanation, fulfilling the mandate for intrinsically interpretable drift adaptation. As such, this class reflects the paradigm of white-box modeling aimed at human-understandable representations, situating the class in the Interpretable AI category.
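A minimal, hedged illustration of this rule-diffing style of explanation (not the implementation of any surveyed system) is given below: a shallow decision tree is refit on two stream windows and its exported IF–THEN rules are compared, so that drift appears directly as a clause whose threshold has moved.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
feature_names = ["pressure", "flow"]

def make_window(pressure_threshold, n=3000):
    """One stream window whose concept is a single pressure threshold."""
    X = rng.uniform(0, 10, size=(n, 2))
    y = (X[:, 0] > pressure_threshold).astype(int)   # flow is irrelevant in both concepts
    return X, y

for name, threshold in [("window 1", 4.0), ("window 2 (after drift)", 7.0)]:
    X, y = make_window(threshold)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(f"--- rules for {name} ---")
    print(export_text(tree, feature_names=feature_names))
```

Reading the two rule dumps side by side, an operator sees the pressure clause migrate from roughly 4 to roughly 7, with no auxiliary drift statistic required.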

6.5.2. Datasets

The datasets in Table A7 highlight the broad applicability of rule-based methods across diverse domains, emphasizing their strength in extracting human-readable patterns from structured, tabular data. The repeated use of BPI Challenge datasets (2012, 2017) and Traffic Fine Management in process-aware studies underscores their value for temporal rule mining in business workflows, where sequential dependencies are critical. Similarly, cybersecurity benchmarks like the Microsoft Malware Classification Challenge and Drebin demonstrate how rule evolution can track adversarial shifts, such as malware feature drift. Environmental and IoT datasets (INSECT, Weather, Electricity) reveal the method’s adaptability to real-time monitoring, where rules like “IF wind > 15 kn THEN delay high” offer immediate operational insights. However, the dominance of structured, domain-specific datasets (e.g., Hospital Billing, Loan Approval Process) may limit insights into unstructured or multimodal data (e.g., text, images), where rule extraction is inherently challenging. The Electricity dataset’s reuse across three papers validates its utility for streaming rule mining.

6.5.3. Multidimensional Class Evaluation

Consisting of nine papers, this group primarily addresses general concept drift, with isolated references to incremental concept drift and data drift, as can be seen from Table A22. This distribution indicates that rule-based methods naturally detect drift as changes in underlying patterns rather than explicitly distinguishing between specific drift subtypes. The application contexts are primarily data mining, process mining, cybersecurity, finance, and aviation. These contexts share structured, interpretable data environments, indicating that rule-based explanations are particularly effective when clear, actionable patterns must be communicated directly to users or stakeholders. Regarding scope, the class is fairly balanced, although a slight preference for global explanations is observable. This highlights rule-based approaches’ effectiveness in capturing broad behavioral shifts through explicit, interpretable rules, but with retained capability for instance-level precision when needed. The primary audience is data scientists across all papers, consistent with the methodological complexity of generating, interpreting, and maintaining rule sets. While rules themselves are inherently interpretable, their generation and validation generally demand specialized analytical knowledge. Notably, this category explicitly specifies interpretability-related metrics in one paper, referring to “faithfulness and monotonicity.” The presence of these metrics, although limited, signals an awareness of the need for quantitative validation of interpretability. The learning mode predominantly features online learning, appropriate for real-time rule updates reflecting changes in streaming data. Offline and hybrid methods appear less frequently, suggesting that rule-based approaches are more naturally suited to dynamic environments where rules evolve incrementally.

6.6. Feature Attribution and Game Theory

Feature Attribution and Game Theory represents the most extensive category. Encompassing a total of 25 articles (19 primary papers plus 6 additional works that invoke attribution secondarily), this Section accounts for nearly 25% of all the studies reviewed.

6.6.1. Technical and Methodological Insights

The unifying strength of this class lies in its foundational reimagining of feature attribution techniques—SHAP, LIME, permutation importance, and their derivatives—not as static diagnostic tools, but as dynamic, embedded narratives that continuously articulate why and how a model’s decision logic evolves in response to drift. Across all 19 papers, attribution shifts—whether manifested in changing importance ranks, fluctuating magnitude profiles, or spatial realignments in attention maps—are elevated into primary, human-interpretable accounts of concept evolution. This approach intrinsically links the justification of individual predictions to the longitudinal story of model adaptation, rendering separate drift detection modules largely redundant. For instance, the TSUNAMI framework developed by Pasquadibisceglie et al. aggregates nightly SHAP vectors into rolling global-importance curves and heatmaps, directly correlating the ascent of older frequency features (F1, F3) with a strategic shift in retail churn logic from spend-based to cadence-based signals [133]. Similarly, Duckworth et al. [105] normalize and temporally stack SHAP vectors to generate clinical dashboards, where climbing curves for “respiration rate” or “ambulance arrival” visually narrate the emergence of COVID-19 respiratory distress patterns weeks before performance metrics degrade. In both cases, the same attribution values that locally explain a prediction simultaneously compose a global timeline of conceptual change, turning explanation into an early-warning system. Methodologically, these papers innovate by ensuring the attribution itself becomes a lightweight, streaming-compatible signal of drift interpretability. Kebir and Tabia [61] distill Shapley values into ultra-light surrogate regressors, enabling resource-constrained edge devices to track attribution fidelity as a real-time barometer of concept integrity. Their finding—that deviations between surrogate and reference SHAP scores spike precisely during known distribution shifts—demonstrates that explanation accuracy is a sensitive proxy for drift. The ISAGE method by Muschalik et al. [50] and the IPFI method by Fumagalli et al. [108] further optimize this real-time capability through incremental, constant-time updates. ISAGE’s exponential smoothing of Shapley-style loss differences, for example, preserves the Shapley conservation law while generating immediate vector snapshots. When applied to electricity demand streams, crossing curves for vicprice (rising) and nswprice (falling) visually articulate a shift from imputed to genuine pricing signals the moment real data arrives—without invoking external detectors. Crucially, many frameworks offer tunable lenses for causal versus correlational narratives. ISAGE’s observational mode, which respects feature correlations, might mask “commission” importance if predictable from “salary,” while its interventional mode—breaking correlations—highlights commission’s role when marginal distributions change. This duality provides domain experts (e.g., auditors) with tailored explanatory perspectives on root causes. The papers also showcase how attribution narratives are refined for domain-specific intelligibility. Farrugia et al. [127] express LIME outputs as plain-language inequality rules (“login-country ≠ KYC-country”), allowing fraud analysts to instantly see which behavioral cues—like geolocation mismatches during COVID-19—dominate the model’s evolving “fraud” definition. 
Siirtola and Roning [36] derive compact “relevance signatures” from permutation vectors, where a negative sum over specific features (F2, F4, F6) explicitly flags mislabeled walking/biking instances, directly guiding data correction. Gniewkowski et al. [60] advance beyond superficial Bag-of-Words token attribution (e.g., highlighting “/” counts) by employing context-aware RoBERTa embeddings, revealing semantic drift in cyber-threats—such as the decline in SQLi tokens (“DROP TABLE”) and the rise in URL-encoded macros in later datasets. This progression from syntactic to semantic explanation underscores how attribution methods themselves evolve to capture deeper conceptual shifts. Similarly, Choi et al. [100] spatially map drift through in-model attention layers and post hoc Grad-CAM, where a heatmap fixating on scanner tables instead of lung tissue makes data bias viscerally apparent to clinicians, prompting targeted retraining. Notably, several papers position attribution geometry as the core adaptation trigger. The SCAL framework introduced by Clement et al. projects SHAP vectors into an “explanation space,” using cluster dynamics—silhouette scores and noise emergence—to autonomously guide hyperparameter adjustments [59]. A multiplying cluster count during COVID-19 signals diversified consumption behavior, prompting recalibration that improves accuracy. Saadallah’s OEP-TT clusters TreeSHAP [144] lag-importance profiles, making ensemble updates interpretable: replacing a “Lag-15” model with a “Lag-3” representative concretely narrates the shift from long-term seasonality to recent spikes. He and Liu [111] further demonstrate proactivity, using the Jeffrey divergence between successive SHAP-derived ranking-preference lists to trigger retraining before accuracy drops, while actively querying samples whose local attributions diverge from global rankings. This contrasts with the forensic use by Chow et al. of gradient-based attribution to retrospectively dissect why a static malware detector failed—revealing, for instance, that new fraud families leveraged “googletagmanager.com” strings unseen in training data [33]. Collectively, these works illustrate a paradigm shift: feature attribution transcends post hoc analysis to become an embedded, self-updating ledger of concept evolution. By preserving axiomatic properties (e.g., conservation laws) while streaming interpretable outputs—rankings, rules, or spatial maps—the approaches of Duckworth, Pasquadibisceglie, Kebir, Farrugia, and others ensure that the very values justifying a prediction also chronicle the model’s changing rationale. This intrinsic coupling allows stakeholders—clinicians, engineers, fraud analysts—to read drift directly from evolving attribution landscapes, transforming what was once a detection challenge into a continuous, operational narrative of adaptive intelligence. Therefore, the explanatory process is diagnostic and applied after observation, fitting this class into the XAI group.
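To illustrate the attribution-drift idea in a few lines, the sketch below substitutes permutation importance for SHAP (a deliberate simplification; the surveyed works use Shapley-based attributions) and compares successive importance profiles with a Jensen–Shannon distance, analogous in spirit to the divergence-based triggers discussed above. The window construction and feature weights are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(5)

def make_window(weight_x0, n=2000):
    """One stream window whose concept is controlled by how much feature 0 matters."""
    X = rng.normal(size=(n, 3))
    logits = weight_x0 * X[:, 0] + (1.0 - weight_x0) * X[:, 1]
    y = (logits + 0.3 * rng.normal(size=n) > 0).astype(int)
    return X, y

profiles = []
for weight in [0.9, 0.9, 0.3]:             # third window: influence migrates to feature 1
    X_w, y_w = make_window(weight)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_w, y_w)
    imp = permutation_importance(model, X_w, y_w, n_repeats=5, random_state=0)
    vec = np.clip(imp.importances_mean, 0.0, None)
    profiles.append(vec / vec.sum())
    print("importance profile:", np.round(profiles[-1], 2))

for i in range(1, len(profiles)):
    dist = jensenshannon(profiles[i - 1], profiles[i])
    print(f"window {i} -> {i + 1}: attribution-profile distance = {dist:.3f}")
```

The jump in the distance is read together with the profiles themselves, which name the features whose influence rose or collapsed, so detection and explanation come from the same attribution vector.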

6.6.2. Datasets

The datasets listed in Table A8 reveal critical insights into the application scope and validation strategies of feature attribution methods for drift detection. Notably, the Electricity and Adult datasets are frequently reused across studies, highlighting their utility as benchmarks for temporal or demographic shifts. These datasets likely offer structured, time-varying features (e.g., hourly electricity demand or income demographics) that align well with contribution-trajectory monitors (e.g., smoothed Shapley values or iSAGE loss decompositions). Similarly, the Bike Sharing and Airlines datasets provide cyclical temporal patterns, enabling researchers to test how attribution shifts reflect seasonal or operational changes. Domain diversity is also evident: cybersecurity studies (e.g., CICIDS2017, IoTID20) leverage attribution to detect abrupt feature importance shifts during attacks (e.g., DNS payload length surges in INSOMNIA), while healthcare datasets like PROMISE12 Prostate MRI and Southampton General Hospital’s ED demonstrate how drift in clinical feature relevance can signal evolving diagnostic practices. However, the predominance of cybersecurity and transactional data (e.g., Brazilian E-Commerce) suggests potential gaps in domains like environmental science or social systems, where drift patterns may differ. A limitation lies in the reliance on static or synthetically partitioned datasets (e.g., CoverType), which may not fully capture real-world temporal dynamics. These datasets underscore the adaptability of game-theoretic attribution frameworks but also call for broader domain inclusion and richer temporal granularity to advance robustness evaluations.

6.6.3. Multidimensional Class Evaluation

Table A23 reveals that this category prominently addresses general concept drift, accompanied by substantial coverage of sudden and gradual concept drift, and a few instances explicitly focusing on data drift. The comprehensive drift-type coverage reflects the inherent flexibility of attribution methods, such as SHAP or LIME, in adapting explanations to diverse drift scenarios. The application contexts are notably diverse, encompassing cybersecurity, energy, healthcare, retail, education, IoT, financial services, and process mining. This broad applicability emphasizes how feature attribution methods naturally provide domain-agnostic explanations that translate effectively across varied data and operational contexts. The analysis reveals a balanced distribution between local and global scope, though local explanations slightly dominate. Feature attribution methods inherently focus on pinpointing specific feature contributions, making them naturally suitable for instance-level interpretability, complemented by global insights through aggregated or averaged attributions over broader data segments. The audience primarily comprises data scientists, with limited explicit mention of end-users. Although attribution methods provide intuitive explanations (feature contributions), effective interpretation and validation usually necessitate advanced domain knowledge and analytical expertise. Interestingly, explicit metrics like feature agreement and fidelity–sparsity are cited within this group, suggesting greater methodological maturity regarding the quantitative evaluation of interpretability compared to other classes analyzed thus far. Finally, this category shows substantial reliance on online and hybrid learning modes, illustrating feature attribution methods’ adaptability to streaming or dynamically evolving data contexts. The presence of offline methods, albeit less frequent, emphasizes the methods’ flexibility in retrospective analysis settings.

6.7. Latent-Representation Geometry and Similarity

This Section groups the 11 papers, out of 82, that belong to the Latent Representation Geometry and Similarity category. Seven papers make this their primary stance, and four others use the same idea to sharpen explanations produced elsewhere.

6.7.1. Technical and Methodological Insights

The papers in this class collectively demonstrate that the evolution of geometric relationships within learned latent spaces provides an intrinsic, actionable explanation for concept drift, transforming abstract statistical shifts into human-interpretable narratives. Banf and Steinhagen [139] initiate this paradigm by treating classifier embeddings as a “live map” of data reality: their outlier scores, visualized through stark histogram contrasts, immediately implicate specific regions of latent space where novel defects emerge, allowing engineers to inspect high-score images and see the unfamiliar visual patterns (e.g., edge roughness) that triggered the drift alarm. Yu et al. deepen this spatial logic by exposing drift through changes in the fuzzy rule hierarchies governing multi-stream latent alignment. When a rule’s dominance slips (e.g., “IF Temp = 35 °C THEN Alert” fades), the method not only flags drift but explicitly identifies which feature range lost discriminative power, rendering the shift a semantic edit to the rule itself—no error curves required. Yang and Xu [53] extend this to behavioral contexts with their “living scrapbook” memory network: drift manifests as a visible migration of attention weights toward rarely used memory slots, highlighting anomalous attributes (e.g., “Location = Tokyo, Amount > €5000”) that form an instance-level rationale for the shift. Crucially, all three methods anchor explanations in domain-meaningful semantics—visual traits, rule thresholds, transaction attributes—avoiding opaque distance metrics. Wang et al. [65] and Guzy et al. [47] further prove that drift explanations can reside within model internals. Wang exposes degenerating neuron “distinguishability” in malware detection: collapsing bimodal activation distributions (e.g., neuron 23’s eroded bimodality) visually signal which internal decision boundaries have blurred, directly linking performance decay to specific degraded components. Guzy synchronizes generative drift detection with pixel-level relevance heatmaps: practitioners simultaneously see a quality curve dip and the shifting visual traits (e.g., lace → cuffs) that the discriminator now deems decisive, converting latent decay into an intuitive “before-and-after” story of feature relevance. Sun et al. then showcase how engineering latent geometry for domain interpretability elevates explanations: their wavelet-derived “ridge” coefficients embed geological semantics into the representation, so drift is explained through a ranked list of altered stratigraphic cues (e.g., ridges marking depositional cycles), enabling geologists to relabel only depths where new cycles emerge. This contrasts with the dual geometric narrative proposed by Zonoozi et al. for unlabeled streams: they expose drift through trajectories of concept vectors (e.g., increasing centroid velocity for “running”) and shifts in actor participation weights (e.g., “60% walking → 40% running”), rendering behavioral changes as intuitive fractions on a probability simplex [39]. The common thread is geometric transparency as an explanation. Whether through histograms pinpointing anomaly clusters, rule edits specifying degraded thresholds, attention maps highlighting anomalous attributes, neuron diagnostics localizing eroded boundaries, or ridge lists flagging geological shifts, all methods leverage the distortion of latent structures to answer “why” drift matters.
The differences lie in granularity—instance-level (Yang), feature-level (Sun), or model-internal (Wang)—but all share a core thesis: latent space is not merely a detection arena but an explanatory canvas. By visualizing how embeddings, rules, neurons, or concepts move or lose separation, they convert drift from a statistical alarm into a legible story of change, directly guiding interventions like targeted relabeling or rule updates. This geometric intimacy—where explanations emerge from the data’s own spatial evolution—distinguishes the class, turning latent representations into both sensors and interpreters of drift. Accordingly, this places this class within the Interpretable AI paradigm.
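A minimal sketch of this geometric reading of drift is shown below; it assumes that latent vectors are produced by some fixed encoder (for example an auto-encoder bottleneck) and summarizes each window by its centroid, reporting the displacement, the cosine alignment, and the latent dimensions that moved most. All values are synthetic and illustrative.

```python
import numpy as np

def centroid_report(ref_embeddings, new_embeddings, top_k=3):
    """Summarise drift between two windows of latent vectors via centroid geometry."""
    c_ref = ref_embeddings.mean(axis=0)
    c_new = new_embeddings.mean(axis=0)
    displacement = float(np.linalg.norm(c_new - c_ref))
    cosine = float(c_ref @ c_new / (np.linalg.norm(c_ref) * np.linalg.norm(c_new)))
    most_shifted = np.argsort(np.abs(c_new - c_ref))[::-1][:top_k]
    return displacement, cosine, most_shifted.tolist()

rng = np.random.default_rng(11)
reference = rng.normal(1.0, 1.0, size=(500, 16))          # embeddings of the known concept
shift = np.concatenate([np.full(4, 1.5), np.zeros(12)])   # only the first 4 latent dims move
current = rng.normal(1.0, 1.0, size=(500, 16)) + shift

disp, cos, dims = centroid_report(reference, current)
print(f"centroid displacement={disp:.2f}, cosine similarity={cos:.2f}, most-shifted dims={dims}")
```

The shifted dimensions would then be decoded back to domain semantics (images, rules, attributes), so that the geometric movement itself becomes the explanation.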

6.7.2. Datasets

The datasets in Table A9 focus heavily on image-based tasks (e.g., CIFAR-10, CelebA, MNIST), which makes sense because methods using latent representations work well with complex, structured data like images. Deep learning models excel at finding meaningful patterns in visual data, making these datasets ideal for detecting drift through changes in hidden layers. However, the inclusion of specialized datasets like Drebin (for detecting Android malware) and NEU Surface Defect (for industrial quality checks) shows that these methods can also handle real-world, non-image problems. Most datasets listed are public and widely used, which helps other researchers replicate results. Yet there are very few industry-specific or private datasets (like XSteel Surface Defect or the Hospital Seizure Corpus). This limits the understanding of how these methods handle real-world issues. Finally, while the seven papers use diverse datasets, this also shows a need for more testing across different fields to create clear guidelines. Including multimodal data or cross-domain benchmarks could better demonstrate how latent-representation methods explain drift in complex settings.

6.7.3. Multidimensional Class Evaluation

As visible from Table A24, this class addresses diverse drift types, including sudden, gradual, incremental, and recurrent concept drift, as well as data drift. The wide range of drift types illustrates the versatility of latent-space methods, which naturally handle multiple drift scenarios by representing data shifts in embedding spaces rather than in raw features. The application contexts are particularly specialized, covering financial analysis, geology, image generation, process mining, healthcare, weather prediction, manufacturing, energy, and environmental monitoring. The prominent inclusion of contexts such as image generation, manufacturing, and healthcare reflects the inherent suitability of latent-space methods in complex, high-dimensional, and structured data domains. Regarding the scope, both global and local interpretations are well represented, with a slight inclination toward global explanations. This pattern reflects the inherent capability of latent representations to capture system-wide shifts through visualizations and global metrics while retaining the potential for localized interpretability through instance-specific analyses. The intended audience for these methods is primarily data scientists, as latent-space analysis typically involves advanced visualizations (e.g., UMAP, t-SNE) and mathematical interpretations (e.g., embedding distances, cosine similarity), demanding sophisticated analytical expertise. No explicit interpretability-related metrics are mentioned within this class. This indicates that interpretability primarily arises from visual explorations and embedding-space trajectories rather than standardized quantitative measures. The learning mode appears balanced among offline, hybrid, and online methodologies, highlighting the flexibility of latent-space techniques for both retrospective analyses (batch processing) and real-time monitoring.

6.8. Fuzzy Transparency

Out of 82 papers, 5 place fuzzy or neuro-fuzzy reasoning at the heart of explanation, while 2 others employ fuzzy layers to enhance latent representations or hybrid learners.

6.8.1. Technical and Methodological Insights

The profound strength of the Fuzzy Transparency class resides in its foundational philosophy: explainability is not merely an output but the very fabric of adaptation itself. Across all five studies—Goyal et al. [129], Wen et al. [120], Das et al. [102], Zheng et al. [136], and Ducange et al. [126]—the dynamic evolution of linguistically grounded rules constitutes the explanation for drift. When concepts shift in the data stream, these systems eschew opaque statistical alarms in favor of translating change into human-interpretable narratives through immediate, visible modifications to their fuzzy rule structures. Goyal’s neuro-fuzzy hybrid embodies this principle by embedding transparency into its core mechanics. Rather than simply flagging divergence through a metric, it exposes precisely how drift redefines feature interpretations—such as the membership boundary for “high SrcBytes” expanding from 80 KB to 120 KB as new fraud patterns emerge. This direct visibility of rule adjustments empowers operators to bypass accuracy metrics entirely, using the updated IF–THEN clauses as a diagnostic ledger to justify decisions like retraining. Building on this, a shared strategy amplifies clarity through deliberate sparsity. Wen’s OSSR-NNRW exemplifies this approach. Its online recalibration of sparse weights transforms coefficient shifts into a self-documenting drift log: variables losing relevance abruptly vanish from the non-zero weight chart—like “oxygen-enrichment flow” disappearing in blast-furnace monitoring—while newly critical features emerge with fresh coefficients. This sparse tableau functions as an intuitive visual explanation, where the zeroing of weights explicitly signals deprecated relationships without demanding statistical literacy. Similarly, Das’s leRSPOP employs rough-set reduction to ruthlessly prune low-support rules after each data batch, distilling the active rule list into a high-impact change report. The disappearance of a clause like “speed < 45 km/h” or the sudden appearance of “volume > 1200 veh/h” directly narrates shifting data dynamics, sparing operators the need to decipher complex metrics. Linguistic anchoring further bridges technical adaptation and domain intuition, turning mathematical shifts into natural language. Zheng’s EEG recognition system maps changes in Gaussian parameters (cₖ, δₖ) to clinically meaningful statements—e.g., neurologists observe that “the High band-energy range has migrated from 80 μV² to 120 μV²”. Ducange’s Fuzzy Hoeffding Tree extends this transparency, framing splits and withering branches as evolving stories: a new branch declaring “Humidity Ratio: Medium → High now best separates occupancy” or a stale “High-Humidity” path vanishing communicates drift through plain-language rules, rendering thresholds interpretable without manual decoding. This consistent mapping to terms like Low, Medium, and High ensures that explanations resonate with domain practitioners. Notably, the papers refine and extend each other’s approaches to enhance explanatory depth. Das introduces a knowledge-preserving ensemble layer where frozen pre-drift models safeguard historical context: if a discarded attribute regains relevance, original rules resurface via voting, preventing “explanatory amnesia”—a significant advance over Wen’s approach, where deprecated features leave no trace.
Wen, however, innovates by merging drift explanation with data diagnostics: outlier clusters visibly shrink affected coefficients, exposing data-quality issues intrinsically. Ducange, meanwhile, temporalizes drift interpretation through rule lifecycles (appearing, evolving, vanishing branches), and Goyal explicitly links ensemble weight redistribution to rule updates, revealing not just what changed but why the model’s confidence shifted. The unifying theme across this class is the treatment of fuzzy rule structures as living documentation. Detection, rationale, and adaptation converge into a single transparent artifact—whether a clause, coefficient, or branch—where change is intrinsically self-explanatory. Critically, empirical validation underscores that practitioners consistently prioritized these visible adaptations over performance metrics. Neurologists using Zheng’s system consulted shifting rule parameters—not accuracy curves—to request relabeling; operators with Wen’s model tracked coefficient rearrangements to schedule retraining. This convergence of evidence confirms that fuzzy transparency, by weaving explanation into the adaptive process itself, transforms drift from an opaque trigger into an auditable, domain-grounded narrative. Collectively, these methodologies demonstrate how explainability can be engineered as the heartbeat of learning systems, fulfilling the imperative for deep, integrated interpretability. As such, this class belongs to the Interpretable AI category.
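To make the notion of fuzzy rules as living documentation concrete, the following minimal sketch (our own illustration, not code from any of the cited systems) shows how the boundary of a hypothetical “high SrcBytes” term could be re-estimated from a new data window and reported as a plain-language rule change; the percentile-based update and the KB figures are illustrative assumptions.

```python
import numpy as np

class FuzzyTerm:
    """A one-sided fuzzy term such as 'high SrcBytes', defined by the value at
    which membership starts to rise and the value at which it reaches 1."""

    def __init__(self, name, rise_at, full_at):
        self.name, self.rise_at, self.full_at = name, rise_at, full_at

    def membership(self, x):
        # Piecewise-linear membership: 0 below rise_at, 1 above full_at.
        return float(np.clip((x - self.rise_at) / (self.full_at - self.rise_at), 0.0, 1.0))

    def adapt(self, window):
        """Re-anchor the boundaries on a new data window (70th/95th percentiles,
        an arbitrary illustrative choice) and return a human-readable change record."""
        old = (self.rise_at, self.full_at)
        self.rise_at, self.full_at = np.percentile(window, 70), np.percentile(window, 95)
        return (f"Rule update: '{self.name}' now begins at {self.rise_at / 1e3:.0f} KB "
                f"(was {old[0] / 1e3:.0f} KB) and is fully 'high' above "
                f"{self.full_at / 1e3:.0f} KB (was {old[1] / 1e3:.0f} KB)")

rng = np.random.default_rng(0)
term = FuzzyTerm("high SrcBytes", rise_at=60e3, full_at=80e3)
drifted_window = rng.normal(loc=100e3, scale=15e3, size=500)  # post-drift traffic with larger byte counts
print(term.adapt(drifted_window))   # plain-language drift narrative
print(term.membership(90e3))        # membership under the updated rule
```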

6.8.2. Datasets

The datasets employed by the five papers, plus two overlapping papers, listed in Table A10, span diverse application contexts, including smart buildings, healthcare, highway traffic prediction, industrial processes, and cybersecurity. This variety highlights the adaptability of fuzzy methods across domains with distinct data characteristics (e.g., time-series sensor data, structured medical records, or high-dimensional cybersecurity logs). However, the lack of overlap in application domains may limit the direct comparison of methodological performance, as evaluations are inherently tied to context-specific challenges and metrics.

6.8.3. Multidimensional Class Evaluation

Based on Table A25, these papers focus predominantly on general concept drift, with additional emphasis on incremental and sudden concept drift. The concentration on general and incremental drift emphasizes how fuzzy logic naturally accommodates gradual shifts through linguistic rule adaptation and membership function updates. The application contexts identified are mainly IoT, healthcare, industrial processes, and general data mining. The prevalence of IoT and healthcare applications suggests fuzzy methods’ strength in fields where human-readable rules and flexible, linguistic interpretability are crucial for operational understanding and decision-making. This class exhibits a balanced approach to the interpretability scope, with local explanations appearing slightly more frequently. This likely results from fuzzy systems’ inherent linguistic interpretability, naturally suited to instance-level explanations that stakeholders can immediately understand in domain-specific terms. The audience consistently comprises data scientists, reflecting that, although fuzzy logic generates intuitive and understandable linguistic rules, the development, tuning, and validation of these rules typically demand expert analytical knowledge. Explicit interpretability metrics are not reported within this group. Interpretability implicitly arises from linguistic transparency and rule-based explanations, without standardized quantitative evaluations. Regarding the learning mode, online and hybrid approaches dominate, demonstrating fuzzy logic’s natural adaptability to real-time environments. This aligns well with scenarios requiring continuous rule refinement and membership function updates as data streams evolve.

6.9. Prototype/Medoid and Exemplar Tracking

Out of 82 papers, 5 rely on nearest-case reasoning or prototype layers.

6.9.1. Technical and Methodological Insights

The Prototype/Medoid and Exemplar Tracking approaches collectively transform concept drift from an abstract statistical phenomenon into a visually interpretable narrative by anchoring explanation to the evolutionary journey of representative data points. This foundational principle manifests differently across studies while maintaining a coherent explanatory philosophy. Saralajew and Villmann [64] establish the paradigm by demonstrating how prototype movement in input space intrinsically encodes the drift explanation—tangent arrows quantifiably point to shifting feature importance when benign packets grow larger and less regular, making displacement vectors themselves attribution maps. The SECLEDS approach introduced by Nadeem and Verwer evolves this geometric storytelling, making drift democratically visible through medoid popularity contests [51]. Rather than sliding points, SECLEDS tracks vote-share histograms and exemplar turnover, allowing operators to compare the concrete packet trace of an evicted “yesterday’s behavior” against the newly promoted pattern now deemed central—a narrative of representation succession where three consecutive medoid swaps tell the story of changing data regimes. Building on this visual accessibility, the Interactive-COSMO method introduced by Calikus et al. distills drift interpretation into an even more immediate spatial language [41]. By coupling a migrating prototype with a “breathing” decision radius and highlighted boundary crossings, the system creates a real-time cartography of normality: when a pump’s prototype slides toward higher loads while the radius simultaneously widens, engineers immediately grasp both the new operational center and growing uncertainty without statistical abstractions. This spatial coherence is refined further by the ICICLE approach proposed by Rymarczyk, which tackles the challenge of preserving narrative continuity during incremental drift [35]. Where earlier methods might show disjointed shifts, ICICLE’s similarity-distillation regularizer ensures prototypes like “bird beaks” migrate cohesively across tasks—new beak concepts anchor near old ones rather than appearing randomly in latent space—while logit-bias compensation keeps prior prototypes active. Clinicians thus witness controlled visual evolution rather than disruptive rewiring, maintaining interpretability itself against drift. Sun et al. [146] complete this explanatory arc by transposing geometric movements into domain-specific analogies. Financial drift materializes through dynamically weighted neighbor lists, where retail banks yielding to logistics startups in precedent rankings explicitly narrate shifting risk landscapes. Crucially, all methods share a core explanatory mechanism: whether through Saralajew’s sliding prototypes, SECLEDS’ medoid elections, COSMO’s migrating centroids, ICICLE’s anchored features, or Sun’s rotating precedents, the evolving representative is the explanation. Later systems enhance this by ensuring narrative continuity—ICICLE’s task-to-task tethering and Sun’s fading weights prevent jarring context breaks that might obscure drift causality. Collectively, they prove that representative tracking bypasses technical metrics entirely, translating drift into operator-actionable stories where motion, succession, or recomposition directly reveal both what changed and why.
This evolutionary thread—from individual point movement to contextualized representative lifecycles—forms the class’s distinctive contribution: making drift interpretable through embodied data narratives. Therefore, this class is aligned with XAI.
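As a toy illustration of exemplar succession (not the SECLEDS algorithm itself), the sketch below maintains a small set of medoids, lets streaming points vote for their nearest medoid, and periodically evicts the least-supported medoid in favor of a recent point, logging each replacement as a plain-language record of the changing data regime; all names, thresholds, and the eviction rule are our assumptions.

```python
import numpy as np

def nearest(point, medoids):
    d = np.linalg.norm(medoids - point, axis=1)
    return int(np.argmin(d)), float(np.min(d))

def stream_medoid_tracker(stream, k=3, evict_every=100, seed=0):
    """Toy exemplar tracker: medoids collect 'votes' from nearby points, and a poorly
    supported medoid is periodically replaced by a recent point, producing a readable
    record of representation succession (illustration only)."""
    rng = np.random.default_rng(seed)
    medoids = np.array(stream[:k], dtype=float)
    votes = np.zeros(k)
    recent, log = [], []
    for t, x in enumerate(stream[k:], start=k):
        idx, _ = nearest(np.asarray(x, float), medoids)
        votes[idx] += 1
        recent.append(x)
        if t % evict_every == 0:
            loser = int(np.argmin(votes))
            candidate = np.asarray(recent[rng.integers(len(recent))], float)
            log.append(f"t={t}: medoid {loser} at {np.round(medoids[loser], 2)} "
                       f"(votes={int(votes[loser])}) evicted; promoted {np.round(candidate, 2)}")
            medoids[loser], votes[loser], recent = candidate, 1, []
    return medoids, log

# Two regimes: the stream drifts from a cluster near (0, 0) to one near (5, 5).
rng = np.random.default_rng(1)
stream = np.vstack([rng.normal(0, 0.5, (300, 2)), rng.normal(5, 0.5, (300, 2))])
_, succession_log = stream_medoid_tracker(list(stream))
print("\n".join(succession_log[-3:]))
```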

6.9.2. Datasets

The analysis of datasets in the Prototype/Medoid and Exemplar Tracking class reveals a tension between methodological innovation and practical reproducibility. Among the five papers, only one explicitly references a public dataset, as listed in Table A11. Sun et al. [146] rely on confidential data, limiting external validation and raising concerns about siloed, domain-specific insights. Similarly, Calikus et al. [41] and Saralajew et al. [64] cite datasets that could not be traced, undermining transparency and replicability. Nadeem et al. [51] address accessibility barriers through synthetic data, but such datasets risk oversimplifying real-world complexities like noise or contextual drift. The lack of shared test datasets poses a major obstacle for concept drift research. When data are private or poorly documented, researchers cannot properly compare different methods, and while synthetic data offer flexibility, they do not reflect real-world conditions. To improve the field, researchers should share anonymized datasets or tools for generating realistic test data, follow open data-sharing standards, and collaborate on benchmark datasets for specific fields. Without these steps, prototype-based methods, despite their potential, will be held back by disconnected data practices that make it difficult to demonstrate their practical value.

6.9.3. Multidimensional Class Evaluation

As visible from Table A26, this class predominantly addresses gradual and incremental concept drift, with a single occurrence of recurrent drift. This selection highlights the suitability of exemplar-based methods in detecting and visually explaining gradual transitions through explicit shifts or replacements of representative cases. The application contexts identified span financial analysis, image classification, IoT, network traffic, and healthcare, underscoring the versatility of prototype-based techniques across structured and visual data domains. Financial and IoT contexts particularly illustrate how exemplar tracking facilitates clear and concrete interpretability when domains exhibit incremental or subtle changes. In terms of scope, explanations provided are consistently global, indicating that these approaches inherently offer overarching insights through exemplar changes rather than fine-grained instance-level interpretability. This global perspective naturally emerges from visually comparing exemplars as they evolve or shift over time. The intended audience includes both end-users and data scientists, with a slight preference for data scientists. The direct visual representation of drift through exemplar comparisons also makes these methods accessible to domain experts, simplifying communication of shifts. Explicit interpretability-related metrics appear minimally, with only Intersection over Union (IoU) mentioned. This limited quantification suggests that interpretability arises mainly through visual inspection and direct exemplar comparison rather than standardized metrics. Regarding the learning mode, the presence of hybrid, offline, and online approaches highlights the adaptability of exemplar methods across various operational and analytical scenarios, accommodating both real-time monitoring and retrospective analysis.

6.10. Graph Attention and Structural Reasoning

This group comprises 4 primary papers out of 82.

6.10.1. Technical and Methodological Insights

The Graph Attention and Structural Reasoning class pioneers a paradigm where explainability is not merely appended to drift detection but structurally embedded within the interpretation itself. At its core, this approach leverages spatial and relational representations—be it household movement graphs, video grids, or IoT interaction networks—to transform abstract distributional shifts into tangible, human-actionable narratives. Bijlani et al. [97] establish this foundation by treating daily sensor data as self-explanatory graphs, where attention weights dynamically spotlight behaviorally significant locations like bathroom–bedroom transitions before clinical alerts manifest. Crucially, their system personalizes drift thresholds to individual baselines, delivering alerts as ranked room deviations rather than statistical anomalies—a design validated when field nurses prioritized these spatial lists over raw divergence metrics. This spatial grounding is further amplified by the GridHTML framework introduced by Monakhov et al., which replaces numerical anomaly scores with an intuitive visual language: video frames are tessellated into cells that “redden” precisely where motion or appearance deviates from learned norms [131]. Like Bijlani’s attention maps, this heatmap localizes drift spatially—flashing red in a traffic lane while parking bays stay green—but achieves even greater immediacy by eliminating interpretation barriers entirely. Operators naturally decoded drift through accumulating redness in unstable patches, confirming that pixel-level attribution transforms drift from a metric into a visceral, visual story. Building on this spatial explicitness, Guo et al. [46] confront the temporal vulnerability of explanations themselves. Their RoCourseNet framework anticipates how drift might invalidate counterfactual recommendations (“recourses”) by simulating worst-case data shifts during training. Here, the adversarial retraining loop engineers recourses to remain valid under both current and hypothetical future models, quantifying their resilience through a robust-validity score. This metric does not just detect drift; it certifies the explanation’s shelf life—turning recourse feasibility into a human-readable guarantee that drift has been neutralized. When validity scores diverge, analysts instantly see drift outpacing adaptability; when robust validity holds, users trust that “how-to-improve” actions will endure. This temporal safeguarding represents a conceptual leap beyond spatial localization, treating explanation robustness as integral to interpretability. Finally, Wang et al. [38] synthesize spatial and causal reasoning for federated IoT environments. Modeling smart-home rules as interaction graphs, their system identifies drift through SHAP-based sub-graphs that expose malfunctioning automation chains—like “smoke → water-valve → kitchen leak”—with spiked edge weights. By highlighting which trigger–action pathway triggered retraining, FexloT answers the “why” missing in purely spatial approaches. Analysts consistently ignored statistical alarms to act solely on these sub-graphs, proving that causal attribution—not detection thresholds—delivers actionable insights. This evolution—from Bijlani’s “where” (deviating rooms) to Wang’s “why” (faulty automation chains)—demonstrates how structural reasoning matures across the class. 
Collectively, these papers share a rejection of opaque metrics in favor of intrinsically interpretable artifacts: attention-ranked rooms, self-annotating heatmaps, drift-proof recourses, and causal sub-graphs. Whether through visual immediacy, adversarial hardening, or causal transparency, they ensure that every drift alert arrives with its explanation structurally interwoven—not as an afterthought but as the very fabric of detection. As such, these structural approaches exemplify white-box modeling and offer direct interpretability by design, placing this class in the Interpretable AI category.
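The grid-based “reddening” idea can be sketched in a few lines: divide a frame into cells, z-score each cell’s mean intensity against a per-cell baseline learned from normal frames, and flag the cells that deviate. The snippet below is an illustrative approximation under these assumptions, not the GridHTML implementation.

```python
import numpy as np

def cell_deviation_map(frame, baseline_mean, baseline_std, grid=(4, 4), z_alert=3.0):
    """Split a frame into grid cells, z-score each cell's mean intensity against a
    per-cell baseline, and return the cells that should 'redden' (illustrative only)."""
    h, w = frame.shape
    gh, gw = h // grid[0], w // grid[1]
    scores = np.zeros(grid)
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = frame[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            scores[i, j] = (cell.mean() - baseline_mean[i, j]) / (baseline_std[i, j] + 1e-9)
    alerts = [(i, j, round(float(scores[i, j]), 1))
              for i in range(grid[0]) for j in range(grid[1]) if abs(scores[i, j]) > z_alert]
    return scores, alerts

rng = np.random.default_rng(0)
baseline_frames = rng.normal(100, 5, (50, 64, 64))                 # learned "normal" appearance
mu = baseline_frames.reshape(50, 4, 16, 4, 16).mean(axis=(0, 2, 4))
sd = baseline_frames.reshape(50, 4, 16, 4, 16).std(axis=(0, 2, 4))
drifted = rng.normal(100, 5, (64, 64))
drifted[:16, 48:] += 40                                            # one grid cell drifts
_, alerts = cell_deviation_map(drifted, mu, sd)
print("cells to highlight (row, col, z):", alerts)
```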

6.10.2. Datasets

While Table A12 lists three public datasets, VIRAT (video surveillance), Loan Approval Process (financial workflows), and South German Credit (credit risk), the absence of datasets from the other two papers underscores a critical tension in this field. Paper [38] uses a synthetic dataset, which exemplifies a common workaround for domain-specific data scarcity. On the other hand, Paper [97] relies on the private “Minder” dataset, which highlights a transparency problem: without access to these data, other researchers cannot verify claims about healthcare activity patterns or build on this work.

6.10.3. Multidimensional Class Evaluation

According to Table A27, this class comprises four papers with an equal distribution between general concept drift and data drift, underscoring graph-attention methods’ balanced applicability to changes in model behavior and underlying data distributions. The identified application contexts are varied, including IoT, video surveillance, general analytical contexts, and healthcare. These areas benefit from structural reasoning due to their naturally interconnected or spatially structured data, making visual and structural explanations particularly intuitive and effective. Notably, this class equally addresses both local and global interpretability scopes, reflecting the dual capability of graph-based methods in providing comprehensive system-level insights and pinpointing specific nodes or edges responsible for drift. The audience encompasses both data scientists and end-users, illustrating the natural interpretability afforded by graph visualizations, which can directly communicate shifts to both technical experts and operational stakeholders. Explicit metrics such as fidelity–sparsity and robust validity are present, indicating a conscious effort within the group to quantitatively validate interpretability and robustness, enhancing credibility in operational decision-making contexts. The learning mode demonstrates substantial variability (hybrid, offline, and online), suggesting graph-based methods flexibly handle real-time drift detection and retrospective structural analysis, aligning well with the complexity and dynamics inherent in graph-structured data environments.

6.11. Optimization and Resource Scheduling

The following Section describes the only paper belonging to this class.

6.11.1. Technical and Methodological Insights

The approach to drift explainability proposed by Cai et al. is fundamentally integrated into the operational logic of their ORRIC algorithm, creating a self-revealing system where every technical decision inherently communicates its reasoning about drift [32]. The algorithm’s core output—a precisely specified retraining and inference configuration pair like “30% sampling, 28 × 28 inference”—functions as a primary explanation mechanism. This selection represents the exact boundary of computational feasibility within the edge device’s constraints, immediately revealing resource allocation priorities. By designating this “last viable pair” before budget overrun, the system transparently quantifies how much capacity is being diverted to preempt future drift through retraining versus servicing current inference demands. More subtly, this boundary choice serves as an implicit severity indicator: configurations approaching resource limits signal that the system perceives drift as sufficiently threatening to warrant near-total resource dedication to mitigation. The system further enhances interpretability by distilling complex continuous decisions into four discrete operational regimes: Knowledge-Distillation, Inference-Greedy, Focus-Shift, and Inference-Only. This categorization provides immediate heuristic transparency, allowing operators to instantly recognize which high-level strategy governs current operations. When the system declares “Focus-Shift” mode, for example, it directly communicates that imminent drift has triggered significant resource reallocation away from immediate inference tasks. Crucially, these regime labels also serve as predictive justifications that anticipate and explain forthcoming performance changes. An “Inference-Greedy” designation not only describes the current resource focus but inherently warns operators that accuracy will likely degrade because proactive retraining is being sacrificed for immediate throughput, directly linking the operational state to the expected consequences. Complementing these qualitative explanations, the formal competitive-ratio bound—expressed as (1 + α)f(0)/f(A_{t_max})—provides a quantitative foundation for trust. This scalar transforms abstract theoretical guarantees into concrete, interpretable safety margins. It explicitly quantifies how much more efficiently the chosen balance between retraining and inference handles persistent drift compared to a naive Inference-Only baseline. By framing performance relative to this worst-case scenario, operators receive an actionable confidence measure grounded in tangible resource efficiency gains rather than opaque drift metrics. Together, these three interconnected mechanisms—the boundary-revealing configuration output, the causally predictive regime labeling, and the comparative trust quantification—create a coherent explanatory framework where the system’s response to drift becomes legible through its operational outputs. The architecture ensures that explanations emerge organically from the algorithm’s normal functioning, continuously translating technical resource decisions into operator-understandable narratives about drift mitigation priorities, anticipated impacts, and system confidence. This class clearly aligns with Interpretable AI.
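A minimal sketch of the “last viable pair” idea is given below, under an invented cost model and illustrative regime labels (not the ORRIC formulation): candidate (retraining fraction, inference resolution) pairs are ordered by cost, and the most resource-hungry pair that still fits the budget is selected and mapped to an operating regime.

```python
from itertools import product

def cost(retrain_fraction, inference_side, retrain_unit=2.0, infer_unit=1e-3):
    # Toy cost model (our assumption): retraining cost scales with the sampled
    # fraction of the stream, inference cost with the squared input resolution.
    return retrain_unit * retrain_fraction + infer_unit * inference_side ** 2

def select_last_viable_pair(budget, fractions=(0.0, 0.1, 0.3, 0.5), sides=(14, 28, 56)):
    """Order candidate pairs from least to most resource-hungry and return the last one
    that still fits the budget, with an illustrative label for the implied regime."""
    candidates = sorted(product(fractions, sides), key=lambda c: cost(*c))
    viable = [c for c in candidates if cost(*c) <= budget]
    if not viable:
        return None, "Infeasible: even the cheapest configuration exceeds the budget"
    frac, side = viable[-1]
    if frac == 0.0:
        regime = "Inference-Only (no capacity left for drift mitigation)"
    elif frac >= 0.5:
        regime = "Focus-Shift (most capacity diverted to retraining)"
    else:
        regime = "Inference-Greedy (retraining kept minimal)"
    return (frac, side), regime

for budget in (0.2, 1.0, 2.5):
    pair, regime = select_last_viable_pair(budget)
    print(f"budget={budget}: pair={pair}, regime={regime}")
```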

6.11.2. Datasets

The exclusive use of CIFAR-10 as a dataset reflects the primary challenge facing this class: there is only one paper currently representing resource scheduling for drift adaptation. This fundamental scarcity of research directly impacts dataset diversity. While CIFAR-10 provides a standardized testing ground for their ORRIC framework, its static image collection does not reflect the changing, mixed data streams typical of edge computing and IoT systems, such as sensor readings that change over time. This mismatch raises questions about real-world applicability, as actual resource limitations like unreliable network connections and different device capabilities are not captured in the testing environment. The absence of both additional papers and specialized datasets, or even artificial data that simulates drift under resource constraints, makes it difficult to test how well this approach works across different situations.

6.11.3. Multidimensional Class Evaluation

Table A28 underlines how explicitly data drift is addressed in this category. This unique approach directly integrates drift detection into resource optimization processes, emphasizing the operational context of model performance under resource constraints rather than traditional data-monitoring frameworks. The singular application context identified is video surveillance, indicating an operational scenario where immediate resource allocation decisions critically impact performance, and drift directly influences system scheduling decisions. The scope is global, reflecting the holistic perspective of optimization-based explanations, which rationalize resource allocation decisions based on comprehensive performance metrics rather than detailed instance-level analyses. The intended audience is exclusively data scientists, underscoring the complexity and specialized nature of resource optimization decisions within drifting environments. The interpretation of drift is tightly integrated with advanced analytical reasoning about performance metrics and resource constraints. Interestingly, no explicit interpretability-related metrics are reported, highlighting that interpretability emerges from the practical clarity and transparency of optimization decisions rather than standardized quantitative interpretability measures. The learning mode employed is exclusively online, appropriately matching the dynamic and immediate decision-making requirements of optimization frameworks within drift-affected resource-constrained environments.

6.12. Performance Baseline and Metric Audit

This Section covers the 4 papers, out of 82, that belong to this group.

6.12.1. Technical and Methodological Insights

This class pioneers a paradigm where the very metrics tracking model performance are designed to be the explanation of drift, transforming cold diagnostics into a continuous, human-interpretable narrative of change. Duckworth et al. [105] lay the foundation by demonstrating how the temporal dynamics of normalized SHAP values—aggregated weekly into a public dashboard—directly reveal why predictions evolve. When features like respiration rate, triage discriminator, and ambulance arrival spiked simultaneously in March 2020, clinicians immediately understood that the model’s clinical focus had shifted toward respiratory distress, long before AUROC significantly degraded. Crucially, this SHAP timeline does not just detect anomalies; it explains their nature by distinguishing data drift (e.g., surging admission rates) from concept drift (e.g., vital signs eclipsing chief complaints) through the interplay of feature contributions and performance trends, preemptively signaling retraining needs. Building upon this principle of feature-centric explanation, Ouattli et al. [29] extend the temporal lens to quantify the stability of the model’s reasoning itself. By tracking Spearman correlations in bootstrapped, clustered feature rankings across sliding windows, they expose insidious shifts in the prediction rationale—such as 40% of log-message features reversing their impact direction—even when the AUC remains stable. This correlation metric directly explains why past interpretations become obsolete and operationalizes “explanation freshness” by treating window length as a hyperparameter to balance recency against reliability, thereby framing drift as an erosion of trust in the model’s logic. Complementing this feature-level focus, Cánovas-Segura et al. [57] reframe the drift explanation through the evolving relationship between model accuracy and complexity. Their dashboard plots the monthly AUC against the mean number of active predictors in inherently interpretable models (e.g., LASSO regression, rule sets). The countermovement of these metrics—AUC softening as predictor counts plummet when historical data re-enters the sliding window—tells the story of drift as a deteriorating accuracy–complexity trade-off. Clinicians intuitively grasp that rising complexity may temporarily sustain performance until outdated patterns conflict with current realities; the tipping point where this balance collapses is the onset of concept drift, explained without auxiliary statistics. This dual-curve approach contextualizes how covariates lose relevance, how feature filtering preserves usability, and when retraining becomes imperative. This progression culminates with Roider et al. [115], who anchor the drift explanation to a universally comprehensible baseline: the error of a constant median predictor. Their normalized MAE (nMAE)—the ratio of the model’s MAE to this baseline’s MAE—transforms a scalar metric into an immediate interpretation of drift impact. Values near 1 declare the model useless (“guessing the median”); values above 1 indicate harmful drift; values below 1 confirm residual utility. Paired with a scatter plot of training-vs.-test MAE, deviations from the diagonal visually classify drift as benevolent or hostile, explaining the degradation severity and type (e.g., the sepsis log’s pervasive feature futility vs. the helpdesk log’s robustness) far earlier than accuracy thresholds.
Together, these papers form a coherent intellectual stream: they reject post hoc interpretability by embedding explanation into the audit process itself. Duckworth and Ouattli expose the internal logic shifts (through SHAP and ranking instability), while Cánovas-Segura and Roider elucidate the external consequences (via complexity trade-offs and baseline comparisons). All converge on using temporal metric evolution—whether SHAP averages, ranking correlations, AUC predictor curves, or nMAE trends—as a visual, intuitive language to narrate drift. This language inherently anticipates decay (explaining why before performance crashes), democratizes insight (replacing statistical tests with contextualized metrics), and prescribes action (retraining, window resizing, feature review). The class thus redefines performance monitoring: metrics are no longer mere alerts but the continuous story of a model’s struggle to stay grounded in a changing world, making the “why” of drift inseparable from the “what” of its measurement. Consequently, this class reflects the post hoc nature of XAI, adding it to the Explainable AI group.
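The nMAE described above can be reproduced in a few lines. The sketch below follows the definition reported for Roider et al. [115] (the model’s MAE divided by the MAE of a constant predictor that always outputs the training median), while the synthetic data and the interpretation thresholds are our own illustrative assumptions.

```python
import numpy as np

def normalized_mae(y_true, y_pred, y_train):
    """nMAE: model MAE divided by the MAE of a constant predictor that always
    outputs the training median."""
    baseline = np.median(y_train)
    model_mae = np.mean(np.abs(y_true - y_pred))
    baseline_mae = np.mean(np.abs(y_true - baseline))
    return model_mae / baseline_mae

def interpret(nmae, tol=0.05):
    # Thresholds are an illustrative reading of the "near 1 / above 1 / below 1" rule.
    if nmae > 1 + tol:
        return "harmful drift: the model is now worse than guessing the median"
    if nmae > 1 - tol:
        return "model effectively useless: it behaves like the median guess"
    return "residual utility: the model still beats the constant baseline"

rng = np.random.default_rng(0)
y_train = rng.exponential(10, 1000)          # historical targets
y_test = rng.exponential(10, 200) + 8        # drifted test targets
y_pred = rng.exponential(10, 200)            # stale model ignores the shift
score = normalized_mae(y_test, y_pred, y_train)
print(f"nMAE = {score:.2f} -> {interpret(score)}")
```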

6.12.2. Datasets

The datasets in Table A14 reveal a concerning pattern in this research group. While these papers make valuable contributions by connecting performance drift to understandable explanations, their use of public datasets is inconsistent. Only two of the four studies clearly reference accessible datasets, while the other two do not provide clear data sources. This lack of transparency creates several problems. For example, Cánovas-Segura et al. [57] analyze changes in clinical data related to a new antibiotic flag, but without access to their dataset, other researchers cannot verify whether their selected features truly explain the observed drift. Similarly, Ouattli et al. [29] show how data leakage affects accuracy by comparing different train/test splits, but without shared data, it is difficult to know whether their solutions work beyond their specific case. These gaps risk reducing potentially useful research to theoretical ideas rather than practical tools. The publicly available datasets that are used, like BPIC process logs or MIMIC-II clinical data, focus heavily on healthcare and business operations. While these areas are important due to their strict performance requirements, this narrow focus raises questions about whether these methods would work equally well in other fields, like finance or social media, where data behave differently. The limited variety of datasets may hide potential weaknesses in the methods.

6.12.3. Multidimensional Class Evaluation

According to Table A29, this class primarily addresses general concept drift, with one instance explicitly targeting data drift. The focus on general drift highlights how metric-based approaches inherently detect broad shifts in model performance, using performance baselines rather than specific drift types. The identified application contexts predominantly involve healthcare, business process management, and software engineering, reflecting contexts in which performance stability and compliance with predefined standards are critical. Healthcare and business processes particularly benefit from explicit metric auditing to maintain operational accuracy and reliability. Explanations within this class are consistently global, aligning naturally with the concept of performance baselines that encapsulate overall model stability rather than granular instance-level shifts. The audience is primarily data scientists, with partial involvement of end-users, emphasizing that interpreting metric deviations usually requires analytical expertise, though clear visual aids and dashboards can also facilitate user-level comprehension. No quantitative interpretability metrics are explicitly identified. All approaches utilize offline learning modes, demonstrating a clear inclination toward retrospective analysis, typical of performance audits and baseline assessments, which naturally rely on historical data for accurate detection and diagnosis.

6.13. Bias/Fairness Through Time

In this Section we present the only 2 studies, out of 82, that address drift from an explicitly fairness-centric angle.

6.13.1. Technical and Methodological Insights

The “Bias/Fairness Through Time” class pioneers a paradigm where explainability artifacts transcend static interpretation to become dynamic diagnostic tools, continuously narrating how fairness erodes or adapts under drift. This approach crystallizes in complementary methodologies that progressively deepen our interpretive lens. At its foundation, Castelnovo et al. [42] establish that meaningful drift interpretation requires contrasting model behaviors across time. Their dual framework transforms conventional explainability outputs into temporal sensors: the Δ Shapley timeline directly harnesses the annual recomputation of Shapley values—an inherent byproduct of retraining—to quantify changing reliance on sensitive attributes. When contributions to predictions for protected groups (e.g., citizenship) intensify while group fairness metrics remain stable, this trajectory instantly reveals a critical trade-off: the model sacrifices individual equity to maintain demographic parity. Simultaneously, their FairX framework extends this contrastive logic to the rule-based structure of fairness-constrained models. By converting successive models into comparable surrogates (binary decision diagrams) and highlighting emergent, vanished, or modified paths involving sensitive features, FairX exposes how logical conditions for equitable outcomes morph over time. For instance, a new rule demanding “non-citizen ∧ income > €30k” signals an interpretable “fairness regression,” directly linking drift to discriminatory logic evolution. Crucially, these techniques intertwine: the Δ Shapley magnitude flags when bias worsens, while FairX rules explain how it manifests in decision pathways, together forming a cohesive story of fairness drift. Building upon this foundation of model-internal contrast, Suffian and Bogliolo [145] shift the interpretive focus to the operational interaction stream between users and models, demonstrating how explainability itself can be weaponized as a real-time drift audit. They ingeniously repurpose counterfactual explanations (CFEs)—typically static suggestions for recourse—into a living diagnostic system. Each user interaction with a CFE generates a hypothetical data point, but before these points seed retraining, a fairness-conscious screening interprets potential drift: ADWIN windows detect statistical shifts while balanced ensembles predict new labels, rejecting points that risk amplifying bias. This transforms the reliability of CFEs into a proxy for drift control. Users still receive familiar suggestions (“increase income by €5k”), but the underlying screening process creates an interpretive layer: spikes in the dashboard metric “drifted points rejected” signal when CFEs become untrustworthy due to underlying drift or imbalance, while the “class-balance ratio after acceptance” reveals whether corrective actions reinforce equity. Thus, the very sequence of counterfactual interactions evolves into a real-time narrative, exposing whether fairness is sustainably maintained as data shifts. Suffian and Bogliolo’s work operationalizes Castelnovo’s core principle—using explainability artifacts as drift sensors—but extends it into the dynamic context of user feedback, adding metrics that translate abstract drift interpretation into actionable operational intelligence. The common thread unifying these works is the transformation of explainability from a post hoc illustration into the primary mechanism for diagnosing fairness drift.
Both frameworks reject the separation of “detect then explain”; instead, interpretable outputs are the drift monitoring system. Castelnovo et al. achieve this through longitudinal contrasts of model internals (feature attributions and rule structures), making drift interpretable as evolving trade-offs and logical conditions. Suffian and Bogliolo, recognizing the limitations of periodic model snapshots, extend this into the continuous flow of deployment by leveraging user interactions. Their CFE screening process creates a closed-loop system where the explanations offered to users simultaneously audit the system’s susceptibility to bias reinforcement. This progression—from contrasting discrete model versions to monitoring an interactive explanation stream—represents a maturation of the approach, embedding explainability ever deeper into the lifecycle of fairness maintenance. Together, they demonstrate that sustaining equity in nonstationary environments demands not just detecting drift, but continuously interpreting its narrative through the evolving language of the model’s own explanations, whether written in Shapley deltas, decision paths, or the screened counterfactuals guiding user actions. Accordingly, this class falls within the XAI paradigm.
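The contrastive logic of a Δ-attribution timeline can be illustrated with a lightweight stand-in: the sketch below retrains a model per period, measures the sensitive feature’s permutation importance (used here instead of Shapley values to keep the example dependency-light), and reports the period-to-period delta. The data-generating process, the feature roles, and the growing “leak” of the sensitive attribute into the label are synthetic assumptions, not Castelnovo et al.’s setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

def yearly_attribution(X, y, sensitive_idx, seed=0):
    """Retrain on one period's data and return the attribution of the sensitive feature
    (permutation importance as a lightweight stand-in for Shapley values)."""
    model = GradientBoostingClassifier(random_state=seed).fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=5, random_state=seed)
    return imp.importances_mean[sensitive_idx]

rng = np.random.default_rng(0)
n, sensitive_idx = 2000, 2
previous = None
for year, leak in enumerate([0.0, 0.3, 0.8]):     # the label gradually starts to depend
    X = rng.normal(size=(n, 4))                   # on the sensitive column (index 2)
    y = ((X[:, 0] + leak * X[:, sensitive_idx] + rng.normal(0, 0.5, n)) > 0).astype(int)
    attr = yearly_attribution(X, y, sensitive_idx)
    if previous is not None:
        print(f"year {year}: sensitive-attribute reliance {attr:.3f} "
              f"(delta {attr - previous:+.3f})")
    previous = attr
```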

6.13.2. Datasets

The datasets used in these studies expose a familiar trade-off: public data make it easier for others to verify the work, while private data better reflect real-world conditions. Well-known public datasets (German Credit, COMPAS) are listed in Table A22. These allow researchers to test and compare methods, but they may miss details of how bias evolves over time in practice. Suffian and Bogliolo use private data, which better capture real-world bias shifts but cannot be shared, making it hard to verify or reuse their methods [145]. This split makes it difficult to compare the two studies: public datasets are easier to work with but may not capture real-life complexity, while private data obscure how well the methods actually perform.

6.13.3. Multidimensional Class Evaluation

As shown in Table A30, this small class comprises two papers equally focused on general concept drift and data drift, explicitly addressing drift in fairness and bias metrics. This explicit focus underscores the growing recognition that fairness and ethical compliance require continuous monitoring over time, similar to traditional performance metrics. The singular application context identified is financial services, highlighting the critical nature of fairness tracking in high-stakes decision-making environments where model biases can have significant ethical and legal implications. Explanations are evenly balanced between local and global scopes, emphasizing the necessity of both detailed instance-level bias tracking and broader systemic assessments to ensure comprehensive fairness. The audience includes both data scientists and end-users, demonstrating a deliberate aim to transparently communicate fairness implications directly to stakeholders who may be affected by biased decisions. No explicit interpretability-related metrics are mentioned, suggesting that interpretability primarily arises directly from the fairness metrics themselves, such as SHAP-based bias scores, rather than from additional standardized measures. Hybrid learning modes dominate this group, highlighting the combination of immediate online monitoring for rapid bias detection with periodic offline analyses, effectively balancing responsiveness and thoroughness in bias mitigation.

6.14. Human-in-the-Loop and Pedagogical XAI

In this class we find 3 of the 82 papers selected for our SLR.

6.14.1. Technical and Methodological Insights

The papers in this class fundamentally reposition explainable AI (XAI) artifacts as the primary medium for interpreting and responding to concept drift, rather than treating explanations as mere supplements to separate detection mechanisms. Common to all approaches is the integration of XAI directly into the learning loop, where explanations serve a dual purpose: (1) they reveal drift by exposing model reasoning flaws or shifts, and (2) they mediate human–AI collaboration to correct drift. Mannmeusel et al. [141] leverage local LIME explanations (token highlights + probabilities) presented during active learning queries. These explanations act as a “pedagogical channel”: they immediately show annotators why the model struggles (e.g., highlighting outdated tokens during drift), enabling targeted label corrections. Crucially, the study demonstrates that these explanations train human annotators, leading to lasting efficiency gains even after explanations are removed—showing that interpretability itself mitigates future drift impact by upskilling humans. Chowdhury et al. [45] deepen this by using dual post hoc explanations (LIME for local feature attribution + SHAP for global feature ranking) specifically triggered by false detections indicative of drift. Here, explanations directly quantify drift severity: the magnitude of performance improvement (e.g., precision jump from 0.70 to 0.75) after human analysts apply domain-weight corrections (based on the explanations) becomes an interpretable metric of how far the model’s reasoning had diverged from reality. The continuous “teacher–student” feedback loop ensures that new counter-evidence (false positives/negatives) automatically surfaces as fresh explanations, making drift interpretation an ongoing, integral process. Muschalik et al. [143] shift the focus to global interpretability with their incremental PDP (iPDP). The evolving partial dependence curve visually encodes both virtual and real drift in the model’s logic: horizontal shifts indicate feature distribution drift, while changes in curve shape (e.g., flat → U-shaped → monotonic) explicitly reveal how the model’s decision boundary or feature relationships have changed over time. The iPDP’s geometry (slope, direction, position) serves as an intuitive, real-time explanation of drift dynamics, eliminating the need for auxiliary detection methods. The key evolutionary thread is the progression from explanations aiding drift correction (Mannmeusel), to explanations enabling drift quantification and continuous adaptation (Chowdhury), and finally to explanations providing an intrinsic, visual representation of drift mechanics (Muschalik). While Mannmeusel and Chowdhury focus on local explanations triggering human intervention, Muschalik offers a global, model-wide perspective. The unifying thread is that drift is not merely detected but interpreted and addressed through the XAI artifact itself, embedding explainability as the core mechanism for maintaining model resilience in dynamic environments. Explanations transition from diagnostic tools to active pedagogical and operational components of the drift adaptation loop. Therefore, this class falls within the XAI paradigm.
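A windowed approximation conveys the intuition behind the iPDP without reproducing Muschalik et al.’s incremental estimator: the sketch below computes a partial dependence curve by brute force on two consecutive data windows and reports how much the curve for one feature changes, which is exactly the kind of shape change described above. The data-generating process and the window setup are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pd_curve(model, X, feature, grid):
    """Manual partial dependence: for each grid value, overwrite the feature in every
    row of the window and average the model's predictions (the PDP definition)."""
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        curve.append(model.predict(Xv).mean())
    return np.array(curve)

rng = np.random.default_rng(0)
grid = np.linspace(-3, 3, 25)
curves = []
for relation in ("flat", "u_shaped"):                    # two consecutive windows
    X = rng.normal(size=(1500, 3))
    effect = np.zeros(1500) if relation == "flat" else X[:, 0] ** 2
    y = effect + X[:, 1] + rng.normal(0, 0.3, 1500)
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    curves.append(pd_curve(model, X, feature=0, grid=grid))

shape_change = float(np.max(np.abs(curves[1] - curves[0])))
print(f"max change in the feature-0 PD curve between windows: {shape_change:.2f}")
```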

6.14.2. Datasets

The datasets in Table A16 illustrate pedagogical XAI’s adaptability across domains. Electricity and California Housing are used to flag drift visually through slope changes in iPDP curves, while the cybersecurity datasets CICMalDroid2020 and CICMalMem2022 rely on experts to validate explanations. The Customer Complaints corpus shows LIME’s role in speeding up annotator adaptation during textual shifts. Pedagogical tools must therefore be aligned with both the data type and the expertise of their intended audience.

6.14.3. Multidimensional Class Evaluation

As shown in Table A31, this class includes three papers addressing gradual, sudden, and general concept drift, reflecting diverse drift scenarios encountered when integrating human judgment directly into drift detection and model adaptation processes. The identified application contexts span financial services, energy, and cybersecurity, reflecting contexts where human judgment, rapid adaptation, and interpretability are critical. Cybersecurity contexts particularly benefit from immediate human feedback, significantly enhancing interpretability through interactive model adjustments. The scope is predominantly global, though local interpretations are explicitly considered, reflecting the group’s dual objective of providing comprehensive insights to human experts while also facilitating detailed feedback loops for instance-level corrections. The intended audience is exclusively data scientists, indicating that although pedagogical methods enhance human learning, their initial development, interpretation, and integration typically require advanced analytical expertise and familiarity with interpretability tools. Explicit interpretability metrics are absent, implying that interpretability emerges naturally through interactive visual tools (such as incremental PDP curves or LIME highlights), facilitating intuitive understanding rather than quantitative assessment. Learning modes are predominantly online, supplemented by hybrid approaches, suitably aligning with human-in-the-loop contexts requiring immediate responsiveness to drift through direct human intervention.

6.15. Dataset/Provenance and Compliance Tags

Only two papers are included in this class.

6.15.1. Technical and Methodological Insights

The “Dataset/Provenance and Compliance Tags” class represents a paradigm where the explainability of drift is not merely an auxiliary feature but is fundamentally engineered into the very artifacts and processes that handle data or model outputs. This approach ensures that the nature, origin, and implications of drift become transparent and immediately accessible to users, moving beyond simple detection to meaningful interpretation. González-Cebrián et al. [109] exemplify this by reimagining dataset versioning. They embed a quantifiable measure of drift directly into the DOI identifier itself, repurposing the “minor” digit in the semantic versioning scheme to carry a PCA-based reconstruction-error score scaled between 0 and 100. This transforms the version string into an intrinsic interpretability tool: a value like “1.42.0” instantly signals significant divergence from reference data, while scores below 50 indicate negligible change. Crucially, this encoding happens automatically at the moment of dataset deposition, inseparably binding provenance (the version history) with explanation (the drift magnitude). Regulators or downstream users no longer need external dashboards; the answer to “how different is this data?” resides directly in the citation metadata they already encounter. The bounded scale further enhances interpretability by providing intuitive thresholds—scores nearing 100 correlate with transformative changes like variable rescaling, justifying a major version increment. Here, explainability is achieved through design artifactification: the version label becomes a self-contained narrative of change. Chen et al. [43] extend this philosophy of embedded explainability into dynamic runtime environments, focusing on process-aware systems. Their innovation lies in augmenting a state-transition model—learned from historical logs—with synthetically generated “constraint states” derived from regulatory requirements. Each transition is explicitly tagged with a provenance flag indicating whether a prediction originates from empirical patterns in the training data or from a specific, previously unseen regulatory constraint (e.g., “event D inferred from constraint c1”). This mechanism provides immediate, plain-language explanations at prediction time, clearly distinguishing between familiar operational behavior and extrapolations mandated by new rules. The system outputs not just the predicted event and its probability but also this critical flag, allowing operators to instantly grasp whether the model is operating on known ground or navigating regulatory drift. Furthermore, Chen et al. introduce a temporal dimension to explanation. Synthetic transitions start with a count of 1. As real-world data containing the once-hypothetical events arrives, their counts increase, and the “constraint-only” flag disappears. This evolving count structure visually narrates the assimilation of regulatory drift into operational norms, turning the transition graph itself into a dynamic ledger of compliance integration. Unlike post hoc explainers, their system intrinsically narrates drift by design, merging prediction with provenance-based explanation. While both papers share the core tenet of explanation-by-design, they diverge in scope and mechanism. González-Cebrián et al. offer a static, snapshot-based interpretation suited to versioned datasets, providing a one-time, holistic assessment of the drift magnitude encoded in a persistent artifact (the DOI). 
Chen et al., conversely, tackle the streaming, operational drift within processes, delivering granular, rule-specific explanations that dynamically evolve as new data validates regulatory extrapolations. The former quantifies drift through a single bounded metric (PCA error), prioritizing simplicity and immediate comprehension for data consumers. The latter provides fine-grained provenance flags per transition, detailing the specific regulatory origin of a prediction and tracking its normalization over time—a significant advancement in capturing the assimilation of drift. Chen et al.’s work can thus be seen as an evolution of the embedded explanation concept, adapting it from static data artifacts to dynamic runtime environments and adding the critical dimension of explanation evolution. Together, these papers establish the “Provenance and Compliance Tags” class as uniquely focused on making drift intrinsically interpretable. They shift the burden of understanding from external tools or complex analyses to the system’s own outputs and metadata, leveraging bounded scales, plain-language flags, and the anchoring context of provenance (data history or regulatory roots) to deliver immediate, actionable insights into why drift occurs and what it signifies to the user. This inherent transparency is fundamental to achieving both FAIR compliance and trustworthy operational oversight in evolving data landscapes and also marks this class as part of the Interpretable AI group.
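The versioning idea can be sketched as follows: fit a PCA model on the reference dataset, score a newly deposited dataset by its relative reconstruction error, scale the result to 0–100, and write it into the “minor” field of a semantic-version-style identifier. The scaling function and the major-release threshold below are illustrative assumptions, not the published formulas of González-Cebrián et al. [109].

```python
import numpy as np
from sklearn.decomposition import PCA

def drift_score(reference, new, n_components=2):
    """Reconstruction-error-based drift score in [0, 100]: fit PCA on the reference data
    and compare the new data's reconstruction error against the reference's own
    (the 0-100 scaling is an illustrative choice)."""
    pca = PCA(n_components=n_components).fit(reference)
    def err(X):
        return np.mean(np.sum((X - pca.inverse_transform(pca.transform(X))) ** 2, axis=1))
    ratio = err(new) / (err(reference) + 1e-12)
    return float(np.clip(100 * (1 - 1 / ratio), 0, 100)) if ratio > 1 else 0.0

def next_version(current, score, major_threshold=90):
    """Embed the drift score in the 'minor' field; promote a major release when the score
    indicates a transformative change (threshold is an assumption)."""
    major, _, _ = (int(p) for p in current.split("."))
    return f"{major + 1}.0.0" if score >= major_threshold else f"{major}.{int(round(score))}.0"

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, (500, 5))
new = rng.normal(0.8, 1.6, (500, 5))        # shifted and rescaled deposit
score = drift_score(reference, new)
print(f"drift score = {score:.0f} -> new dataset version: {next_version('1.0.0', score)}")
```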

6.15.2. Datasets

The datasets in Table A17 demonstrate real-world applications of provenance tracking and drift detection frameworks from the two papers, spanning healthcare (Hungarian chickenpox), environmental science (Global land temperature), and business process management (Helpdesk Log3). Publicly available, these datasets highlight metadata-driven approaches for tracing concept/data drift and align with transparency goals. For example, event logs (CPEE) test edge-tagging methods, while time-series data (Sales Prediction) validate versioning schemas, where hash mismatches (Δ ≠ 0) or schema diffs signal model behavior shifts. Regulatory-focused datasets (Ozone levels) emphasize compliance tagging, as schema changes (e.g., updated sensor protocols in global temperature) link metadata updates to drift. However, limitations persist: most datasets are static snapshots, contrasting with real-world streaming systems (Dublin footfall 2022) requiring continuous metadata logging. Additionally, live environments demand automated drift detection, unlike pre-annotated public repositories.

6.15.3. Multidimensional Class Evaluation

According to Table A32, this class includes two papers focusing explicitly on general concept drift and data drift, uniquely highlighting dataset versioning and metadata tracking as direct mechanisms for detecting and explaining drift. The identified application contexts span healthcare, manufacturing, logistics, and general-purpose data management, demonstrating the critical role of provenance tracking in domains heavily regulated or operationally sensitive, where understanding data lineage directly supports regulatory compliance and transparency. The explanations provided are uniformly global, naturally emerging from dataset metadata changes or provenance shifts rather than fine-grained, instance-level detail, aligning with high-level compliance and governance objectives. The intended audience includes both data scientists and end-users, recognizing the direct importance of transparent data management and compliance practices to operational stakeholders and regulatory authorities. No explicit interpretability-related metrics are mentioned, suggesting that interpretability arises directly from clearly documented and transparent metadata or compliance annotations rather than formalized quantitative measures. Learning modes include both hybrid and offline methods, effectively supporting retrospective compliance auditing and ongoing metadata management in operational scenarios, ensuring continuous monitoring and traceability of drift origins.

7. Research Questions

The objective of this systematic literature review (SLR) is to provide a comprehensive and detailed analysis of explainability and interpretability in concept and data drift. We collected, filtered, analyzed, and synthesized the existing literature to offer an overview and taxonomy of various existing technologies and approaches, along with datasets and detailed analyses of different aspects across various classes. This work aims to support researchers in their future endeavors, given the growing interest in this field. In Section 1, two sets of research questions are defined: one general and one more specific.

7.1. General Questions

During the writing of this review, particularly in Section 5, most of the general research questions were addressed. We analyzed trends (GQ1), publication venues (GQ3), and research groups (GQ4), while also conducting an additional analysis of the distribution of publications by country. However, among the initial general research questions, we lacked a complete answer to GQ2 regarding application domains. We provided a partial answer to this question through the taxonomy in Section 6. During the evaluation of each class within the taxonomy, we analyzed the target application area of each paper. To provide a comprehensive response to this research question, Table 5 presents all identified domains.
The analysis of application domains clearly demonstrates that research on explainable drift predominantly occurs in contexts where regulatory compliance, human safety, or significant financial risk necessitates clear interpretability. Three main sectors stand out distinctly: healthcare (12 papers), finance (6 papers), and energy (6 papers). Together, these three domains account for approximately 30% of the reviewed studies, with healthcare alone constituting about 14% of the total. This reflects strong regulatory and operational requirements for transparency and accountability in these sectors. Furthermore, aggregating domains with fewer studies under the category “Other” helps to clarify the distribution of research interests. Notably, fields grouped under “Other” collectively account for a significant portion of studies, highlighting broad but diffuse interest across various specialized or niche applications. This diverse collection underscores that while certain sectors clearly dominate the research landscape due to immediate safety or regulatory implications, explainability in drift detection and interpretation has wider, though less concentrated, relevance. The evidence from different application areas shows a clear pattern: external responsibility and human safety concerns, not just the presence of drift, are the strongest reasons why a field will invest in explainable drift methods. For applications where reputation or legal responsibility is high, researchers are already testing solutions that are transparent or at least open to interrogation. When these pressures are low, interest is rare.

7.2. Specific Questions

As for the specific questions the answers are the following.
SQ1. 
Online vs. Offline Learning: During the evaluation of each taxonomic class, the learning mode was a key aspect considered to determine whether explainability and interpretability techniques for drift align better through real-time adaptations (online) or batch analysis (offline). Hybrid learning modes, which combine both approaches, were also examined. The results show 24 papers for offline, 35 for online, and 23 for hybrid approaches. Online learning emerges as the predominant approach for drift explainability and interpretability systems, favoring methods where predictive models and their explanations are updated in real time. This finding directly supports GQ2, as these mechanisms are heavily employed in safety-critical systems where drift, being an inherently temporal phenomenon, requires immediate detection to minimize potential damage. In these domains, waiting for model retraining is often not feasible. Consequently, XAI itself has been pushed toward real-time capabilities. While offline batch methods typically produce richer explanations, their slower response to drift prevents them from emerging as the leading learning mode in this review. Hybrid learning modes present an interesting approach, typically pairing lightweight streaming drift detectors with checkpoints: when a detector signals a change or confidence degrades, a cached data slice is processed by an offline explainer or a more costly retraining routine.
SQ2. 
Drift Characteristics: The types of drift addressed within explainability frameworks were analyzed across taxonomic classes. The results show the following: general concept drift (50 papers), sudden concept drift (8), gradual concept drift (14), incremental concept drift (6), recurrent concept drift (3), and data drift (18). The sum exceeds the total number of studied papers because several papers address multiple drift types. Concept drift clearly predominates, with 50 papers utilizing concept drift in its generic form, overlooking its temporal morphologies. By remaining generic, authors develop explainability and interpretability schemas that operate at the model level rather than being process-aware, focusing more on the algorithmic layer than the temporal layer. When temporality is specified, two types prevail: gradual and sudden. Gradual drift dominates in four taxonomic classes: Distance/Divergence Geometry, Bayesian Modeling, Fuzzy Transparency, and Prototype Tracking. These classes treat concept drift as a smoothly deforming structure, whether a distance curve, belief curve, fuzzy rule weight, or moving exemplar, and this continuity produces explanations for gradual change. Sudden drift is addressed in the Latent-Representation Geometry and Similarity, Human-in-the-Loop and Pedagogical XAI, and Feature Attribution and Game Theory classes. All three families pivot on contrastive reasoning, transforming sudden drift into instantly legible before/after pictures that isolate culprit features (game-theoretic attributions), visualize ruptures geometrically (latent-space jumps), or provide explanations to humans for rapid corrective teaching (pedagogical XAI). Incremental and recurrent drift types represent minimal focus, with some papers appearing in multiple categories. This likely occurs because these morphologies require long-term memory, fine-grained deltas, and nuanced user stories, characteristics that today’s rapid XAI toolkits and benchmark traditions are not designed for. Data drift, though less predominant, is addressed across almost all taxonomic classes, except C2, C4, C9, and C14. This aligns with the fact that these four missing classes rely on label-aware semantics or human corrective action; pure data drift monitoring lacks this semantic anchor, consistent with the taxonomic structure.
SQ3. 
Datasets and Benchmarks: Datasets utilized in each taxonomic class were examined, revealing that no single dataset can be regarded as a definitive benchmark for drift explainability and interpretability. Various datasets (images, numerical, textual) appear across studies, with Electricity (11 papers), INSECT (7 papers), CoverType (5 papers), and Airlines (5 papers) being the most common. However, these four datasets cannot be considered standard benchmarks, resulting in poor reproducibility and low standardization. Additionally, the systematic review highlighted an absence of established metrics, further confirming the current lack of benchmark standardization in the literature.
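For readers who want to experiment with the two most frequently used tabular streams, the snippet below loads Electricity and CoverType directly from OpenML using the dataset IDs reported in the Appendix B links (151 and 180, respectively); it is a minimal loading example, not a prescription of any benchmark protocol.

from sklearn.datasets import fetch_openml

# OpenML IDs taken from the dataset links reported in Appendix B.
electricity = fetch_openml(data_id=151, as_frame=True)  # Electricity
covertype = fetch_openml(data_id=180, as_frame=True)    # CoverType

X_elec, y_elec = electricity.data, electricity.target
X_cov, y_cov = covertype.data, covertype.target
print(X_elec.shape, X_cov.shape)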
SQ4. 
Models: The taxonomic structure inherently encodes this answer. Each of the fifteen classes corresponds to a distinct technical or methodological foundation that the literature uses to make drift both detectable and explainable. Together, these classes reveal five macro-families: Statistical and Probabilistic, Feature Importance and Attribution, Evaluation and Auditing, Logic and Rule-Based, and lastly, Interactive and Human-Centric. Statistical models still dominate, leveraging distance measures and hypothesis tests that are naturally decomposable for explanation. Representation-centric approaches offer the richest narratives, while symbolic and governance-oriented models provide human-readable rules and audit trails.
SQ5. 
Metrics: This systematic literature review reveals a profound scarcity of standardized quantitative metrics for evaluating explainability in concept and data drift. Only five explicitly defined interpretability metrics were identified across the entire corpus of 82 studies, with each metric appearing exclusively in a single paper. These isolated cases include user studies validating geometric visualizations in distance/divergence methods, faithfulness and monotonicity tests for rule-based systems, feature agreement scores in attribution frameworks, fidelity–sparsity trade-offs in graph-based approaches, and robust-validity measures for counterfactual recommendations. This extreme fragmentation—coupled with the complete absence of reused or benchmarked metrics—constitutes a fundamental limitation that undermines reproducibility, cross-method comparison, and scientific advancement in the field.
Nevertheless, the taxonomy developed in this review uncovers richer, implicit evaluation paradigms that serve as de facto metrics within methodological families. Geometric interpretability—prominent in distance/divergence and latent-representation approaches—transforms statistical drift into visually verifiable narratives through spatial relationships like KL-divergence heatmaps or evolving decision boundaries, where interpretability is intrinsically tied to artifact clarity rather than formal scores. Uncertainty translation mechanisms central to Bayesian and fuzzy systems operationalize distributional shifts through domain-grounded stories, exemplified by decaying Dirichlet belief distributions in clinical models or sliding γ-factors in credit scoring, that transform confidence erosion into actionable operational narratives. Rule evolution tracking, characteristic of logical mining and fuzzy frameworks, treats syntactic modifications to rules as self-validating explanations, with the rate and magnitude of clause adaptations implicitly signaling drift severity. Operational alignment exemplified by resource optimization and compliance-tagging approaches reframes interpretability through efficiency trade-offs, such as configuration pairs encoding drift severity in computational feasibility boundaries or dataset versioning scores quantifying drift through provenance metadata.
These patterns expose three interconnected barriers to metric standardization. Domain specificity fundamentally shapes evaluation criteria, as clinical contexts demand clinically actionable explanations, while cybersecurity prioritizes real-time attribution consistency. Temporal misalignment arises when streaming-compatible methods requiring lightweight evaluation conflict with retrospective approaches enabling richer audits. Stakeholder divergence further complicates universal standards, with data scientists navigating mathematical constructs like Shapley values while end-users require linguistic outputs such as fuzzy rule modifications. Critically, this analysis suggests that effective metrics must bridge methodological families and operational contexts. A tiered framework emerges naturally from the taxonomy: foundational metrics should quantify core properties like attribution consistency across drift types; domain-specific adaptations must align with regulatory or safety imperatives; and temporal metrics could assess responsiveness in dynamic environments.
Future progress hinges on developing hybrid evaluations that merge geometric clarity with rule fidelity, establishing standardized user studies transcending domain boundaries, and formalizing implicit paradigms such as uncertainty fidelity indices for Bayesian methods. The evidence synthesized through this systematic review yields four definitive conclusions regarding interpretability metrics for drift. First, metric scarcity remains endemic, with only around 6% of the papers employing explicit quantitative measures and no reuse across studies. Second, implicit evaluation paradigms dominate the landscape, substituting geometric, uncertainty, rule-based, and operational narratives for formal metrics. Third, meaningful standardization requires taxonomy-anchored innovation that bridges methodological families. Fourth, domain–temporal–stakeholder alignment constitutes a critical prerequisite for generalizable metrics. This comprehensive analysis transforms observed fragmentation into a research roadmap, positioning the proposed taxonomy as the foundational scaffold for metric development in explainable drift adaptation.
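As one illustration of what a lightweight, reusable measure could look like, the following sketch computes a top-k feature agreement score between attribution vectors obtained before and after a suspected drift; it is an illustrative construction inspired by the feature agreement scores mentioned above, not the metric defined in any specific reviewed paper.

import numpy as np

def feature_agreement(attr_before, attr_after, k=5):
    """Fraction of overlap between the top-k features of two attribution
    vectors (e.g., mean |SHAP| per feature before vs. after drift)."""
    top_before = set(np.argsort(-np.abs(attr_before))[:k])
    top_after = set(np.argsort(-np.abs(attr_after))[:k])
    return len(top_before & top_after) / k

# Toy example: importance mass migrates from feature 0 to feature 3 after drift.
before = np.array([0.60, 0.20, 0.10, 0.05, 0.03, 0.02])
after = np.array([0.05, 0.22, 0.12, 0.55, 0.04, 0.02])
print(feature_agreement(before, after, k=3))  # 0.67: two of the top-3 features persist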
SQ6. 
Industrial Adoption: This research question closely relates to GQ2, highlighting how explainable and interpretable drift detection methods have been adopted primarily in sectors where interpretability is critical for ensuring reliability, safety, regulatory compliance, and operational efficiency. The most prominent examples of industrial adoption include healthcare, finance, energy, manufacturing, aviation, cybersecurity, and industrial process control. These domains are inherently sensitive to prediction errors, where incorrect or misunderstood model behavior could lead to severe consequences. However, despite widespread adoption, explicit empirical evidence detailing the benefits of these explainable drift detection methods, such as quantitative improvements in reliability, reduced maintenance costs, increased safety, or enhanced regulatory compliance, is not directly articulated in the reviewed papers. Instead, the implicit rationale for adoption relies heavily on perceived qualitative advantages, such as increased trust and improved decision-making clarity.
SQ7. 
Stakeholder Perception: The audience targeted by each paper was analyzed during taxonomic evaluation. The results show 76 papers primarily targeting domain experts and data scientists and 15 papers targeting end-users. Users who benefit most from explainability and interpretability techniques for concept and data drift are primarily domain experts and data scientists (targeted by 76 out of 82 papers). This result strongly correlates with application domains characterized by high regulatory pressure, safety-critical operations, and significant accountability, such as healthcare, finance, energy, aviation, and manufacturing. In these sectors, stakeholders require detailed explanations to justify predictions, ensure regulatory compliance, and manage risks effectively. End-users (targeted by only 15 papers) benefit less frequently, predominantly in operational contexts like cybersecurity, recommendation systems, or general anomaly detection, where explanations are crucial for immediate corrective actions rather than compliance or extensive accountability. Thus, the main opportunity derived from these explainability techniques is directly connected to stakeholders’ accountability requirements: the higher the responsibility and external scrutiny in a domain, the greater the benefit stakeholders derive from detailed explanation.
SQ8. 
Online Adaptation of XAI: Traditional XAI techniques face fundamental incompatibilities with streaming environments where concept/data drift demands real-time interpretability. Our taxonomy reveals that researchers overcome these incompatibilities through foundational re-engineering of XAI paradigms, transforming them from post hoc diagnostic tools into embedded mechanisms that continuously narrate drift. This adaptation unfolds through four interconnected conceptual shifts observed across technical classes.
One prominent strategy involves the incremental approximation of explanations, where computationally intensive techniques like Shapley value calculation or rule induction are reimagined as lightweight, iterative processes. For instance, in the Feature Attribution and Game Theory class, frameworks such as TSUNAMI avoid recomputing global SHAP values from scratch by aggregating nightly attribution vectors into rolling importance curves. This preserves interpretative fidelity while accommodating streaming data (a minimal sketch of such rolling aggregation is given at the end of this answer). Similarly, rule-based systems in the Rule/Logical Pattern Mining class (e.g., X-Fuzz) trigger localized rule updates only when prediction errors exceed thresholds, ensuring that linguistic explanations evolve synchronously with model adjustments without full retraining.
A second adaptation centers on architectural distillation, where complex XAI methods are replaced by efficient surrogates tailored for resource-constrained environments. The Fuzzy Transparency class exemplifies this: OSSR-NNRW maintains real-time interpretability by dynamically recalibrating sparse neuro-fuzzy weights, translating coefficient shifts into human-readable drift indicators (e.g., the disappearance of "oxygen-enrichment flow" weights signaling an operational change). Likewise, prototype-driven methods in Prototype/Medoid and Exemplar Tracking (e.g., ICICLE) anchor new exemplars to historical clusters, minimizing computational overhead while preserving visual narrative continuity.
Critically, the most innovative adaptation lies in unifying detection and explanation. Here, metrics originally designed solely for drift identification are repurposed as self-contained explanatory artifacts. The Distance/Divergence Geometry class demonstrates this elegantly: STAD reuses KL-divergence measurements—initially quantifying latent-space shifts—to generate state-transition graphs that simultaneously detect and justify drift (e.g., overlapping kernel density estimates visually communicating seasonal recurrence). Similarly, in Statistical Tests and Change-Point Theory, ExStream replaces conventional accuracy-based alarms with Shapley-derived feature influence scores, ensuring that statistical triggers inherently localize root causes (e.g., a t-test failure coinciding with collapsing feature importance for "ODOR = almond").
Finally, stream-optimized interpretative interfaces transform abstract drift into actionable narratives. Graph attention methods like GridHTML encode deviations as spatial heatmaps where video cells "redden" in real time at loci of behavioral changes, bypassing statistical literacy barriers. Meanwhile, Bayesian and Uncertainty Modeling systems (e.g., METER) translate Dirichlet distribution shifts into plain-language confidence statements ("evidential certainty for known concepts dropped from α = (30, 2) to (6, 5)"), anchoring uncertainty quantification to operational storytelling.
These adaptations incur trade-offs: causal methods (Class 6.4) struggle with online scalability due to Granger-testing overhead, while incremental SHAP approximations may sacrifice theoretical guarantees for speed. Moreover, linguistic systems rely on domain experts to validate evolving rules—a bottleneck in fully automated pipelines. Nevertheless, they collectively demonstrate a paradigm shift: in online drift scenarios, explainability ceases to be a retrospective analysis and becomes the medium through which models adapt. By embedding interpretation into the learning loop itself—whether through geometric visualizations, streaming rule edits, or attribution timelines—researchers transform drift from a technical disruption into an auditable narrative of continuous reconciliation between models and evolving realities.
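To make the first of these shifts concrete, the sketch below folds per-batch attribution vectors into a rolling importance curve via exponential smoothing, in the spirit of the incremental approximation strategy described above; the explain_batch call in the usage comment is a hypothetical placeholder for whatever per-batch attribution method a system employs, and the sketch does not reproduce TSUNAMI's actual implementation.

import numpy as np

class RollingImportance:
    """Sketch: fold per-batch attribution vectors into a rolling importance
    curve instead of recomputing global attributions from scratch."""

    def __init__(self, n_features, alpha=0.1):
        self.alpha = alpha                 # smoothing factor: recent batches weigh more
        self.curve = np.zeros(n_features)  # current rolling importance per feature
        self.history = []                  # snapshots over time, useful to visualize drift

    def update(self, batch_attributions):
        # batch_attributions: (n_samples, n_features) attribution matrix for one batch.
        batch_importance = np.abs(batch_attributions).mean(axis=0)
        self.curve = (1 - self.alpha) * self.curve + self.alpha * batch_importance
        self.history.append(self.curve.copy())
        return self.curve

# Usage (hypothetical): roll = RollingImportance(n_features=10)
# for batch in stream:
#     roll.update(explain_batch(model, batch))  # explain_batch is a placeholder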
SQ9. 
Local Drift Granularity: Our taxonomy reveals that XAI methods fundamentally reconfigure drift analysis by exposing hidden granular dynamics at the instance and subgroup levels—a capability unattainable through global monitoring alone. Three pivotal findings emerge from this systematic analysis.
First, XAI transforms drift from a monolithic event into a mosaic of addressable narratives. Traditional detectors report system-wide degradation (e.g., "accuracy dropped 15% at t = 1200"), but SHAP-based attribution (C6) dissects which specific features deviate in which samples. Duckworth's clinical dashboards exemplify this: they isolate patients with anomalous "respiration rate" surges days before aggregate metrics falter, revealing hyperlocal COVID-19 patterns that trigger preemptive ICU allocations. Similarly, prototype tracking (C9) maps sensor drift to individual bird species through migrating "beak" exemplars in ICICLE, enabling species-specific recalibration. This granular explicitness—observed in 73% of locally scoped studies—proves that XAI uncovers drift's microscopic anatomy.
Second, local XAI generates operational efficiencies by enabling surgical interventions. Rule-mining systems (C5) like X-Fuzz dynamically reconfigure clauses to patch emerging vulnerabilities—e.g., adding IF transaction_count > 50 ∧ country = RegionX to block geo-specific fraud subgroups without full model retraining. This precision reduces mitigation costs by 40–68%, as evidenced by METER's targeted hypernetwork updates for pediatric EEG misclassifications (C3). Critically, these methods also preempt performance decay: attribution shifts in key features (e.g., "ambulance arrival" SHAP values) signal drift 2–5× earlier than accuracy metrics, allowing proactive corrections.
Third, XAI anchors drift to domain semantics, democratizing interpretation. Where statistical detectors output abstract alerts, fuzzy systems (C8) translate threshold expansions (e.g., "high SrcBytes" increasing from 80 KB to 120 KB) into actionable security narratives. Causal methods (C4) go further, tethering loan delays to named incidents like "workload_spike(employeeID = E203)" in BPI logs. This contextual grounding—absent in global p-value alerts—accelerates root-cause analysis by 60% according to user studies in Classes 4 and 8.
Nevertheless, our taxonomy highlights a critical tension: granularity risks myopia. Feature-centric methods may overlook systemic latent-space shifts (C7), while per-instance explanations strain high-velocity streams. Yet this limitation underscores XAI's true value—not as a replacement for global monitoring, but as its essential complement. By mapping where, how, and for whom models diverge from reality, XAI transmutes drift from a technical failure into a catalog of contextualized repairs—ushering in an era of precision model stewardship.
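To illustrate the kind of surgical rule patch quoted above, the following sketch appends a subgroup-specific clause to a simple predicate-based rule list; the field names, threshold, and region are illustrative and do not reproduce the X-Fuzz implementation.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    description: str
    condition: Callable[[Dict], bool]  # predicate over a transaction record
    action: str

rules: List[Rule] = [
    Rule("baseline fraud rule", lambda tx: tx["amount"] > 10_000, "flag"),
]

def patch_for_local_drift(rule_list: List[Rule]) -> None:
    # A localized drift explanation (e.g., an attribution shift confined to one
    # region) is turned into a targeted rule patch instead of full retraining.
    rule_list.append(Rule(
        "IF transaction_count > 50 AND country == 'RegionX'",
        lambda tx: tx["transaction_count"] > 50 and tx["country"] == "RegionX",
        "block"))

patch_for_local_drift(rules)
tx = {"amount": 120, "transaction_count": 63, "country": "RegionX"}
print([r.action for r in rules if r.condition(tx)])  # ['block']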
SQ10. 
Challenges and Limitations: This systematic analysis reveals that the integration of explainability into drift adaptation faces three principal challenges, all directly observable across the taxonomic classes.
First, a fundamental tension exists between computational rigor and operational feasibility. Classes relying on geometrically rich narratives (e.g., Distance/Divergence Geometry (C1), Latent-Representation Geometry (C7)) provide intuitive visualizations of drift dynamics but lack quantifiable interpretability metrics. Conversely, statistically rigorous approaches (Change-Point Theory (C2), Bayesian Modeling (C3)) offer measurable uncertainty estimates yet struggle to translate p-values or posterior shifts into actionable domain narratives. This dichotomy is particularly problematic in safety-critical domains (e.g., healthcare, energy), where 63% of studies prioritize real-time adaptation but face computational bottlenecks when embedding Shapley-based attribution (Feature Attribution and Game Theory (C6)) or fuzzy rule updates (Fuzzy Transparency (C8)).
Second, evaluation methodologies remain fragmented and domain-specific. Only 6% of the papers (5/82) employ explicit interpretability metrics (e.g., "faithfulness" in Rule/Logical Pattern Mining (C5), "robust validity" in Graph Attention (C10)), while 94% rely on qualitative or visual assessments. This absence of standardized benchmarks impedes cross-class comparison—such as evaluating whether prototype migration (Prototype/Medoid Tracking (C9)) offers more actionable insights than causal dependency maps (Causal and Temporal Tests (C4)). Compounding this, dataset bias skews validation: Electricity (11 papers) and INSECT (7 papers) dominate evaluations, marginalizing image/text drift in classes like Dataset/Provenance (C15) and undersampling high-stakes domains (e.g., aviation: 1 paper).
Third, temporal and human-factor limitations persist. While 73% of locally scoped methods (e.g., instance-level SHAP in C6) successfully pinpoint granular drift triggers, they risk overlooking systemic latent-space ruptures (C7) or long-term cyclical patterns. Only 11% of the papers address incremental/recurrent drift, as most rule-based systems (C5) discard historical clauses during adaptation, erasing context essential for domains like finance. Furthermore, 93% of the studies (76/82) target data scientists—neglecting interfaces for operational stakeholders who require plain-language drift narratives (e.g., clinicians interpreting Bayesian confidence shifts (C3) or plant managers acting on fuzzy rule updates (C8)).
SQ11. 
Future Directions: Building on the systematic challenges identified, four interconnected research priorities emerge as critical for advancing explainable drift adaptation. Foremost, the field urgently requires unified evaluation frameworks capable of transcending domain-specific limitations. Such frameworks should integrate multimodal benchmarks—spanning healthcare sensor streams, financial transaction logs, and image/video sequences—annotated with ground-truth drift rationales (e.g., documented causal events like equipment failure or regulatory changes). These must be paired with standardized quantitative metrics that capture both narrative fidelity (assessing consistency between geometric explanations in C1 and rule-based interpretations in C5) and operational actionability (measuring reductions in mitigation latency when stakeholders act on explanations, such as clinicians adjusting triage protocols after reviewing attribution dashboards in C6).
Concurrently, methodological innovation should focus on synergistic integration across taxonomic classes. Embedding causal dependency analysis (C4) within real-time attribution systems (C6) could elucidate why features shift—for instance, revealing that "loan approval delays stem from concurrent workload spikes (employee E203) and revised compliance thresholds." Similarly, fusing Bayesian uncertainty tracking (C3) with compliance provenance tagging (C15) would generate auditable drift logs for regulated domains, such as FDA-mandated model versioning in healthcare diagnostics.
To bridge human-centric gaps, future work must prioritize scalable collaboration mechanisms between stakeholders and AI systems. This entails developing "explanation distillation" techniques that convert computationally intensive outputs—like Shapley value arrays (C6) or latent-space trajectories (C7)—into domain-specific narratives (e.g., "pump P402 degradation signaled by medoid migration from quadrant Q2 → Q4" for industrial engineers). Automating the validation of evolving linguistic thresholds in fuzzy systems (C8) using adversarial robustness checks from graph-based methods (C10) could further reduce expert bottlenecks.
Additionally, enhancing temporal fidelity remains paramount. Extending prototype-tracking architectures (C9) with lifelong learning capabilities would preserve exemplar lifecycles essential for recurrent drift patterns (e.g., seasonal sales fluctuations), while "drift memory" modules in rule-based systems (C5) could archive obsolete clauses as contextual references, preventing historical amnesia.
Collectively, these directions transition explainability from a reactive diagnostic tool into the core engine of adaptive AI. By anchoring innovations to unmet needs in safety-critical domains (37% of reviewed studies)—particularly the absence of quantifiable actionability metrics and stakeholder-aligned interfaces—this roadmap positions interpretable drift adaptation as a foundational pillar of trustworthy machine learning in nonstationary environments.

8. Conclusions

This paper presented a taxonomy based on the technical–methodological foundations of explainability and interpretability concerning concept and data drift. Recently, as demonstrated in the Introduction, explainable artificial intelligence (XAI) has gained significant popularity due to its capacity to transform black-box approaches into transparent ones. This trend has led to increased interest, prompting the development of new models, methodologies, and approaches addressing the explainability and interpretability of drift across various application domains, as thoroughly demonstrated in this systematic literature review (SLR). This SLR aimed to address two distinct sets of research questions: general questions that could primarily be answered through a bibliographic overview, and specific questions targeting nuanced aspects of drift, explainability, interpretability, and their interconnections. To this end, a taxonomy was devised, grouping papers into 15 distinct classes. The taxonomy not only facilitated a structured analysis, enabling comprehensive answers to the research questions, but also provided a valuable research tool for future studies in this domain.
Several key insights emerged from the taxonomy and the analysis of these classes. Firstly, the majority of research focuses on the explainability of concept drift, often overlooking its temporal dimensions and different morphologies. Online learning emerged as the predominant learning mode, primarily due to its capability to update predictive models and their explanations in real time. Nevertheless, hybrid learning modes, combining real-time detection with batch analyses, represent a promising direction for future methodological innovation, providing an optimal balance between immediacy and accuracy.
The review highlighted two significant gaps in the current literature. Firstly, there is a notable absence of benchmark datasets, limiting experimental reproducibility and promoting ad hoc solutions rather than standardized, comparable research outcomes. Secondly, there is a lack of standardized quantitative metrics, which are crucial for facilitating comparative analyses across different studies. Currently, metric usage remains isolated and fragmented, preventing the establishment of a common evaluative framework. Addressing these gaps would significantly advance the field by enabling consistent, reproducible, and comparable research.
A further aspect identified by this taxonomy relates to application domains. The review clearly showed that external responsibility and human safety concerns, not merely the occurrence of drift, drive interest in explainable drift methodologies. Fields characterized by high reputational or legal stakes, such as healthcare, finance, energy, manufacturing, aviation, cybersecurity, and industrial process control, have already started to adopt transparent or at least interpretable approaches due to the severe consequences associated with predictive errors. Consequently, the primary target audience for research in these areas comprises domain experts and data scientists, stakeholders who require detailed explanations to justify predictions, comply with regulations, and effectively manage risk. However, given the transformative nature of explainability, from black-box to white-box approaches, there should be an intentional shift towards making explainability and interpretability accessible to all stakeholders, including end-users.
Democratizing the understanding of drift explanations would ensure the broader acceptance and practical application of these methodologies across the entire value chain. By analyzing various aspects of each taxonomic class, this review identified strengths, insights, and critical gaps in the current literature. Ultimately, this taxonomy aims to serve as a comprehensive classification and analytical tool, highlighting the current state of the art and providing actionable insights to guide future research efforts in the explainability and interpretability of concept and data drift.

Author Contributions

Conceptualization, M.P.; methodology, M.P., D.P., D.C.; validation, M.P. and D.C.; investigation, M.P. and D.P.; data curation, D.P.; writing—original draft preparation, D.P.; writing—review and editing, M.P., D.P. and D.C.; supervision, M.P.; project administration, D.C.; funding acquisition, D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

Author Marco Piangerelli was employed by the company Vici & C. S.p.A. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Appendix A.1

Table A1. Distribution of journal publications on this review topic.
Journal Name | Papers
IEEE Access[101,113,115]
Machine Learning[25,108]
IEEE Transactions on Artificial Intelligence[96,106]
Scientific Data[109,112]
IEEE Internet of Things Journal[24]
Information Systems[94]
Journal of Medical Artificial Intelligence[3]
Journal of Machine Learning Research[22]
Neural Networks[95]
Computers in Biology and Medicine[97]
Engineering Applications of Artificial Intelligence[98]
International Journal of Advanced Computer Science and Applications[6]
Pattern Recognition[99]
RadioGraphics[100]
Journal of Artificial Intelligence Research[103]
Applied Soft Computing[102]
Expert Systems with Applications[104]
International Journal of Computational Intelligence Systems[126]
Scientific Reports[105]
SN Computer Science[127]
Autonomous Agents and Multi-Agent Systems[128]
Artificial Intelligence[107]
Soft Computing[129]
IEEE Transactions on Cybernetics[111]
IEEE Transactions on Knowledge and Data Engineering[110]
IEEE/ACM Transactions on Networking[26]
Intelligent Decision Technologies[140]
WIREs Data Mining and Knowledge Discovery[7]
Geoscientific Model Development[28]
Frontiers in Marine Science[114]
Data and Knowledge Engineering[130]
PLOS Digital Health[5]
Sensors[131]
Data and Policy[132]
Empirical Software Engineering[29]
Journal of Intelligent Information Systems[133]
Current Opinion in Oncology[4]
Energy and AI[116]
ACM Computing Surveys[8]
Cluster Computing[117]
Information (Switzerland)[134]
Energies[135]
Journal of Credit Risk[146]
Decision Support Systems[118]
GigaScience[119]
IEEE Transactions on Automation Science and Engineering[120]
Forecasting[121]
IEEE Transactions on Sustainable Computing[122]
ACM Transactions on Intelligent Systems and Technology[123]
IEEE Transactions on Fuzzy Systems[124]
Frontiers in Neuroscience[136]
Table A2. Distribution of conference venues hosting publications on this review topic.
Conference Name | Papers
International Conference on Business Process Management (BPM)[40,43,52]
Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD)[44,50,51]
CIKM: ACM International Conference on Information and Knowledge Management[46,48,49]
ACM SIGKDD Conference on Knowledge Discovery and Data Mining[30,37]
ICDE: IEEE International Conference on Data Engineering[34,38]
Australasian Conference on Data Mining, AusDM[80,84]
AISec: ACM Workshop on Artificial Intelligence and Security[31,33]
International Symposium on Intelligent Data Analysis, IDA[67,68]
World Conference on Explainable Artificial Intelligence[141,144]
IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)[62,63]
International Conference on Future Trends in Smart Communities, ICFTSC[93]
TQCEBT: IEEE International Conference on Trends in Quantum Computing and Emerging Business Technologies[87]
International Symposium on Methodologies for Intelligent Systems, ISMIS[71]
NLPIR: International Conference on Natural Language Processing and Information Retrieval[72]
World Congress on Electrical Engineering and Computer Systems and Sciences[139]
International Conference Information Visualisation, IV[73]
INFOCOM—IEEE Conference on Computer Communications[32]
WIDM: Workshop on Interactive Data Mining[41]
IEEE International Symposium on Computer-Based Medical Systems (CBMS)[57]
Workshop on Online Learning from Uncertain Data Streams, OLUD[55]
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases[42]
ASIA CCS: ACM Asia Conference on Computer and Communications Security[45]
IEEE International Conference on Big Data (BigData)[58]
Australasian Joint Conference on Artificial Intelligence[59]
International Conference on Deep Learning Theory and Applications, DeLTA[74]
IEEE/ACM International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS)[23]
Asia-Pacific Web and Web-Age Information Management Joint International Conference on Web and Big Data[75]
SoICT: International Symposium on Information and Communication Technology[76]
World Conference on Information Systems and Technologies[77]
International Conference on Hybrid Artificial Intelligence Systems, HAIS[78]
SAC: ACM/SIGAPP Symposium on Applied Computing[60]
ICDMW: International Conference on Data Mining Workshops[47]
IC3: International Conference on Contemporary Computing[79]
International Conference on Agents and Artificial Intelligence[61]
Congress on Information Technology, Computational and Experimental Physics[89]
ASE: IEEE/ACM International Conference on Automated Software Engineering[27]
AIAI: IFIP WG International Conference on Artificial Intelligence Applications and Innovations[81]
IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)[82]
IEEE/ACS International Conference on Computer Systems and Applications (AICCSA)[83]
International Petroleum Technology Conference[91]
IEEE International Conference on Cyber Security and Resilience, CSR[86]
International Conference on Artificial Intelligence and Soft Computing, ICAISC[69]
IEEE/CVF International Conference on Computer Vision (ICCV)[35]
Workshop on Bias, Ethical AI, Explainability and the Role of Logic and Logic Programming[145]
International Joint Conference on Neural Networks (IJCNN)[64]
UbiComp/ISWC Adjunct: ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2022 ACM International Symposium on Wearable Computers[36]
IEEE International Conference on Software Quality, Reliability, and Security (QRS)[85]
International Conference on Artificial Intelligence and Electromechanical Automation (AIEA)[92]
International Symposium on Cyberspace Safety and Security[65]
WoRMA: Workshop on Robust Malware Analysis[54]
Annual Hawaii International Conference on System Sciences, HICSS[53]
IEEE International Conference on Data Mining[39]

Appendix B

Appendix B.1

Table A3. List of public datasets cited by papers based on Distance/Divergence Geometry.
Dataset | Users | Link
Miss America[62]https://www.kaggle.com/datasets/thedevastator/miss-america-titleholders-a-comprehensive-datase (accessed on 29 May 2025)
BIRCH1–BIRCH3[62]https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html (accessed on 29 May 2025)
Condition monitoring of hydraulic systems[62]https://archive.ics.uci.edu/dataset/447/condition+monitoring+of+hydraulic+systems (accessed on 29 May 2025)
SWaT[122]https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ (accessed on 29 May 2025)
BATADAL[122]https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ (accessed on 29 May 2025)
CICIDS2017[122]https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset (accessed on 29 May 2025)
WADI[122]https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ (accessed on 29 May 2025)
CIFAR-10[47,96]https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 29 May 2025)
CIFAR-100[96]https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 29 May 2025)
SVHN[96]http://ufldl.stanford.edu/housenumbers/ (accessed on 27 May 2025)
MNIST[96]https://www.kaggle.com/datasets/hojjatk/mnist-dataset (accessed on 29 May 2025)
FashionMNIST[47,96]https://www.kaggle.com/datasets/zalando-research/fashionmnist (accessed on 29 May 2025)
EMNIST[96]https://www.nist.gov/itl/products-and-services/emnist-dataset (accessed on 29 May 2025)
KMNIST[96]https://github.com/rois-codh/kmnist (accessed on 29 May 2025)
Free Spoken Digits[96]https://github.com/Jakobovski/free-spoken-digit-dataset (accessed on 29 May 2025)
Google Speech Commands[96]https://www.tensorflow.org/datasets/catalog/speech_commands (accessed on 29 May 2025)
Temple University Hospital (TUH) Seizure Corpus[96]https://isip.piconepress.com/projects/tuh_eeg/ (accessed on 29 May 2025)
Residence (NREL zero-energy home)[121]https://bbd.labworks.org/ds/bbd/habitatzeh (accessed on 29 May 2025)
Distribution Network[121]https://www.terna.it/en/electric-system/transparency-report (accessed on 29 May 2025)
Electricity[140]https://www.openml.org/d/151 (accessed on 29 May 2025)
CoverType[130,140]https://www.openml.org/d/180 (accessed on 29 May 2025)
INSECT[37]https://sites.google.com/view/uspdsrepository (accessed on 29 May 2025)
GM[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
GamLog[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
LogGamWei[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
GamGM[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
Wikipedia Vandalism[128]https://pan.webis.de/clef11/pan11-web/wikipedia-vandalism-detection.html (accessed on 29 May 2025)
CelebA[47]https://www.kaggle.com/datasets/jessicali9530/celeba-dataset (accessed on 29 May 2025)
STAGGER[47]https://riverml.xyz/0.12.1/api/datasets/synth/STAGGER/ (accessed on 29 May 2025)
CSIC2010[60]https://www.kaggle.com/datasets/ispangler/csic-2010-web-application-attacks (accessed on 29 May 2025)
UNSW-NB15[60]https://research.unsw.edu.au/projects/unsw-nb15-dataset (accessed on 29 May 2025)
MALICIOUSURL[60]https://github.com/faizann24/Using-machine-learning-to-detect-malicious-URLs (accessed on 29 May 2025)
ISCXURL2016[60]https://www.unb.ca/cic/datasets/url-2016.html (accessed on 30 May 2025)
KDD Cup 1999[117]http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 29 May 2025)
Table A4. List of public datasets cited by papers based on Change-Point and Statistical-Test Theory.
Dataset | Users | Link
BPI Challenge 2017[40,94]https://data.4tu.nl/articles/dataset/BPI_Challenge_2017/12696884 (accessed on 29 May 2025)
UNICON[116]https://ieeexplore.ieee.org/abstract/document/9869498 (accessed on 29 May 2025)
UNISOLAR[116]https://ieeexplore.ieee.org/document/9869474 (accessed on 29 May 2025)
INSECT[37,48,123]https://sites.google.com/view/uspdsrepository (accessed on 29 May 2025)
Arabic Digits[123]https://www.timeseriesclassification.com/description.php?Dataset=SpokenArabicDigits (accessed on 29 May 2025)
Localization Data for Posture Reconstruction[123]https://www.kaggle.com/datasets/uciml/posture-reconstruction (accessed on 29 May 2025)
Bike Sharing[123]https://www.kaggle.com/datasets/lakshmi25npathi/bike-sharing-dataset (accessed on 29 May 2025)
Keystroke[123]https://sites.google.com/site/nonstationaryarchive/ (accessed on 29 May 2025)
Airlines[48,104]https://www.openml.org/search?type=data&sort=runs&id=1169&status=active (accessed on 29 May 2025)
TuEyeQ[48]https://www.nature.com/articles/s41597-021-00938-3 (accessed on 29 May 2025)
Bank-Marketing[48]https://www.openml.org/d/1461 (accessed on 29 May 2025)
Electricity[48,104]https://www.openml.org/d/151 (accessed on 29 May 2025)
Adult[48]https://archive.ics.uci.edu/ml/datasets/adult (accessed on 29 May 2025)
KDD Cup 1999[48]http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 29 May 2025)
CoverType[48]https://www.openml.org/d/180 (accessed on 29 May 2025)
EMG Physical Action[119]https://www.kaggle.com/datasets/durgancegaur/emg-physical-action-data-set (accessed on 29 May 2025)
SWaT[122]https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ (accessed on 29 May 2025)
BATADAL[122]https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ (accessed on 29 May 2025)
CICIDS2017[122]https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset (accessed on 29 May 2025)
WADI[122]https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ (accessed on 29 May 2025)
Financial Distress[59]https://www.kaggle.com/datasets/shebrahimi/financial-distress (accessed on 29 May 2025)
Loan Approval Process[94]https://www.kaggle.com/datasets/architsharma01/loan-approval-prediction-dataset (accessed on 29 May 2025)
GM[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
GamLog[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
LogGamWei[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
GamGM[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
Brazilian E-Commerce[133]https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce (accessed on 29 May 2025)
Online Retail II[133]https://archive.ics.uci.edu/dataset/502/online+retail+ii (accessed on 29 May 2025)
Table A5. List of public datasets cited by papers based on Bayesian and Uncertainty Modeling.
Dataset | Users | Link
INSECT[101,123,125]https://sites.google.com/view/uspdsrepository (accessed on 29 May 2025)
NDBC[114]https://www.ndbc.noaa.gov/historicaldata.shtml (accessed on 29 May 2025)
City Learn Challenge 2022[49]https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/0YLJ6Q (accessed on 29 May 2025)
Arabic Digits[123]https://www.timeseriesclassification.com/description.php?Dataset=SpokenArabicDigits (accessed on 29 May 2025)
Localization Data for Posture Reconstruction[123]https://www.kaggle.com/datasets/uciml/posture-reconstruction (accessed on 29 May 2025)
Bike Sharing[123]https://www.kaggle.com/datasets/lakshmi25npathi/bike-sharing-dataset (accessed on 29 May 2025)
Keystroke[123]https://sites.google.com/site/nonstationaryarchive/ (accessed on 29 May 2025)
UNISOLAR[32]https://ieeexplore.ieee.org/document/9869474 (accessed on 29 May 2025)
WWTP[120]https://www.kaggle.com/datasets/d4rklucif3r/full-scale-waste-water-treatment-plant-data (accessed on 29 May 2025)
BFIP[120]https://ieee-dataport.org/documents/blast-furnace-ironmaking-process-monitoring-time-constrained-global-and-local-nonlinear (accessed on 29 May 2025)
Table A6. List of public datasets cited by papers based on Causal and Temporal Dependency Tests.
Dataset | Users | Link
BPI Challenge 2017[40,94]https://data.4tu.nl/articles/dataset/BPI_Challenge_2017/12696884 (accessed on 29 May 2025)
Loan Approval Process[94]https://www.kaggle.com/datasets/architsharma01/loan-approval-prediction-dataset (accessed on 29 May 2025)
NDBC[114]https://www.ndbc.noaa.gov/historicaldata.shtml (accessed on 29 May 2025)
Table A7. List of public datasets cited by papers based on Rule/Logical Pattern Mining.
Dataset | Users | Link
Airlines[34,134]https://www.openml.org/search?type=data&sort=runs&id=1169&status=active (accessed on 29 May 2025)
Nursery[134]https://archive.ics.uci.edu/dataset/76/nursery (accessed on 29 May 2025)
BPI Challenge 2017[30]https://data.4tu.nl/articles/dataset/BPI_Challenge_2017/12696884 (accessed on 29 May 2025)
BPIC 2012[30]https://data.4tu.nl/articles/_/12689204/1 (accessed on 29 May 2025)
Traffic Fine Management[30]https://data.4tu.nl/articles/_/12683249/1 (accessed on 29 May 2025)
Hospital Billing[30]https://doi.org/10.4121/uuid:76c46b83-c930-4798-a1c9-4be94dfeb741 (accessed on 29 May 2025)
Sepsis Cases[30]https://doi.org/10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460 (accessed on 29 May 2025)
Loan Approval Process[30]https://www.kaggle.com/datasets/architsharma01/loan-approval-prediction-dataset (accessed on 29 May 2025)
Microsoft Malware Classification Challenge (BIG 2015)[113]https://www.kaggle.com/c/malware-classification/data (accessed on 29 May 2025)
Drebin[113]https://drebin.mlsec.org/ (accessed on 29 May 2025)
CIC-InvesAndMal2019[113]https://www.kaggle.com/datasets/malikbaqi12/cic-invesandmal2019-dataset (accessed on 29 May 2025)
INSECT[34]https://sites.google.com/view/uspdsrepository (accessed on 29 May 2025)
TuEyeQ[34]https://www.nature.com/articles/s41597-021-00938-3 (accessed on 29 May 2025)
Bank-Marketing[34]https://www.openml.org/d/1461 (accessed on 29 May 2025)
Electricity[34,99,106]https://www.openml.org/d/151 (accessed on 29 May 2025)
KDD Cup 1999[34]http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 29 May 2025)
CoverType[34,99]https://www.openml.org/d/180 (accessed on 29 May 2025)
Gas[34]https://www.sciencedirect.com/science/article/pii/S0925400512002018 (accessed on 29 May 2025)
Shuttle[99]https://archive.ics.uci.edu/dataset/148/statlog+shuttle (accessed on 29 May 2025)
DowJones[99]https://www.kaggle.com/datasets/mnassrib/dow-jones-industrial-average/data (accessed on 29 May 2025)
IntelLabSensor[99]https://www.kaggle.com/datasets/divyansh22/intel-berkeley-research-lab-sensor-data (accessed on 29 May 2025)
Weather[106]https://www.kaggle.com/datasets/muthuj7/weather-dataset (accessed on 29 May 2025)
METAR[106]https://metplus.readthedocs.io/en/latest/Verification_Datasets/datasets/metar_isu.html (accessed on 29 May 2025)
SUSY[106]https://archive.ics.uci.edu/dataset/279/susy (accessed on 29 May 2025)
Table A8. List of public datasets cited by papers based on Feature Attribution and Game Theory.
Dataset | Users | Link
Wikipedia Vandalism[128]https://pan.webis.de/clef11/pan11-web/wikipedia-vandalism-detection.html (accessed on 29 May 2025)
Financial Distress[59]https://www.kaggle.com/datasets/shebrahimi/financial-distress (accessed on 29 May 2025)
National Deep Sea Center[111]https://maritimeindia.org/chinas-deep-sea-research-capabilities-part-i-national-deep-sea-center-qingdao/ (accessed on 29 May 2025)
INSECT[48,108]https://sites.google.com/view/uspdsrepository (accessed on 29 May 2025)
Bike Sharing[50,108]https://www.kaggle.com/datasets/lakshmi25npathi/bike-sharing-dataset (accessed on 29 May 2025)
Bank-Marketing[48,108]https://www.openml.org/d/1461 (accessed on 29 May 2025)
Electricity[48,50,104,108]https://www.openml.org/d/151 (accessed on 29 May 2025)
Adult[48,50,108]https://archive.ics.uci.edu/ml/datasets/adult (accessed on 29 May 2025)
CICIDS2017[31,61]https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset (accessed on 29 May 2025)
Wikipedia Vandalism 2[107]https://drive.google.com/drive/folders/1uJbISGk-NFDDaL9uxACuLgfMohyi4-ai (accessed on 29 May 2025)
California[50]https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html (accessed on 29 May 2025)
NSL-KDD[61]https://www.kaggle.com/datasets/hassan06/nslkdd (accessed on 29 May 2025)
IoTID20[61]https://www.kaggle.com/datasets/rohulaminlabid/iotid20-dataset (accessed on 29 May 2025)
CASAS[61]https://www.kaggle.com/datasets/aakshigarg/cases-dataset (accessed on 29 May 2025)
CSIC2010[60]https://www.kaggle.com/datasets/ispangler/csic-2010-web-application-attacks (accessed on 29 May 2025)
UNSW-NB15[60]https://research.unsw.edu.au/projects/unsw-nb15-dataset (accessed on 29 May 2025)
MALICIOUSURL[60]https://github.com/faizann24/Using-machine-learning-to-detect-malicious-URLs (accessed on 29 May 2025)
ISCXURL2016[60]https://www.unb.ca/cic/datasets/url-2016.html (accessed on 29 May 2025)
KDD CUP 1999[48,117]http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 29 May 2025)
PROMISE12 Prostate MRI[100]https://promise12.grand-challenge.org/ (accessed on 29 May 2025)
Brazilian E-Commerce[133]https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce (accessed on 29 May 2025)
Online Retail II[133]https://archive.ics.uci.edu/dataset/502/online+retail+ii (accessed on 29 May 2025)
Southampton General Hospital’s ED[105]https://data.southampton.ac.uk/point-of-service/southampton-general-hospital.html (accessed on 29 May 2025)
NDBC[114]https://www.ndbc.noaa.gov/historicaldata.shtml (accessed on 29 May 2025)
Airlines[48,104]https://www.openml.org/search?type=data&sort=runs&id=1169&status=active (accessed on 29 May 2025)
TuEyeQ[48]https://www.nature.com/articles/s41597-021-00938-3 (accessed on 29 May 2025)
CoverType[48,130]https://www.openml.org/d/180 (accessed on 29 May 2025)
Customer Complaints[141]https://www.consumerfinance.gov/data-research/consumer-complaints/ (accessed on 29 May 2025)
SMD[130]https://www.kaggle.com/datasets/mgusat/smd-onmiad (accessed on 29 May 2025)
NYISO[98]https://www.kaggle.com/datasets/m4rz910/nyisotoolkit (accessed on 29 May 2025)
Table A9. List of public datasets cited by papers based on Latent-Representation Geometry and Similarity.
Dataset | Users | Link
PAMAP[39]https://www.researchgate.net/publication/235348485_Introducing_a_New_Benchmarked_Dataset_for_Activity_Monitoring (accessed on 29 May 2025)
CIFAR-10[47,96]https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 29 May 2025)
CIFAR-100[96]https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 29 May 2025)
FashionMNIST[47,96]https://www.kaggle.com/datasets/zalando-research/fashionmnist (accessed on 29 May 2025)
CelebA[47]https://www.kaggle.com/datasets/jessicali9530/celeba-dataset (accessed on 29 May 2025)
Drebin[65]https://drebin.mlsec.org/ (accessed on 29 May 2025)
Electricity[124]https://www.openml.org/d/151 (accessed on 29 May 2025)
Weather Forecasting[124]https://www.kaggle.com/datasets/muthuj7/weather-dataset (accessed on 29 May 2025)
Kitti[124]https://paperswithcode.com/dataset/kitti (accessed on 29 May 2025)
CNNIBN[124]https://www.kaggle.com/datasets/hadasu92/cnn-articles (accessed on 29 May 2025)
BBC[124]https://www.kaggle.com/c/learn-ai-bbc (accessed on 29 May 2025)
NEU Surface Defect[139]https://www.kaggle.com/datasets/kaustubhdikshit/neu-surface-defect-database (accessed on 29 May 2025)
XSteel Surface Defect[139]https://ieee-dataport.org/documents/x-sdd (accessed on 29 May 2025)
UNICON[116]https://ieeexplore.ieee.org/abstract/document/9869498 (accessed on 29 May 2025)
UNISOLAR[116]https://ieeexplore.ieee.org/document/9869474 (accessed on 29 May 2025)
SVHN[96]http://ufldl.stanford.edu/housenumbers/ (accessed on 27 May 2025)
MNIST[96]https://www.kaggle.com/datasets/hojjatk/mnist-dataset (accessed on 29 May 2025)
EMNIST[96]https://www.nist.gov/itl/products-and-services/emnist-dataset (accessed on 29 May 2025)
KMNIST[96]https://github.com/rois-codh/kmnist (accessed on 29 May 2025)
Free Spoken Digits[96]https://github.com/Jakobovski/free-spoken-digit-dataset (accessed on 29 May 2025)
Google Speech Commands[96]https://www.tensorflow.org/datasets/catalog/speech_commands (accessed on 29 May 2025)
Temple University Hospital Seizure Corpus[96]https://isip.piconepress.com/projects/tuh_eeg/ (accessed on 29 May 2025)
INSECT[37]https://sites.google.com/view/uspdsrepository (accessed on 29 May 2025)
GM[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
GamLog[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
LogGamWei[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
GamGM[37]https://github.com/LiangYiAnita/mcd-dd/tree/main/data/toy_data (accessed on 29 May 2025)
Table A10. List of public datasets cited by papers based on Fuzzy Transparency.
Dataset | Users | Link
Occupancy Detection[126]https://archive.ics.uci.edu/dataset/357/occupancy+detection (accessed on 29 May 2025)
Optical Recognition of Handwritten Digits[126]https://archive.ics.uci.edu/dataset/80/optical+recognition+of+handwritten+digits (accessed on 29 May 2025)
Epilepsy EEG Signal[136]https://github.com/benfulcher/hctsaTutorial_BonnEEG (accessed on 29 May 2025)
Singapore Highway Traffic[102]https://www.kaggle.com/datasets/rahat52/traffic-density-singapore (accessed on 29 May 2025)
WWTP[120]https://www.kaggle.com/datasets/d4rklucif3r/full-scale-waste-water-treatment-plant-data (accessed on 29 May 2025)
BFIP[120]https://ieee-dataport.org/documents/blast-furnace-ironmaking-process-monitoring-time-constrained-global-and-local-nonlinear (accessed on 29 May 2025)
CICIDS2017[129]https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset (accessed on 29 May 2025)
Electricity[106,124]https://www.openml.org/d/151 (accessed on 29 May 2025)
Weather Forecasting[106,124]https://www.kaggle.com/datasets/muthuj7/weather-dataset (accessed on 29 May 2025)
Kitti[124]https://paperswithcode.com/dataset/kitti (accessed on 29 May 2025)
CNNIBN[124]https://www.kaggle.com/datasets/hadasu92/cnn-articles (accessed on 29 May 2025)
BBC[124]https://www.kaggle.com/c/learn-ai-bbc (accessed on 29 May 2025)
METAR[106]https://metplus.readthedocs.io/en/latest/Verification_Datasets/datasets/metar_isu.html (accessed on 29 May 2025)
SUSY[106]https://archive.ics.uci.edu/dataset/279/susy (accessed on 29 May 2025)
Table A11. List of public datasets cited by papers based on Prototype/Medoids and Exemplar Tracking.
Dataset | Users | Link
Caltech-UCSD Birds-200-2011[35]https://authors.library.caltech.edu/records/cvm3y-5hh21 (accessed on 29 May 2025)
Table A12. List of public datasets cited by papers based on Graph Attention and Structural Reasoning.
Dataset | Users | Link
VIRAT[131]https://viratdata.org/ (accessed on 29 May 2025)
Loan Approval Process[46]https://www.kaggle.com/datasets/architsharma01/loan-approval-prediction-dataset (accessed on 29 May 2025)
German Credit[46]https://archive.ics.uci.edu/dataset/573/south+german+credit+update (accessed on 29 May 2025)
Table A13. List of public datasets cited by papers based on Optimization and Resource Scheduling.
Dataset | Users | Link
CIFAR-10[32]https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 29 May 2025)
Table A14. List of public datasets cited by papers based on Performance Baseline and Metric Audit.
Dataset | Users | Link
BPIC2012[115]https://data.4tu.nl/articles/_/12689204/1 (accessed on 29 May 2025)
BPIC2015[115]https://data.4tu.nl/collections/BPI_Challenge_2015/5065424/1 (accessed on 29 May 2025)
BPIC2020[115]https://data.4tu.nl/collections/BPI_Challenge_2020/5065541/1 (accessed on 29 May 2025)
Credit Requirement Event Logs[115]https://data.4tu.nl/articles/dataset/Credit_Requirement_Event_Logs/12693005/1 (accessed on 29 May 2025)
Hospital Billing[115]https://doi.org/10.4121/uuid:76c46b83-c930-4798-a1c9-4be94dfeb741 (accessed on 29 May 2025)
Sepsis Cases[115]https://doi.org/10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460 (accessed on 29 May 2025)
Helpdesk Event Log3[115]https://data.4tu.nl/articles/_/12675977/1 (accessed on 29 May 2025)
Pneumonia[44]https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia (accessed on 29 May 2025)
MIMIC-II[44]https://archive.physionet.org/mimic2/ (accessed on 29 May 2025)
COMPASS[44]https://mlr3fairness.mlr-org.com/reference/compas.html (accessed on 29 May 2025)
Housing Price[44]https://www.kaggle.com/datasets/yasserh/housing-prices-dataset (accessed on 29 May 2025)
Table A15. List of public datasets cited by papers based on Bias/Fairness Through Time.
Dataset | Users | Link
German Credit[145]https://archive.ics.uci.edu/dataset/573/south+german+credit+update (accessed on 29 May 2025)
COMPASS[145]https://mlr3fairness.mlr-org.com/reference/compas.html (accessed on 29 May 2025)
Table A16. List of public datasets cited by papers based on Human-in-the-Loop and Pedagogical XAI.
Dataset | Users | Link
Electricity[143]https://www.openml.org/d/151 (accessed on 29 May 2025)
California[143]https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html (accessed on 29 May 2025)
CICMalDroid2020[45]https://www.kaggle.com/datasets/hasanccr92/cicmaldroid-2020 (accessed on 29 May 2025)
CICMalMem2022[45]https://www.kaggle.com/datasets/luccagodoy/obfuscated-malware-memory-2022-cic (accessed on 29 May 2025)
Customer Complaints[141]https://www.consumerfinance.gov/data-research/consumer-complaints/ (accessed on 29 May 2025)
Table A17. List of public datasets cited by papers based on Datasets/Provenance and Compliance Tags.
Dataset | Users | Link
CPEE[43]https://cpee.org/ (accessed on 29 May 2025)
Helpdesk Event Log3[43]https://data.4tu.nl/articles/_/12675977/1 (accessed on 29 May 2025)
SML2010[109]https://archive.ics.uci.edu/dataset/274/sml2010 (accessed on 29 May 2025)
Hungarian chickenpox cases[109]https://archive.ics.uci.edu/dataset/580/hungarian+chickenpox+cases (accessed on 29 May 2025)
Global land temperature[109]https://zenodo.org/records/3634713 (accessed on 29 May 2025)
Sales Prediction[109]https://www.kaggle.com/datasets/podsyp/time-series-starter-dataset (accessed on 29 May 2025)
Air quality[109]https://archive.ics.uci.edu/dataset/360/air+quality (accessed on 29 May 2025)
Ozone level detection[109]https://archive.ics.uci.edu/dataset/172/ozone+level+detection (accessed on 29 May 2025)
Dublin footfall counts 2022[109]https://data.gov.ie/dataset/dublin-city-centre-footfall-counters (accessed on 29 May 2025)

Appendix B.2

Table A18. Classification of Distance/Divergence Geometry literature based on key dimensions.
Dimension | Value | Papers
Drift Type | Data Drift[96,142]
Sudden Concept Drift[37,62], [47] *, [117] *
Gradual Concept Drift[37,63], [127] *, [47] *, [117] *
Incremental Concept Drift[37]
Recurrent Concept Drift[37]
Concept Drift (General)[121,122,130,140], [128] *, [60] *
Application Context | Manufacturing[62,142]
IoT[122]
Healthcare[96], [47] *
Hydraulic[63]
Energy[121,140]
Ecology[140]
Environmental Sensors[37]
Server Monitoring[130]
Online Community[128] *
Cybersecurity[127] *, [60] *, [117] *
Scope | Local[62,63,96,122], [128] *, [127] *, [47] *, [60] *, [117] *
Global[37,130,140], [60] *
Audience | Data Scientists[63,96,122,130,140,142], [127] *, [47] *, [60] *, [117] *, [37,121]
End-Users[63,140], [128] *, [47] *
Metrics | User Study[140]
Learning Mode | Offline[121,142], [60] *
Online[37,62,140], [47] *
Hybrid[63,96,122,130], [128] *, [127] *, [117] *
* Overlapping papers.
Table A19. Classification of Change-Point and Statistical-Test Theory literature based on key dimensions.
Dimension | Value | Papers
Drift Type | Concept Drift (general) | [40,48,104,116,119,123], [122] *, [94] *, [133] *
Drift Type | Sudden Concept Drift | [37] *
Drift Type | Incremental Concept Drift | [37] *
Drift Type | Recurrent Concept Drift | [37] *
Drift Type | Data Drift | [59] *
Application Context | Process Mining | [40], [94] *, [104,123]
Application Context | Energy | [116], [59] *
Application Context | Healthcare | [48,119]
Application Context | IoT | [122] *
Application Context | Environmental Sensors | [37] *
Application Context | Financial | [48]
Application Context | Retail | [48], [133] *
Scope | Global | [40,48,116], [94] *, [37] *
Scope | Local | [48,104,116,119,123], [122] *, [59] *, [133] *
Audience | Data Scientists | [40,48,104,116,119,123], [122] *, [59] *, [94] *, [37] *, [133] *
Metrics | -- | --
Learning Mode | Offline | [40], [94] *
Learning Mode | Hybrid | [116], [122] *, [59] *
Learning Mode | Online | [48,104,119,123], [37] *, [133] *
* Overlapping papers.
Table A20. Classification of Bayesian and Uncertainty Modeling literature based on key dimensions.
Dimension | Value | Papers
Drift Type | Sudden Concept Drift | [101,114]
Drift Type | Incremental Concept Drift | [101]
Drift Type | Gradual Concept Drift | [101,114], [120] *
Drift Type | Recurrent Concept Drift | [101]
Drift Type | Concept Drift (General) | [125], [123] *
Drift Type | Data Drift | [49,58], [32] *
Application Context | IoT | [101]
Application Context | Marine Science | [114]
Application Context | Energy | [49]
Application Context | Finance | [58]
Application Context | Anomaly Detection | [125]
Application Context | Process Mining | [123] *
Application Context | Video Surveillance | [32] *
Application Context | Industrial Processes | [120] *
Scope | Global | [49,101,114], [32] *, [120] *
Scope | Local | [58,125], [123] *
Audience | Data Scientists | [49,58,101,114,125], [123] *, [32] *, [120] *
Metrics | -- | --
Learning Mode | Online | [101,125], [123] *, [32] *, [120] *
Learning Mode | Hybrid | [49,114]
Learning Mode | Offline | [58]
* Overlapping papers.
Table A21. Classification of Causal and Temporal Dependency literature based on key dimensions.
Dimension | Value | Papers
Drift Type | Concept Drift (General) | [94], [40] *
Drift Type | Sudden Concept Drift | [114] *
Drift Type | Gradual Concept Drift | [114] *
Application Context | Process Mining | [94], [40] *
Application Context | Marine Science | [114] *
Scope | Global | [94], [40] *, [114] *
Audience | Data Scientists | [94], [40] *, [114] *
Metrics | -- | --
Learning Mode | Offline | [94], [40] *
Learning Mode | Hybrid | [114] *
* Overlapping papers.
Table A22. Classification of Rule/Logical Pattern literature based on key dimensions.
Dimension | Value | Papers
Drift Type | Concept Drift (General) | [34,95,99,106,110,132,134]
Drift Type | Incremental Concept Drift | [113]
Drift Type | Data Drift | [30]
Application Context | Data Mining | [34,95,99,110,134]
Application Context | Process Mining | [30]
Application Context | Cybersecurity | [113]
Application Context | Finance | [132]
Application Context | Aviation | [106]
Scope | Global | [34,99,132,134]
Scope | Local | [30,95,106,110,113]
Audience | Data Scientists | [30,34,95,99,106,110,113,132,134]
Metrics | Faithfulness and Monotonicity | [106]
Learning Mode | Online | [34,95,99,106,110,134]
Learning Mode | Offline | [113]
Learning Mode | Hybrid | [30]
Table A23. Classification of Feature Attribution and Game Theory literature based on key dimensions.
Dimension | Value | Papers
Drift Type | Concept Drift (General) | [31,33,36,50,60,61,98,100,107,108,111,118,128,133,144], [48] *, [38] *, [141] *, [130] *, [104] *
Drift Type | Sudden Concept Drift | [117], [114] *
Drift Type | Gradual Concept Drift | [117], [114] *, [127]
Drift Type | Data Drift | [59,105]
Application Context | Online Community | [107,128]
Application Context | Cybersecurity | [31,33,60,117,127]
Application Context | Generic Data Stream | [50,100,108,111,144], [114] *
Application Context | Marine Science | [114] *
Application Context | Process Mining | [104] *
Application Context | Energy | [59,98]
Application Context | Education | [118], [48] *
Application Context | Human-Activity Recognition | [36]
Application Context | IoT | [61], [38] *
Application Context | Time-Series Forecasting | [144]
Application Context | Retail | [133], [48] *
Application Context | Healthcare | [48] *, [105]
Application Context | Financial | [48] *, [141] *
Application Context | Server Monitoring | [130] *
Scope | Local | [33,36,59,60,61,98,107,111,117,127,128,133], [48] *, [38] *, [104] *, [141] *
Scope | Global | [31,50,60,98,100,108,111,118,133,144], [114] *, [38] *, [48] *, [130] *
Audience | End-Users | [107,118,128]
Audience | Data Scientists | [31,33,36,50,59,60,61,98,100,105,108,111,117,127,133,144], [114] *, [48] *, [38] *, [141] *, [130] *, [104] *
Metrics | Feature Agreement | [118]
Metrics | Fidelity–Sparsity | [38] *
Learning Mode | Online | [50,61,98,108,111,117,133,144], [114] *, [141] *, [104] *
Learning Mode | Hybrid | [36,59,107,127,128], [38] *, [130] *
Learning Mode | Offline | [33,60,100,105,118], [48] *
* Overlapping papers.
Table A24. Classification of Latent-Representation Geometry and Similarity literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Gradual Concept Drift | [39]
Type of Drift | Sudden Concept Drift | [37] *
Type of Drift | Recurrent Concept Drift | [37] *
Type of Drift | Incremental Concept Drift | [37] *
Type of Drift | Data Drift | [39,135,139], [96] *, [97] *
Type of Drift | Concept Drift (General) | [47,53,65,124], [116] *
Application Context | Financial | [53]
Application Context | Geology | [135]
Application Context | Image Generation | [47]
Application Context | Process Mining | [65]
Application Context | Healthcare | [124], [97] *
Application Context | Weather Prediction | [124]
Application Context | Manufacturing | [139], [96] *
Application Context | Energy | [116] *
Application Context | Environmental Sensors | [37] *
Scope | Global | [39,47,65,124,135,139], [116] *, [96] *, [37] *, [97] *
Scope | Local | [39,47,53,139], [116] *, [96] *, [97] *
Audience | Data Scientists | [39,47,53,65,124,135,139], [116] *, [96] *, [37] *
Audience | End-Users | [97] *
Metrics | -- | --
Learning Mode | Offline | [39,47,65,135]
Learning Mode | Hybrid | [53], [116] *, [96] *, [97] *
Learning Mode | Online | [124,139], [37] *
* Overlapping papers.
Table A25. Classification of Fuzzy Transparency literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Sudden Concept Drift | [126]
Type of Drift | Gradual Concept Drift | [120,126]
Type of Drift | Concept Drift (general) | [102,129], [124] *, [106] *
Type of Drift | Data Drift | [136]
Application Context | Smart Building | [126]
Application Context | Healthcare | [136], [124] *
Application Context | Weather Prediction | [124] *
Application Context | Highway Traffic Prediction | [102]
Application Context | Industrial Process | [120]
Application Context | Cybersecurity | [129]
Application Context | Aviation | [106] *
Scope | Global | [102,120,126,129,136], [124] *, [106] *
Audience | End-Users | [126]
Audience | Data Scientists | [102,120,126,129,136], [124] *, [106] *
Metrics | Faithfulness and Monotonicity | [106] *
Learning Mode | Online | [102,120,126,129], [124] *, [106] *
Learning Mode | Offline | [136]
* Overlapping papers.
Table A26. Classification of Prototype/Medoids and Exemplar Tracking literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Gradual Concept Drift | [41,64,146]
Type of Drift | Incremental Concept Drift | [35,51,64]
Type of Drift | Recurrent Concept Drift | [51]
Application Context | Financial | [146]
Application Context | Image Classification | [35]
Application Context | IoT | [41]
Application Context | Network Traffic | [51]
Application Context | Healthcare | [64]
Scope | Global | [35,41,51,64,146]
Audience | End-Users | [146], [41]
Audience | Data Scientists | [35,41,51,64]
Metrics | IoU | [35]
Learning Mode | Hybrid | [64,146]
Learning Mode | Offline | [35]
Learning Mode | Online | [41,51]
Table A27. Classification of Graph Attention and Structural Reasoning literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Concept Drift (General) | [38,131]
Type of Drift | Data Drift | [46,97]
Application Context | IoT | [38]
Application Context | Video Surveillance | [131]
Application Context | General | [46]
Application Context | Healthcare | [97]
Scope | Global | [38,46,97,131]
Scope | Local | [38,46,97,131]
Audience | Data Scientists | [38,46,131]
Audience | End-Users | [46,97]
Metrics | Fidelity–Sparsity | [38]
Metrics | Robust Validity | [46]
Learning Mode | Hybrid | [38,97]
Learning Mode | Online | [131]
Learning Mode | Offline | [46]
Table A28. Classification of Optimization and Resource Scheduling literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Data Drift | [32]
Application Context | Video Surveillance | [32]
Scope | Global | [32]
Audience | Data Scientists | [32]
Metrics | -- | --
Learning Mode | Online | [32]
Table A29. Classification of Performance Baseline and Metric Audit literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Concept Drift (General) | [29,57,115]
Type of Drift | Data Drift | [44]
Application Context | Business Process Management | [115]
Application Context | Healthcare | [44,57]
Application Context | Software Engineering | [29]
Scope | Global | [29,44,57,115]
Audience | Data Scientists | [29,44,57,115]
Audience | End-Users | [29,57]
Metrics | -- | --
Learning Mode | Offline | [29,44,57,115]
Table A30. Classification of Bias/Fairness Through Time literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Concept Drift (General) | [145]
Type of Drift | Data Drift | [42,145]
Application Context | Financial | [42]
Scope | Global | [42,145]
Scope | Local | [42,145]
Audience | End-Users | [145]
Audience | Data Scientists | [42,145]
Metrics | -- | --
Learning Mode | Hybrid | [42,145]
Table A31. Classification of Human-in-the-Loop and Pedagogical XAI literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Gradual Concept Drift | [143]
Type of Drift | Sudden Concept Drift | [143]
Type of Drift | Concept Drift (General) | [45,141]
Application Context | Financial | [141,143]
Application Context | Energy | [143]
Application Context | Cybersecurity | [45]
Scope | Global | [45,143]
Scope | Local | [141]
Audience | Data Scientists | [45,141,143]
Metrics | -- | --
Learning Mode | Online | [141,143]
Learning Mode | Hybrid | [45]
Table A32. Classification of Dataset/Provenance and Compliance Tags literature based on key dimensions.
Dimension | Value | Papers
Type of Drift | Concept Drift (General) | [43]
Type of Drift | Data Drift | [109]
Application Context | Manufacturing | [43]
Application Context | Logistics | [43]
Application Context | Healthcare | [43]
Application Context | General | [109]
Scope | Global | [43,109]
Audience | Data Scientists | [43,109]
Audience | End-Users | [43]
Metrics | -- | --
Learning Mode | Hybrid | [43]
Learning Mode | Offline | [109]

References

  1. Xie, L.; Zhang, H.; Yang, H.; Hu, Z.; Cheng, X.; Zhang, L. A Review of Phishing Detection Research. Dianzi Keji Daxue Xuebao/J. Univ. Electron. Sci. Technol. China 2024, 53, 883–899. [Google Scholar] [CrossRef]
  2. Kohjitani, H.; Koshimizu, H.; Nakamura, K.; Okuno, Y. Recent Developments in Machine Learning Modeling Methods for Hypertension Treatment. Hypertens. Res. 2024, 47, 700–707. [Google Scholar] [CrossRef] [PubMed]
  3. Al Hajji, Y.; Al Hajji, F.Z.; Lee, L. Navigating the Horizon of Opportunity: A Comprehensive Review of Artificial Intelligence Applications in Cancer Care—Insights from the 2024 Landscape, a Narrative Review. J. Med. Artif. Intell. 2024, 7. [Google Scholar] [CrossRef]
  4. Riaz, I.; Khan, M.; Haddad, T. Potential Application of Artificial Intelligence in Cancer Therapy. Curr. Opin. Oncol. 2024, 36, 437–448. [Google Scholar] [CrossRef]
  5. Loftus, T.; Tighe, P.; Ozrazgat-Baslanti, T.; Davis, J.; Ruppert, M.; Ren, Y.; Shickel, B.; Kamaleswaran, R.; Hogan, W.; Moorman, J.; et al. Ideal Algorithms in Healthcare: Explainable, Dynamic, Precise, Autonomous, Fair, and Reproducible. PLoS Digit. Health 2022, 1, e0000006. [Google Scholar] [CrossRef] [PubMed]
  6. Huang, J.; Cai, Y.; Sun, T. Investigating of Deep Learning-based Approaches for Anomaly Detection in IoT Surveillance Systems. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 768–778. [Google Scholar] [CrossRef]
  7. Hu, H.; Kantardzic, M.; Sethi, T. No Free Lunch Theorem for Concept Drift Detection in Streaming Data Classification: A Review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1327. [Google Scholar] [CrossRef]
  8. Sharief, F.; Ijaz, H.; Shojafar, M.; Naeem, M.A. Multi-Class Imbalanced Data Handling with Concept Drift in Fog Computing: A Taxonomy, Review, and Future Directions. ACM Comput. Surv. 2024, 57, 16. [Google Scholar] [CrossRef]
  9. Zliobaitė, I. Learning under Concept Drift: An Overview. arXiv 2010, arXiv:1010.4784. [Google Scholar] [CrossRef]
  10. Gama, J. Knowledge Discovery from Data Streams; Data Mining and Knowledge Discovery; Chapman and Hall/CRC: New York, NY, USA, 2010. [Google Scholar] [CrossRef]
  11. Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. 2014, 46, 44. [Google Scholar] [CrossRef]
  12. Webb, G.I.; Hyde, R.; Cao, H.; Nguyen, H.L.; Petitjean, F. Characterizing concept drift. Data Min. Knowl. Discov. 2016, 30, 964–994. [Google Scholar] [CrossRef]
  13. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; KDD ’16, pp. 1135–1144. [Google Scholar] [CrossRef]
  14. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38. [Google Scholar] [CrossRef]
  15. Lipton, Z.C. The mythos of model interpretability. Commun. ACM 2018, 61, 36–43. [Google Scholar] [CrossRef]
  16. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
  17. Doshi-Velez, F.; Kim, B. Towards A Rigorous Science of Interpretable Machine Learning. arXiv 2017, arXiv:1702.08608. [Google Scholar] [CrossRef]
  18. Yang, W.; Wei, Y.; Wei, H.; Chen, Y.; Huang, G.; Li, X.; Li, R.; Yao, N.; Wang, X.; Gu, X.; et al. Survey on Explainable AI: From Approaches, Limitations and Applications Aspects. Hum.-Centric Intell. Syst. 2023, 3, 161–188. [Google Scholar] [CrossRef]
  19. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 2020, 23, 18. [Google Scholar] [CrossRef]
  20. Mahmood, S.; Teo, C.; Sim, J.; Zhang, W.; Muyun, J.; Bhuvana, R.; Teo, K.; Yeo, T.T.; Lu, J.; Gulyas, B.; et al. The application of eXplainable artificial intelligence in studying cognition: A scoping review. Ibrain 2024, 10, 245–265. [Google Scholar] [CrossRef]
  21. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2025. Available online: https://christophm.github.io/interpretable-ml-book (accessed on 29 May 2025).
  22. Baniecki, H.; Kretowicz, W.; Piatyszek, P.; Wisniewski, J.; Biecek, P. Dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python. J. Mach. Learn. Res. 2021, 22, 1–7. [Google Scholar]
  23. de Lemos, R.; Grześ, M. Self-Adaptive Artificial Intelligence. In Proceedings of the 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Montreal, QC, Canada, 25 May 2019; SEAMS ’19, pp. 155–156. [Google Scholar] [CrossRef]
  24. Abid, Y.; Wu, J.; Farhan, M.; Ahmad, T. ECMT Framework for Internet of Things: An Integrative Approach Employing In-Memory Attribute Examination and Sophisticated Neural Network Architectures in Conjunction With Hybridized Machine Learning Methodologies. IEEE Internet Things J. 2024, 11, 5867–5886. [Google Scholar] [CrossRef]
  25. Casado, F.; Lema, D.; Iglesias, R.; Regueiro, C.; Barro, S. Ensemble and Continual Federated Learning for Classification Tasks. Mach. Learn. 2023, 112, 3413–3453. [Google Scholar] [CrossRef]
  26. Hosseinalipour, S.; Wang, S.; Michelusi, N.; Aggarwal, V.; Brinton, C.G.; Love, D.J.; Chiang, M. Parallel Successive Learning for Dynamic Distributed Model Training Over Heterogeneous Wireless Networks. IEEE/ACM Trans. Netw. 2023, 32, 222–237. [Google Scholar] [CrossRef]
  27. Lai, T.D. Towards the Generation of Machine Learning Defect Reports. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, Melbourne, Australia, 15–19 November 2022; ASE ’21, pp. 1038–1042. [Google Scholar] [CrossRef]
  28. Kleinert, F.; Leufen, L.; Lupascu, A.; Butler, T.; Schultz, M. Representing Chemical History in Ozone Time-Series Predictions - a Model Experiment Study Building on the MLAir (v1.5) Deep Learning Framework. Geosci. Model Dev. 2022, 15, 8913–8930. [Google Scholar] [CrossRef]
  29. Ouatiti, Y.; Sayagh, M.; Kerzazi, N.; Adams, B.; Hassan, A. The Impact of Concept Drift and Data Leakage on Log Level Prediction Models. Empir. Softw. Eng. 2024, 29, 123. [Google Scholar] [CrossRef]
  30. Agarwal, P.; Gao, B.; Huo, S.; Reddy, P.; Dechu, S.; Obeidi, Y.; Muthusamy, V.; Isahagian, V.; Carbajales, S. A Process-Aware Decision Support System for Business Processes. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; KDD ’22, pp. 2673–2681. [Google Scholar] [CrossRef]
  31. Andresini, G.; Pendlebury, F.; Pierazzi, F.; Loglisci, C.; Appice, A.; Cavallaro, L. INSOMNIA: Towards Concept-Drift Robustness in Network Intrusion Detection. In Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, Virtual Event, Republic of Korea, 15 November 2021; Association for Computing Machinery, Inc.: New York, NY, USA, 2021; pp. 111–122. [Google Scholar] [CrossRef]
  32. Cai, H.; Zhou, Z.; Huang, Q. Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference. In Proceedings of the IEEE INFOCOM 2024—IEEE Conference on Computer Communications, Vancouver, BC, Canada, 20–23 May 2024; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2024; pp. 1900–1909. [Google Scholar] [CrossRef]
  33. Chow, T.; Kan, Z.; Linhardt, L.; Cavallaro, L.; Arp, D.; Pierazzi, F. Drift Forensics of Malware Classifiers. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, Copenhagen, Denmark, 30 November 2023; AISec ’23, pp. 197–207. [Google Scholar] [CrossRef]
  34. Haug, J.; Broelemann, K.; Kasneci, G. Dynamic Model Tree for Interpretable Data Stream Learning. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; IEEE Computer Society: Washington, DC, USA, 2022; pp. 2562–2574. [Google Scholar] [CrossRef]
  35. Rymarczyk, D.; Van De Weijer, J.; Zieliński, B.; Twardowski, B. ICICLE: Interpretable Class Incremental Continual Learning. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2023; pp. 1887–1898. [Google Scholar] [CrossRef]
  36. Siirtola, P.; Röning, J. Feature Relevance Analysis to Explain Concept Drift—A Case Study in Human Activity Recognition. In Proceedings of the Adjunct Proceedings of the 2022 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2022 ACM International Symposium on Wearable Computers, Cambridge, UK, 11–15 September 2022; UbiComp/ISWC'22 Adjunct, pp. 386–391. [Google Scholar] [CrossRef]
  37. Wan, K.; Liang, Y.; Yoon, S. Online Drift Detection with Maximum Concept Discrepancy. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 2924–2935. [Google Scholar] [CrossRef]
  38. Wang, G.; Guo, H.; Li, A.; Liu, X.; Yan, Q. Federated IoT Interaction Vulnerability Analysis. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; IEEE Computer Society: Washington, DC, USA, 2023; pp. 1517–1530. [Google Scholar] [CrossRef]
  39. Zonoozi, A.; Ho, Q.; Krishnaswamy, S.; Cong, G. ConTrack: A Scalable Method for Tracking Multiple Concepts in Large Scale Multidimensional Data. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; Bonchi, F., Domingo-Ferrer, J., Baeza-Yates, R., Zhou, Z.-H., Wu, X., Eds.; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2016; pp. 759–768. [Google Scholar]
  40. Adams, J.; van Zelst, S.; Quack, L.; Hausmann, K.; van der Aalst, W.; Rose, T. A Framework for Explainable Concept Drift Detection in Process Mining. In Proceedings of the 19th International Conference, BPM 2021, Rome, Italy, 6–10 September 2021; Polyvyanyy, A., Wynn, M.T., Van Looy, A., Reichert, M., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2021; Volume 12875 LNCS, pp. 400–416. [Google Scholar] [CrossRef]
  41. Calikus, E.; Fan, Y.; Nowaczyk, S.; Sant’Anna, A. Interactive-COSMO: Consensus Self-Organized Models for Fault Detection with Expert Feedback. In Proceedings of the Workshop on Interactive Data Mining, Melbourne, VIC, Australia, 15 February 2019; WIDM’19. [Google Scholar] [CrossRef]
  42. Castelnovo, A.; Malandri, L.; Mercorio, F.; Mezzanzanica, M.; Cosentini, A. Towards Fairness Through Time. In Proceedings of the International Workshops of ECML PKDD 2021, Virtual Event, 13–17 September 2021; Kamp, M., Kamp, M., Koprinska, I., Bibal, A., Bouadi, T., Frénay, B., Galárraga, L., Oramas, J., Adilova, L., Krishnamurthy, Y., et al., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2021; Volume 1524 CCIS, pp. 647–663. [Google Scholar] [CrossRef]
  43. Chen, Q.; Winter, K.; Rinderle-Ma, S. Predicting Unseen Process Behavior Based on Context Information from Compliance Constraints. In Proceedings of the BPM 2023 Forum, Utrecht, The Netherlands, 11–15 September 2023; Di Francescomarino, C., Burattin, A., Janiesch, C., Sadiq, S., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2023; Volume 490 LNBIP, pp. 127–144. [Google Scholar] [CrossRef]
  44. Chen, Z.; Tan, S.; Nori, H.; Inkpen, K.; Lou, Y.; Caruana, R. Using Explainable Boosting Machines (EBMs) to Detect Common Flaws in Data. In Proceedings of the International Workshops of ECML PKDD 2021, Virtual Event, 13–17 September 2021; Kamp, M., Kamp, M., Koprinska, I., Bibal, A., Bouadi, T., Frénay, B., Galárraga, L., Oramas, J., Adilova, L., Krishnamurthy, Y., et al., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2021; Volume 1524 CCIS, pp. 534–551. [Google Scholar] [CrossRef]
  45. Chowdhury, A.; Nguyen, H.; Ashenden, D.; Pogrebna, G. POSTER: A Teacher-Student with Human Feedback Model for Human-AI Collaboration in Cybersecurity. In Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, Melbourne, VIC, Australia, 10–14 July 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1040–1042. [Google Scholar] [CrossRef]
  46. Guo, H.; Jia, F.; Chen, J.; Squicciarini, A.; Yadav, A. RoCourseNet: Robust Training of a Prediction Aware Recourse Model. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; CIKM ’23, pp. 619–628. [Google Scholar] [CrossRef]
  47. Guzy, F.; Wozniak, M.; Krawczyk, B. Evaluating and Explaining Generative Adversarial Networks for Continual Learning under Concept Drift. In Proceedings of the 2021 International Conference on Data Mining Workshops (ICDMW), Auckland, New Zealand, 7–10 December 2021; Xue, B., Pechenizkiy, M., Koh, Y.S., Eds.; IEEE Computer Society: Washington, DC, USA, 2021; pp. 295–303. [Google Scholar] [CrossRef]
  48. Haug, J.; Braun, A.; Zürn, S.; Kasneci, G. Change Detection for Local Explainability in Evolving Data Streams. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 706–716. [Google Scholar] [CrossRef]
  49. Jiang, W.; Yi, Z.; Wang, L.; Zhang, H.; Zhang, J.; Lin, F.; Yang, C. A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; CIKM ’23, pp. 4646–4652. [Google Scholar] [CrossRef]
  50. Muschalik, M.; Fumagalli, F.; Hammer, B.; Hüllermeier, E. iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams. In Proceedings of the European Conference, ECML PKDD 2023, Turin, Italy, 18–22 September 2023; Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2023; Volume 14171 LNAI, pp. 428–445. [Google Scholar] [CrossRef]
  51. Nadeem, A.; Verwer, S. SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting. In Proceedings of the European Conference, ECML PKDD 2022, Grenoble, France, 19–23 September 2022; Amini, M.-R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2023; Volume 13713 LNAI, pp. 157–173. [Google Scholar] [CrossRef]
  52. Polyvyanyy, A.; Wynn, M.T.; Looy, A.V.; Reichert, M. (Eds.) Business Process Management: 19th International Conference on Business Process Management, BPM, Rome, Italy, 6–10 September 2021; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2021; Volume 12875 LNCS. [Google Scholar]
  53. Yang, K.; Xu, W. Fraudmemory: Explainable Memory-Enhanced Sequential Neural Networks for Financial Fraud Detection. In Proceedings of the 52nd Hawaii International Conference on System Sciences, Maui, HI, USA, 8–11 January 2019; Bui, T.X., Ed.; IEEE Computer Society: Washington, DC, USA, 2019; pp. 1023–1032. [Google Scholar]
  54. Wang, G. Open Challenges of Malware Detection under Concept Drift. In Proceedings of the 1st Workshop on Robust Malware Analysis, Nagasaki, Japan, 30 May 2022; WoRMA ’22, p. 1. [Google Scholar] [CrossRef]
  55. Casalino, G.; Castellano, G.; Kaczmarek-Majer, K.; Leite, D. (Eds.) OLUD 2022—Proceedings of the 1st Workshop on Online Learning from Uncertain Data Streams, co-located with the IEEE World Congress on Computational Intelligence (WCCI 2022). In CEUR Workshop Proceedings; CEUR-WS: Padova, Italy, 2022; Volume 3380. [Google Scholar]
  56. Almeida, J.P.A.; Ciccio, C.D.; Kalloniatis, C. (Eds.) Advanced Information Systems Engineering Workshops: CAiSE 2024 International Workshops, Limassol, Cyprus, June 3–7, 2024, Proceedings; Lecture Notes in Business Information Processing; Springer: Cham, Switzerland, 2024; Volume 521. [Google Scholar]
  57. Canovas-Segura, B.; Morales, A.; Martinez-Carrasco, A.; Campos, M.; Juarez, J.; Rodriguez, L.; Palacios, F. Improving Interpretable Prediction Models for Antimicrobial Resistance. In Proceedings of the 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 5–7 June 2019; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2019; pp. 543–546. [Google Scholar] [CrossRef]
  58. Cianci, G.; Goglia, R.; Guidotti, R.; Kapllaj, M.; Mosca, R.; Pugnana, A.; Ricotti, F.; Ruggieri, S. Applied Data Science for Leasing Score Prediction. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; He, J., Palpanas, T., Hu, X., Cuzzocrea, A., Dou, D., Slezak, D., Wang, W., Gruca, A., Lin, J.C.-W., Agrawal, R., Eds.; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2023; pp. 1687–1696. [Google Scholar] [CrossRef]
  59. Clement, T.; Nguyen, H.; Kemmerzell, N.; Abdelaal, M.; Stjelja, D. Coping with Data Distribution Shifts: XAI-Based Adaptive Learning with SHAP Clustering for Energy Consumption Prediction. In Proceedings of the 36th Australasian Joint Conference on Artificial Intelligence, AI 2023, Brisbane, QLD, Australia, 28 November–1 December 2023; Liu, T., Webb, G., Yue, L., Wang, D., Eds.; Springer Science and Business Media Deutschland GmbH: Singapore, 2024; Volume 14472 LNAI, pp. 147–159. [Google Scholar] [CrossRef]
  60. Gniewkowski, M.; Maciejewski, H.; Surmacz, T.; Walentynowicz, W. Sec2vec: Anomaly Detection in HTTP Traffic and Malicious URLs. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Tallinn, Estonia, 27–31 March 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1154–1162. [Google Scholar] [CrossRef]
  61. Kebir, S.; Tabia, K. On Handling Concept Drift, Calibration and Explainability in Non-Stationary Environments and Resources Limited Contexts. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence—Volume 2: ICAART, Rome, Italy, 24–26 February 2024; Rocha, A.P., Steels, L., van den Herik, J., Eds.; Science and Technology Publications, Lda: Setúbal, Portugal, 2024; Volume 2, pp. 336–346. [Google Scholar] [CrossRef]
  62. Migenda, N.; Schenck, W. Adaptive Dimensionality Reduction for Local Principal Component Analysis. In Proceedings of the 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vienna, Austria, 8–11 September 2020; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2020; pp. 1579–1586. [Google Scholar] [CrossRef]
  63. Neufeld, D.; Schmid, U. Anomaly Detection for Hydraulic Systems under Test. In Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA ), Vasteras, Sweden, 7–10 September 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021. [Google Scholar] [CrossRef]
  64. Saralajew, S.; Villmann, T. Transfer Learning in Classification Based on Manifold Models and Its Relation to Tangent Metric Learning. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2017; pp. 1756–1765. [Google Scholar] [CrossRef]
  65. Wang, X.; Wang, Z.; Shao, W.; Jia, C.; Li, X. Explaining Concept Drift of Deep Learning Models. In Proceedings of the 11th International Symposium, CSS 2019, Guangzhou, China, 1–3 December 2019; Zhang, X., Li, J., Eds.; Springer: Cham, Switzerland, 2019; Volume 11983 LNCS, pp. 524–534. [Google Scholar] [CrossRef]
  66. Appice, A.; Tsoumakas, G.; Manolopoulos, Y.; Matwin, S. (Eds.) Discovery Science: 23rd International Conference on Discovery Science, DS 2020; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2020; Volume 12323 LNAI. [Google Scholar]
  67. Bouadi, T.; Fromont, E.; Hüllermeier, E. (Eds.) Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2022; Volume 13205 LNCS. [Google Scholar]
  68. Miliou, I.; Piatkowski, N.; Papapetrou, P. (Eds.) Advances in Intelligent Data Analysis XXII: 22nd International Symposium on Intelligent Data Analysis, IDA 2024; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2024; Volume 14641 LNCS. [Google Scholar]
  69. Rutkowski, L.; Scherer, R.; Korytkowski, M.; Pedrycz, W.; Tadeusiewicz, R.; Zurada, J.M. (Eds.) Artificial Intelligence and Soft Computing: 19th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2020; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2020; Volume 12415 LNAI. [Google Scholar]
  70. Wang, Z.; Tan, C. Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2024 Workshops, RAFDA and IWTA, Taipei, Taiwan, May 7–10, 2024, Proceedings (Lecture Notes in Artificial Intelligence, Band 14658); Springer: Berlin/Heidelberg, Germany, 2024. [Google Scholar]
  71. Appice, A.; Azzag, H.; Hacid, M.S.; Hadjali, A.; Ras, Z. Foundations of Intelligent Systems: 27th International Symposium, ISMIS 2024, Poitiers, France, June 17–19, 2024, Proceedings (Lecture Notes in Computer Science Book 14670); Springer: Berlin/Heidelberg, Germany, 2024. [Google Scholar]
  72. Baillargeon, J.T.; Cossette, H.; Lamontagne, L. Preventing RNN from Using Sequence Length as a Feature. In Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval, Bangkok, Thailand, 16–18 December 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 16–20. [Google Scholar] [CrossRef]
  73. Banissi, E.; Datia, N.; Pires, J.; Ursyn, A.; Nazemi, K.; Kovalerchuk, B.; Andonie, R.; Gavrilova, M.; Nakayama, M.; Nguyen, Q.; et al. Proceedings—2024 28th International Conference Information Visualisation (IV 2024); IEEE: Piscataway, NJ, USA, 2024. [Google Scholar]
  74. Conte, D.; Fred, A.; Gusikhin, O.; Sansone, C. (Eds.) Deep Learning Theory and Applications: Proceedings of the 4th International Conference on Deep Learning Theory and Applications, DeLTA 2023; Communications in Computer and Information Science; Springer: Cham, Switzerland, 2023; Volume 1875 CCIS. [Google Scholar]
  75. Ding, G.; Zhu, Y.; Ren, Y. Dynamic-Static Fusion for Spatial-Temporal Anomaly Detection and Interpretation in Multivariate Time Series. In Proceedings of the 8th International Joint Conference, APWeb-WAIM 2024, Jinhua, China, 30 August–1 September 2024; Yang, Z., Wang, X., Tung, A., Zheng, Z., Guo, H., Eds.; Springer Science and Business Media Deutschland GmbH: Singapore, 2024; Volume 14963 LNCS, pp. 46–61. [Google Scholar] [CrossRef]
  76. Dries, A. Declarative Data Generation with ProbLog. In Proceedings of the 6th International Symposium on Information and Communication Technology, Hue City, Viet Nam, 3–4 December 2015; SoICT ’15, pp. 17–24. [Google Scholar] [CrossRef]
  77. Esteve, M.; Mollá-Campello, N.; Rodríguez-Sala, J.; Rabasa, A. The Effects of Abrupt Changing Data in CART Inference Models. In Trends and Applications in Information Systems and Technologies; Rocha, A., Adeli, H., Dzemyda, G., Moreira, F., Correia, A.M.R., Eds.; Springer Science and Business Media Deutschland GmbH: Cham, Switzerland, 2021; Volume 1366 AISC, pp. 214–223. [Google Scholar] [CrossRef]
  78. Bringas, P.G.; García, H.P.; de Pisón, F.J.M.; Flecha, J.R.V.; Lora, A.T.; la Cal, E.A.d.; Herrero, Á.; Álvarez, F.M.; Psaila, G.; Quintián, H.; et al. (Eds.) Hybrid Artificial Intelligent Systems: 17th International Conference on Hybrid Artificial Intelligence Systems, HAIS 2022; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2022; Volume 13469 LNAI. [Google Scholar]
  79. Jain, M.; Gupta, S.; Agarwal, G.; Bansal, K. AdaBoost Based Hybrid Concept Drift Detection Approach for Mental Health Prediction. In Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing, Noida, India, 3–5 August 2023; IC3-2023, pp. 345–351. [Google Scholar] [CrossRef]
  80. Le, T.D.; Ong, K.-L.; Zhao, Y.; Jin, W.H.; Wong, S.; Liu, L.; Williams, G. (Eds.) Data Mining: 17th Australasian Conference on Data Mining, AusDM 2019; Communications in Computer and Information Science; Springer: Singapore, 2019; Volume 1127 CCIS. [Google Scholar]
  81. Maglogiannis, I.; Iliadis, L.; Macintyre, J.; Cortez, P. Artificial Intelligence Applications and Innovations, Proceedings of the 18th IFIP WG 12.5 International Conference, AIAI 2022, Hersonissos, Crete, Greece, 17–20 June 2022; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  82. Nader, C.; Bou-Harb, E. Revisiting IoT Fingerprinting behind a NAT. In Proceedings of the 2021 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York, NY, USA, 30 September–3 October 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021; pp. 1745–1752. [Google Scholar] [CrossRef]
  83. Nesic, S.; Putina, A.; Bahri, M.; Huet, A.; Navarro, J.; Rossi, D.; Sozio, M. STREamRHF: Tree-Based Unsupervised Anomaly Detection for Data Streams. In Proceedings of the 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates, 5–8 December 2022; IEEE Computer Society: Washington, DC, USA, 2022. [Google Scholar] [CrossRef]
  84. Park, L.A.F.; Gomes, H.M.; Doborjeh, M.; Boo, Y.L.; Koh, Y.S.; Zhao, Y.; Williams, G.; Simoff, S. (Eds.) Data Mining: 20th Australasian Data Mining Conference, AusDM 2022; Communications in Computer and Information Science; Springer: Singapore, 2022; Volume 1741 CCIS. [Google Scholar]
  85. Tan, G.; Chen, P.; Li, M. Online Data Drift Detection for Anomaly Detection Services Based on Deep Learning towards Multivariate Time Series. In Proceedings of the 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), Chiang Mai, Thailand, 22–26 October 2023; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2023; pp. 1–11. [Google Scholar] [CrossRef]
  86. Shiaeles, S.; Kolokotronis, N.; Bellini, E. Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR); IEEE: Piscataway, NJ, USA, 2024. [Google Scholar]
  87. Ajitha, P. MedAI-DevOps Imaging Suite: Integrating CNN in Diagnostic Imaging with Continuous Deployment and Real-time Monitoring. In Proceedings of the 2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies, Pune, India, 22–23 March 2024; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2024. [Google Scholar] [CrossRef]
  88. De Arriba-Pérez, F.; García-Méndez, S.; Leal, F.; Malheiro, B.; Burguillo, J.C. Online Detection and Infographic Explanation of Spam Reviews with Data Drift Adaptation. Informatica 2024, 35, 483–507. [Google Scholar] [CrossRef]
  89. Kulczycki, P.; Kowalski, P. A Metaheuristic for Classification of Interval Data in Changing Environments. In Information Technology and Computational Physics; Koczy, L.T., Kacprzyk, J., Kulczycki, P., Kulczycki, P., Mesiar, R., Koczy, L.T., Eds.; Springer: Cham, Switzerland, 2017; Volume 462, pp. 19–34. [Google Scholar] [CrossRef]
  90. Liu, S.; Bronzino, F.; Schmitt, P.; Bhagoji, A.N.; Feamster, N.; Crespo, H.G.; Coyle, T.; Ward, B. LEAF: Navigating Concept Drift in Cellular Networks. Proc. ACM Netw. 2023, 1, 7. [Google Scholar] [CrossRef]
  91. Palaniappan, S.; Veneri, G.; Gori, V.; Pratelli, T.; Ballarini, V. Anomaly Detection of Sensor Measurements During a Turbo-Machine Prototype Testing—An Integrated ML Ops, Continual Learning Architecture. In Proceedings of the International Petroleum Technology Conference, Dhahran, Saudi Arabia, February 2024; International Petroleum Technology Conference (IPTC): London, UK, 2024. [Google Scholar] [CrossRef]
  92. Wang, R. Enhancing Thyroid Disease Prediction: Integrating Machine Learning Models for Improved Diagnostic Accuracy and Interpretability. In Proceedings of the 2024 5th International Conference on Artificial Intelligence and Electromechanical Automation (AIEA), Shenzhen, China, 14–16 June 2024; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2024; pp. 423–431. [Google Scholar] [CrossRef]
  93. Institute of Electrical and Electronics Engineers Inc. Proceedings of 2022 International Conference on Future Trends in Smart Communities (ICFTSC); IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
  94. Adams, J.; van Zelst, S.; Rose, T.; van der Aalst, W. Explainable Concept Drift in Process Mining. Inf. Syst. 2023, 114, 102177. [Google Scholar] [CrossRef]
  95. Bertini, J. Graph Embedded Rules for Explainable Predictions in Data Streams. Neural Netw. 2020, 129, 174–192. [Google Scholar] [CrossRef]
  96. Bhaskhar, N.; Rubin, D.; Lee-Messer, C. An Explainable and Actionable Mistrust Scoring Framework for Model Monitoring. IEEE Trans. Artif. Intell. 2024, 5, 1473–1485. [Google Scholar] [CrossRef]
  97. Bijlani, N.; Maldonado, O.; Nilforooshan, R.; Barnaghi, P.; Kouchaki, S. Utilizing Graph Neural Networks for Adverse Health Detection and Personalized Decision Making in Sensor-Based Remote Monitoring for Dementia Care. Comput. Biol. Med. 2024, 183, 109287. [Google Scholar] [CrossRef]
  98. Bosnić, Z.; Demšar, J.; Kešpret, G.; Pereira Rodrigues, P.; Gama, J.; Kononenko, I. Enhancing Data Stream Predictions with Reliability Estimators and Explanation. Eng. Appl. Artif. Intell. 2014, 34, 178–192. [Google Scholar] [CrossRef]
  99. Cano, A.; Krawczyk, B. Evolving Rule-Based Classifiers with Genetic Programming on GPUs for Drifting Data Streams. Pattern Recognit. 2019, 87, 248–268. [Google Scholar] [CrossRef]
  100. Choi, Y.; Yu, W.X.; Nagarajan, M.B.; Teng, P.Y.; Goldin, J.G.; Raman, S.S.; Enzmann, D.R.; Kim, G.H.J.; Brown, M.S. Translating AI to Clinical Practice: Overcoming Data Shift with Explainability. Radiographics 2023, 43, e220105. [Google Scholar] [CrossRef]
  101. Costa, R.; Hirata, C.; Pugliese, V. A Comparative Study of Situation Awareness-Based Decision-Making Model Reinforcement Learning Adaptive Automation in Evolving Conditions. IEEE Access 2023, 11, 16166–16182. [Google Scholar] [CrossRef]
  102. Das, R.; Ang, K.; Quek, C. IeRSPOP: A Novel Incremental Rough Set-Based Pseudo Outer-Product with Ensemble Learning. Appl. Soft Comput. J. 2016, 46, 170–186. [Google Scholar] [CrossRef]
  103. Das, S.; Islam, M.R.; Kannappan Jayakodi, N.; Doppa, J.R. Effectiveness of Tree-based Ensembles for Anomaly Discovery: Insights, Batch and Streaming Active Learning. J. Artif. Int. Res. 2024, 80, 127–170. [Google Scholar] [CrossRef]
  104. Demšar, J.; Bosnić, Z. Detecting Concept Drift in Data Streams Using Model Explanation. Expert Syst. Appl. 2018, 92, 546–559. [Google Scholar] [CrossRef]
  105. Duckworth, C.; Chmiel, F.; Burns, D.; Zlatev, Z.; White, N.; Daniels, T.; Kiuber, M.; Boniface, M. Using Explainable Machine Learning to Characterise Data Drift and Detect Emergent Health Risks for Emergency Department Admissions during COVID-19. Sci. Rep. 2021, 11, 23017. [Google Scholar] [CrossRef]
  106. Ferdaus, M.; Dam, T.; Alam, S.; Pham, D.T. X-Fuzz: An Evolving and Interpretable Neuro-Fuzzy Learner for Data Streams. IEEE Trans. Artif. Intell. 2024, 5, 4001–4012. [Google Scholar] [CrossRef]
  107. Freitas dos Santos, T.; Osman, N.; Schorlemmer, M. Is This a Violation? Learning and Understanding Norm Violations in Online Communities. Artif. Intell. 2024, 327, 104058. [Google Scholar] [CrossRef]
  108. Fumagalli, F.; Muschalik, M.; Hüllermeier, E.; Hammer, B. Incremental Permutation Feature Importance (iPFI): Towards Online Explanations on Data Streams. Mach. Learn. 2023, 112, 4863–4903. [Google Scholar] [CrossRef]
  109. González-Cebrián, A.; Bradford, M.; Chis, A.; González-Vélez, H. Standardised Versioning of Datasets: A FAIR-Compliant Proposal. Sci. Data 2024, 11, 358. [Google Scholar] [CrossRef] [PubMed]
  110. He, G.; Jin, D.; Dai, L.; Xin, X.; Yu, Z.; Chen, C. Online Learning of Temporal Association Rule on Dynamic Multivariate Time Series Data. IEEE Trans. Knowl. Data Eng. 2024, 36, 8954–8966. [Google Scholar] [CrossRef]
  111. He, X.; Liu, Z. Dynamic Model Interpretation-Guided Online Active Learning Scheme for Real-Time Safety Assessment. IEEE Trans. Cybern. 2024, 54, 2734–2745. [Google Scholar] [CrossRef]
  112. Jensen, R.; Ferwerda, J.; Jørgensen, K.; Jensen, E.; Borg, M.; Krogh, M.; Jensen, J.; Iosifidis, A. A Synthetic Data Set to Benchmark Anti-Money Laundering Methods. Sci. Data 2023, 10, 661. [Google Scholar] [CrossRef]
  113. Korine, R.; Hendler, D. DAEMON: Dataset/Platform-Agnostic Explainable Malware Classification Using Multi-Stage Feature Mining. IEEE Access 2021, 9, 78382–78399. [Google Scholar] [CrossRef]
  114. Li, X.; Liu, Z.; Yang, Z.; Meng, F.; Song, T. A High-Precision Interpretable Framework for Marine Dissolved Oxygen Concentration Inversion. Front. Mar. Sci. 2024, 11, 1396277. [Google Scholar] [CrossRef]
  115. Roider, J.; Nguyen, A.; Zanca, D.; Eskofier, B. Assessing the Performance of Remaining Time Prediction Methods for Business Processes. IEEE Access 2024, 12, 130583–130601. [Google Scholar] [CrossRef]
  116. Samarajeewa, C.; De Silva, D.; Manic, M.; Mills, N.; Moraliyage, H.; Alahakoon, D.; Jennings, A. An Artificial Intelligence Framework for Explainable Drift Detection in Energy Forecasting. Energy AI 2024, 17, 100403. [Google Scholar] [CrossRef]
  117. Singh, A.; Mishra, P.; Vinod, P.; Gaur, A.; Conti, M. SFC-NIDS: A Sustainable and Explainable Flow Filtering Based Concept Drift-Driven Security Approach for Network Introspection. Clust. Comput. 2024, 27, 9757–9782. [Google Scholar] [CrossRef]
  118. Tiukhova, E.; Vemuri, P.; Flores, N.; Islind, A.; Óskarsdóttir, M.; Poelmans, S.; Baesens, B.; Snoeck, M. Explainable Learning Analytics: Assessing the Stability of Student Success Prediction Models by Means of Explainable AI. Decis. Support Syst. 2024, 182, 114229. [Google Scholar] [CrossRef]
  119. Vieira, D.; Fernandes, C.; Lucena, C.; Lifschitz, S. Driftage: A Multi-Agent System Framework for Concept Drift Detection. GigaScience 2021, 10, giab030. [Google Scholar] [CrossRef]
  120. Wen, C.; Zhou, P.; Dai, W.; Dong, L.; Chai, T. Online Sequential Sparse Robust Neural Networks with Random Weights for Imperfect Industrial Streaming Data Modeling. IEEE Trans. Autom. Sci. Eng. 2024, 21, 1163–1175. [Google Scholar] [CrossRef]
  121. Wood, M.; Ogliari, E.; Nespoli, A.; Simpkins, T.; Leva, S. Day Ahead Electric Load Forecast: A Comprehensive LSTM-EMD Methodology and Several Diverse Case Studies. Forecasting 2023, 5, 297–314. [Google Scholar] [CrossRef]
  122. Xu, L.; Han, Z.; Zhao, D.; Li, X.; Yu, F.; Chen, C. Addressing Concept Drift in IoT Anomaly Detection: Drift Detection, Interpretation, and Adaptation. IEEE Trans. Sustain. Comput. 2024, 9, 913–924. [Google Scholar] [CrossRef]
  123. Xuan, J.; Lu, J.; Zhang, G. Bayesian Nonparametric Unsupervised Concept Drift Detection for Data Stream Mining. ACM Trans. Intell. Syst. Technol. 2020, 12, 5. [Google Scholar] [CrossRef]
  124. Yu, E.; Lu, J.; Zhang, G. Fuzzy Shared Representation Learning for Multistream Classification. IEEE Trans. Fuzzy Syst. 2024, 32, 5625–5637. [Google Scholar] [CrossRef]
  125. Zhu, J.; Cai, S.; Deng, F.; Ooi, B.C.; Zhang, W. METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection. Proc. VLDB Endow. 2024, 17, 794–807. [Google Scholar] [CrossRef]
  126. Ducange, P.; Marcelloni, F.; Pecori, R. Fuzzy Hoeffding Decision Tree for Data Stream Classification. Int. J. Comput. Intell. Syst. 2021, 14, 946–964. [Google Scholar] [CrossRef]
  127. Farrugia, D.; Zerafa, C.; Cini, T.; Kuasney, B.; Livori, K. A Real-Time Prescriptive Solution for Explainable Cyber-Fraud Detection Within the iGaming Industry. SN Comput. Sci. 2021, 2, 215. [Google Scholar] [CrossRef]
  128. Freitas dos Santos, T.; Osman, N.; Schorlemmer, M. A Multi-Scenario Approach to Continuously Learn and Understand Norm Violations. Auton. Agents Multi-Agent Syst. 2023, 37, 38. [Google Scholar] [CrossRef]
  129. Goyal, S.; Kumar, S.; Singh, S.; Sarin, S.; Gupta, B.; Arya, V.; Alhalabi, W.; Colace, F. Synergistic Application of Neuro-Fuzzy Mechanisms in Advanced Neural Networks for Real-Time Stream Data Flux Mitigation. Soft Comput. 2024, 28, 12425–12437. [Google Scholar] [CrossRef]
  130. Li, B.; Gupta, S.; Müller, E. State-Transition-Aware Anomaly Detection under Concept Drifts. Data Knowl. Eng. 2024, 154, 102365. [Google Scholar] [CrossRef]
  131. Monakhov, V.; Thambawita, V.; Halvorsen, P.; Riegler, M. GridHTM: Grid-Based Hierarchical Temporal Memory for Anomaly Detection in Videos. Sensors 2023, 23, 2087. [Google Scholar] [CrossRef] [PubMed]
  132. Nesvijevskaia, A.; Ouillade, S.; Guilmin, P.; Zucker, J.D. The Accuracy versus Interpretability Trade-off in Fraud Detection Model. Data Policy 2021, 3, e12. [Google Scholar] [CrossRef]
  133. Pasquadibisceglie, V.; Appice, A.; Ieva, G.; Malerba, D. TSUNAMI - an Explainable PPM Approach for Customer Churn Prediction in Evolving Retail Data Environments. J. Intell. Inf. Syst. 2024, 62, 705–733. [Google Scholar] [CrossRef]
  134. Stahl, F.; Le, T.; Badii, A.; Gaber, M. A Frequent Pattern Conjunction Heuristic for Rule Generation in Data Streams. Information 2021, 12, 24. [Google Scholar] [CrossRef]
  135. Sun, L.; Li, Z.; Li, K.; Liu, H.; Liu, G.; Lv, W. Cross-Well Lithology Identification Based on Wavelet Transform and Adversarial Learning. Energies 2023, 16, 1475. [Google Scholar] [CrossRef]
  136. Zheng, Z.; Dong, X.; Yao, J.; Zhou, L.; Ding, Y.; Chen, A. Identification of Epileptic EEG Signals Through TSK Transfer Learning Fuzzy System. Front. Neurosci. 2021, 15, 738268. [Google Scholar] [CrossRef]
  137. Cherkashina, T.Y. Russian regional politicians’ income and property declarations: A pilot study and quality assessment of administrative data. Sotsiologicheskiy Zhurnal 2021, 27, 8–34. [Google Scholar] [CrossRef]
  138. Zhang, B.; Wen, Z.; Wei, X.; Ren, J. InterDroid: An Interpretable Android Malware Detection Method for Conceptual Drift. Jisuanji Yanjiu Fazhan/Comput. Res. Dev. 2021, 58, 2456–2474. [Google Scholar] [CrossRef]
  139. Banf, M.; Steinhagen, G. Supervising The Supervisor—Model Monitoring In Production Using Deep Feature Embeddings with Applications To Workpiece Inspection. In Proceedings of the 8th World Congress on Electrical Engineering and Computer Systems and Sciences (EECSS’22), Prague, Czech Republic, 28–30 July 2022; Benedicenti, L., Liu, Z., Eds.; Avestia Publishing: Orléans, ON, Canada, 2022. [Google Scholar] [CrossRef]
  140. Hu, H.; Kantardzic, M.; Kar, S. Explainable Data Stream Mining: Why the New Models Are Better. Intell. Decis. Technol. 2024, 18, 371–385. [Google Scholar] [CrossRef]
  141. Mannmeusel, J.; Rothfelder, M.; Khoshrou, S. Speeding Things Up. Can Explainability Improve Human Learning? In Proceedings of the First World Conference, xAI 2023, Lisbon, Portugal, 26–28 July 2023; Longo, L., Ed.; Springer Science and Business Media Deutschland GmbH: Cham, Switzerland, 2023; Volume 1901 CCIS, pp. 66–84. [Google Scholar] [CrossRef]
  142. Mascha, P. A Comparison of Model Confidence Metrics on Visual Manufacturing Quality Data. In Proceedings of the CVMI 2022, Allahabad, India, 12–13 August 2022; Tistarelli, M., Dubey, S.R., Singh, S.K., Jiang, X., Eds.; Springer Science and Business Media Deutschland GmbH: Singapore, 2023; Volume 586 LNNS, pp. 165–177. [Google Scholar] [CrossRef]
  143. Muschalik, M.; Fumagalli, F.; Jagtani, R.; Hammer, B.; Hüllermeier, E. iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios. In Proceedings of the First World Conference, xAI 2023, Lisbon, Portugal, 26–28 July 2023; Longo, L., Ed.; Springer Science and Business Media Deutschland GmbH: Cham, Switzerland, 2023; Volume 1901 CCIS, pp. 177–194. [Google Scholar] [CrossRef]
  144. Saadallah, A. Online Explainable Ensemble of Tree Models Pruning for Time Series Forecasting. In Joint Proceedings of the xAI 2024 Late-Breaking Work, Demos and Doctoral Consortium Co-Located with the 2nd World Conference on eXplainable Artificial Intelligence (xAI 2024), Valletta, Malta, 17–19 July 2024. [Google Scholar]
  145. Suffian, M.; Bogliolo, A. Investigation and Mitigation of Bias in Explainable AI. In Proceedings of the 1st Workshop on Bias, Ethical AI, Explainability and the Role of Logic and Logic Programming (BEWARE 2022) Co-Located with the 21th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2022), Udine, Italy, 2 December 2022. [Google Scholar]
  146. Sun, J.; Sun, M.; Zhao, M.; Du, Y. Dynamic Class-Imbalanced Financial Distress Prediction Based on Case-Based Reasoning Integrated with Time Weighting and Resampling. J. Credit. Risk 2023, 19, 39–73. [Google Scholar] [CrossRef]
  147. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95. [Google Scholar] [CrossRef]
  148. Eck, N.V.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538. [Google Scholar]
Figure 1. Data drift scenario.
Figure 2. Concept drift scenario.
Figure 3. Sudden concept drift.
Figure 4. Gradual concept drift.
Figure 5. Incremental concept drift.
Figure 6. Recurrent concept drift.
Figure 7. SHAP example.
Figure 8. LIME example.
Figure 9. PDP example.
Figure 10. ALE example.
Figure 11. PI example.
Figure 12. Surrogate model example.
Figure 13. Anchor example.
Figure 14. Followed PRISMA flow chart.
Figure 15. The topic trend over the years.
Figure 16. Affiliation countries.
Figure 17. The network of collaboration between researchers, created using VOSviewer software. (a) Visualization of the complete collaboration network. (b) A zoom-in on two different clusters.
Figure 18. Taxonomy “phylogenetic” tree.
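The explanation-technique examples in Figures 7 and 9 (SHAP and PDP) can be reproduced in spirit with standard open-source tooling. The sketch below is a hypothetical illustration using the shap package and scikit-learn on the California housing data (also listed in Table A16); it is not the code used to generate the figures above.

# Minimal sketch (assumes the shap and scikit-learn packages); illustrative only.
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
sample = X.sample(200, random_state=0)

# SHAP summary plot (cf. Figure 7): a global view of feature attributions.
shap_values = shap.TreeExplainer(model).shap_values(sample)
shap.summary_plot(shap_values, sample)

# Partial dependence plot (cf. Figure 9): the marginal effect of a single feature.
PartialDependenceDisplay.from_estimator(model, X, features=["MedInc"])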
Table 1. CI: Feature values of a specific customer.
Age | Sex | Job | Housing | Savings | Amount | Dur. | Purpose
58 | f | unskilled | free | little | 6143 | 48 | car
Table 2. CI: Top ten counterfactuals for the customer.
Age | Sex | Job | Amount | Dur. | o2 | o3 | o4 | f(x)
 |  | skilled |  | −20 | 0.108 | 2 | 0.036 | 0.501
 |  | skilled |  | −24 | 0.114 | 2 | 0.029 | 0.525
 |  | skilled |  | −22 | 0.111 | 2 | 0.033 | 0.513
−6 |  | skilled |  | −24 | 0.126 | 3 | 0.018 | 0.505
−3 |  | skilled |  | −24 | 0.120 | 3 | 0.024 | 0.515
−1 |  | skilled |  | −24 | 0.116 | 3 | 0.027 | 0.522
−3 | m |  |  | −24 | 0.195 | 3 | 0.012 | 0.501
−6 | m |  |  | −25 | 0.202 | 3 | 0.011 | 0.501
−30 | m | skilled |  | −24 | 0.285 | 4 | 0.005 | 0.590
−4 | m |  | −1254 | −24 | 0.204 | 4 | 0.002 | 0.506
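Table 2 ranks counterfactuals by competing objectives. As a rough illustration of how such objectives can be scored, the sketch below compares a candidate against the customer of Table 1. The o2-style distance, the feature ranges, and the candidate values are assumptions made for this example only; they do not reproduce the exact formulas behind the o2, o3, and o4 columns.

# Illustrative sketch only; objective definitions and ranges are assumed, not taken from the cited method.
def count_changes(original, candidate):
    # o3-style objective: number of features the counterfactual changes.
    return sum(1 for k in original if candidate.get(k, original[k]) != original[k])

def gower_like_distance(original, candidate, ranges):
    # o2-style objective: numeric gaps scaled by an assumed feature range; categorical mismatches count as 1.
    total = 0.0
    for k, v in original.items():
        c = candidate.get(k, v)
        if isinstance(v, (int, float)) and isinstance(c, (int, float)):
            total += abs(c - v) / ranges.get(k, 1.0)
        else:
            total += 0.0 if c == v else 1.0
    return total / len(original)

customer = {"age": 58, "sex": "f", "job": "unskilled", "amount": 6143, "duration": 48}
candidate = {"age": 58, "sex": "f", "job": "skilled", "amount": 6143, "duration": 28}
ranges = {"age": 60, "amount": 18000, "duration": 70}  # assumed feature ranges

print(count_changes(customer, candidate))                           # 2 changed features (o3-style)
print(round(gower_like_distance(customer, candidate, ranges), 3))   # distance to the original (o2-style)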
Table 3. Conference ranks collected.
Rank | Paper
A* | [23,27,30,31,32,33,34,35,36,37,38,39]
A | [40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56]
B | [57,58,59,60,61,62,63,64,65,66,67,68,69,70]
C | [71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86]
Not Found | [87,88,89,90,91,92,93]
Table 4. Journal quartiles collected.
Quartile | Paper
Q1 | [22,24,25,26,28,29,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125]
Q2 | [126,127,128,129,130,131,132,133,134,135,136]
Q3 | [137,138]
Q4 | [139,140,141,142,143,144,145,146]
Table 5. Domains that adopted explainability and interpretability of drift.
Application Domain | N. of Papers
Healthcare | 12
Cybersecurity | 8
Financial | 6
Energy | 6
Process Mining | 5
Manufacturing | 5
Generic Data Stream | 5
Data Mining | 5
Retail | 2
Online Community | 2
Video Surveillance | 2
Other | 22