Article

LIMETREE: Consistent and Faithful Surrogate Explanations of Multiple Classes

Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK
* Author to whom correspondence should be addressed.
Current address: Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland.
Electronics 2025, 14(5), 929; https://doi.org/10.3390/electronics14050929
Submission received: 8 January 2025 / Revised: 20 February 2025 / Accepted: 21 February 2025 / Published: 26 February 2025

Abstract
Explainable artificial intelligence provides tools to better understand predictive models and their decisions, but many such methods are limited to producing insights with respect to a single class. When generating explanations for several classes, reasoning over them to obtain a comprehensive view may be difficult since they can present competing or contradictory evidence. To address this challenge, we introduce the novel paradigm of multi-class explanations. We outline the theory behind such techniques and propose a local surrogate model based on multi-output regression trees—called LIME TREE —that offers faithful and consistent explanations of multiple classes for individual predictions while being post-hoc, model-agnostic and data-universal. On top of strong fidelity guarantees, our implementation delivers a range of diverse explanation types, including counterfactual statements favored in the literature. We evaluate our algorithm with respect to explainability desiderata, through quantitative experiments and via a pilot user study, on image and tabular data classification tasks, comparing it with LIME, which is a state-of-the-art surrogate explainer. Our contributions demonstrate the benefits of multi-class explanations and the wide-ranging advantages of our method across a diverse set of scenarios.

1. Introduction

The explainability of predictive systems based on artificial intelligence (AI) algorithms has become one of their most desirable properties [1,2,3]. While a wide array of explanation types—supplemented by numerous techniques to generate them—is available [4], contrastive statements are dominant [5,6,7,8,9,10]. Their particular realization in the form of counterfactual examples is the most ubiquitous given its everyday usage among humans and solid foundations in social sciences [5] as well as its compliance with various legal frameworks [6]. Such insights are usually of the form: “Had certain aspects of the given case been different, the predictive model would behave like so instead.” The conditional part of this proposition usually prescribes a modification of the feature vector of a particular data point, whereas the hypothetical fragment of the statement tends to capture the resulting change in class prediction.
While offering a very appealing recipe for swaying an automated decision, these explanations are intrinsically restricted to a pair of outcomes, which may impact their utility, effectiveness and comprehensibility. They can either highlight an explicit contrast between two classes—“Why A rather than B?”—or be implicit instead—“Why A (as opposed to anything else)?” As a result, counterfactuals, but also single-class explanations more broadly, have been shown to simply justify conclusions of AI systems, which may be counterproductive as it implicitly limits the number of possibilities that the explainees consider, thus biasing their perception, impeding independent reasoning and yielding unwarranted reliance on AI or preventing trust from developing altogether [11].
In human explainability, this limitation can be overcome with follow-up questions, progressively exploring and narrowing down the scope of the lack of understanding until finally eliminating it. One could imagine generating multiple counterfactuals across all the possible outcomes to mimic this process, e.g., “Why A (and not B or C)?”, “Why A rather than B?”, “Why A instead of C?”, “Why B (and not A or C)?”, “Why B instead of A?”, etc., for three outcomes A, B and C. Other explainability methods could also be employed in this scenario to provide a wider gamut of insights varying in scope, complexity and explanation target. Such approaches embody the recent hypothesis-driven decision support conceptualization of explainable AI (XAI), which aims to provide diverse evidence for data-driven predictions instead of offering a recommendation to simply accept or reject a preselected AI decision [12]; this process keeps the explainees engaged instead of displacing them, utilizes their expertise, and mitigates over- and under-dependence on automation.
However, implementing this paradigm with current XAI tools is likely to fall short given that they tend to generate independent insights whose one-class limitation prevents them from capturing and communicating a congruent bigger picture. The lack of a single origin and shared context may yield insights that do not overlap or are outright contradictory—e.g., different conditionals used by counterfactuals and disparate pieces of evidence output by other techniques—preventing the explainees from drawing coherent conclusions and adversely affecting their trust and decision-making capabilities [13]. While this is a promising research direction, to the best of our knowledge the challenge of generating inherently consistent explanations of multiple classes has been addressed neither for counterfactuals nor for any other explanation type. In this paper, we fill this gap by introducing the novel concept of multi-class explanations, where individual insights pertaining to different predictions (of a selected instance) originate from a single explanatory source.
To this end, we:
(i) define a multi-class explainability optimization objective;
(ii) operationalize it in the form of a local surrogate;
(iii) offer an algorithm for building multi-class explainers; and
(iv) implement it with multi-output regression trees.
We evaluate our method—called LIME TREE —along three dimensions: an analytical assessment of human-centered XAI desiderata; a series of quantitative experiments on tabular and image data measuring explainer fidelity; and a qualitative user study capturing explainees’ preferences. We choose to demonstrate multi-class explainability with a surrogate [14] since this design yields an explainer that is post-hoc—i.e., capable of being retrofitted to pre-existing AI systems—model-agnostic—i.e., compatible with any predictive algorithm—and data-universal—i.e., suitable for tabular, text and image domains. Additionally, by using a (binary) decision tree [15] as the surrogate, LIME TREE offers a broad range of explanation types such as model structure visualization, feature importance, exemplars, logical rules, what-ifs and, most importantly, counterfactuals [16]. This suite of investigative mechanisms supports diverse explanation scopes spanning model simplification, subspace approximation and prediction rationales.
LIME TREE offers solutions to many shortcomings of currently available surrogate explainers in addition to addressing limitations found across the social and technical dimensions of XAI [17]. Specifically, by using (shallow) binary regression trees as surrogate models, it can guarantee full fidelity of the explanations with respect to the investigated black box under certain conditions, thus addressing one of the major criticisms of post-hoc approaches [1,18]. The flexible explanation generation process additionally enables it to comply with a range of desiderata such as feasibility and actionability [7] as well as facilitate algorithmic recourse [19], to name just a few [20,21]. The availability of multiple diverse explanation types also allows it to provide explainability to a broad range of stakeholders and satisfy their diverse needs [22]. With all of these contributions, we hope to launch multi-class explainability as a novel, highly beneficial XAI research direction.

2. Related Work and Background

LIME TREE builds upon two prominent findings in XAI: counterfactuals [5,6] and surrogate explainers [14,16,23,24,25,26]. As noted earlier, the former are lauded for their human-centered aspects, and the latter exhibit numerous appealing technical properties, making them one of the most flexible types of explainers. In a nutshell, surrogates mimic the behavior of more complex, hence opaque, predictive systems either locally or globally with simpler, inherently interpretable models, thereby offering human-comprehensible insights into their operation [14,23]. Unlike surrogates and counterfactuals, multi-class explainability is a largely under-explored topic. While counterfactual explanations can be generated for multiple classes [27], such insights may not present a coherent perspective given that they can be conditioned on different sets of features. One of the very few pieces of work, if not the only one, that directly addresses this challenge expands Generalized Additive Models (GAMs [28])—which are inherently transparent and powerful predictors popular in high stakes domains [29]—to multiple classes [30].
LIME [23] is one of the most popular surrogate approaches; it uses sparse linear regression to explain (probabilistic) black-box predictions. It augments the classic paradigm of surrogate explainers with interpretable representations (IR) of raw data, making them compatible with a variety of data domains (such as images and text) and extending their applicability beyond inherently interpretable features (of tabular data). High modularity and flexibility of these explainers [24] encouraged the research community to compose their different variants, some of which use decision trees as the (local) surrogate model [9,16,25,31]. For example, Waa et al. [9] showed how a local one-vs-rest classification tree can be used to produce contrastive explanations; and Shi et al. [31] fitted a local shallow regression tree whose structure constitutes an explanation. The interpretability of decision trees and their ensembles has also been investigated outside of the surrogate explainability context [16,32,33,34]. Sokol and Flach [33,34] demonstrated how to interactively extract personalized counterfactuals from a decision tree; and Tolomei et al. [32] introduced a method to explain predictions made by tree ensembles also with counterfactuals.
More specifically, LIME builds a local surrogate model $g \in \mathcal{G}$ to explain the prediction of an instance $\mathring{x} \in \mathcal{X}$ with respect to a selected class $c$ for a probabilistic black box $f : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{G}$ is the space of (sparse linear) surrogate models, $\mathcal{X}$ is the input data domain, $\mathcal{Y}$ is the space of $n$-dimensional probability vectors, $n \in \mathbb{N}_+$ is the number of target classes, and $c \in [1, \ldots, n]$. To this end, it employs a user-defined interpretable representation transformation function $\mathrm{IR} : \mathcal{X} \to \mathcal{X}'$, which encodes presence (1) and absence (0) of $d \in \mathbb{N}_+$ selected human-comprehensible concepts found in a data point $x \in \mathcal{X}$, i.e., $\mathcal{X}' = \{0, 1\}^d$. Additionally, $\mathrm{IR}$ is defined such that the explained instance is assumed to have all of the concepts present, i.e., $\mathrm{IR}(\mathring{x}) = \mathring{x}' = [1, \ldots, 1]$, which is an all-1 vector. This step allows us to generate “conceptual” variations of $\mathring{x}$ by drawing a collection of binary vectors $X' = \{x' : x' \in \mathcal{X}'\}$.
Next, $X'$ is converted back to the original data domain $\mathcal{X}$ using the inverse of the interpretable representation transformation function $\mathrm{IR}^{-1} : \mathcal{X}' \to \mathcal{X}$, i.e., $X = \{\mathrm{IR}^{-1}(x') : x' \in X'\}$, which facilitates predicting these instances with the explained black box $f$, focusing on the probabilities of the selected class $c$, i.e., $Y_c = \{f_c(x) : x \in X\}$. These predictions capture the influence of (the presence of) each human-comprehensible concept on the (change in) prediction of class $c$. We can quantify this dependence by fitting sparse linear regression to the binary sample $X'$ and probabilities $Y_c$. This procedure can be focused on a specific aspect of the data sample by computing its distance $\ell$ to the explained instance either in the original or interpretable representation—i.e., $\ell : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ or $\ell' : \mathcal{X}' \times \mathcal{X}' \to \mathbb{R}$—which is then transformed into a similarity measure by passing it through a kernel $\kappa : \mathbb{R} \to \mathbb{R}$ and used as a weight factor for training the surrogate model. This step makes it possible to prioritize smaller changes to the instance, e.g., to give more significance to samples with fewer alterations in the concept space.
LIME optimizes the fidelity of the surrogate, i.e., its ability to approximate the predictive behavior of the explained black box, and the complexity of the resulting explanation, i.e., its human comprehensibility; this objective $\mathcal{O}$ is formalized in Equation (1). Complexity $\Omega$, in the case of linear models, is computed as the number of non-zero (or significantly larger than zero) coefficients $\Theta_g$ of the surrogate $g$—see Equation (2). High fidelity entails small empirical loss $\mathcal{L}$—Equation (3)—calculated between the outputs of the black box $f$ and the surrogate $g$ using data sampled “around” the explained instance. Individual loss components are weighted by similarity scores—$\omega(x; \mathring{x})$ for $x \in \mathcal{X}$ or $\omega(x'; \mathring{x}')$ for $x' \in \mathcal{X}'$, depending on the domain—derived by kernelising the distance between the explained instance and the sampled data. This loss is inspired by Weighted Least Squares, where the weights are similarity scores.
$$\mathcal{O}(\mathcal{G}; f) = \operatorname*{arg\,min}_{g \in \mathcal{G}} \; \underbrace{\mathcal{L}(f, g)}_{\text{fidelity}} + \underbrace{\Omega(g)}_{\text{complexity}} \tag{1}$$
$$\Omega(g) = \frac{1}{|\Theta_g|} \sum_{\theta \in \Theta_g} \mathbb{1}\big(|\theta| > 0\big) \tag{2}$$
$$\mathcal{L}(f, g; X', \mathring{x}, c) = \frac{1}{\sum_{x' \in X'} \omega\big(x'; \mathrm{IR}(\mathring{x})\big)} \sum_{x' \in X'} \omega\big(x'; \mathrm{IR}(\mathring{x})\big) \Big(f_c\big(\mathrm{IR}^{-1}(x')\big) - g(x')\Big)^2 \quad \text{where} \quad \omega(x'; \mathring{x}') = \kappa\big(\ell'(x', \mathring{x}')\big) \tag{3}$$
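To make Equations (1)–(3) concrete, the snippet below sketches a LIME-style surrogate in Python: binary interpretable samples are weighted by a kernelised distance to the explained instance and a linear model is fitted to the black-box probabilities of a single class. It is a minimal illustration, not the reference LIME implementation; the toy black box `f_c`, the kernel width and the choice of ridge regression are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical stand-in for f_c(IR^{-1}(x')): the probability that the explained
# black box assigns to class c for the instance reconstructed from a binary vector x'.
def f_c(X_prime):
    return 0.6 * X_prime[:, 0] + 0.3 * X_prime[:, 1] * X_prime[:, 2] + 0.05

d = 5                                          # number of interpretable concepts
rng = np.random.default_rng(0)
X_prime = rng.integers(0, 2, size=(512, d))    # binary samples "around" IR(x) = [1, ..., 1]
x_ring_prime = np.ones(d)                      # the explained instance in the IR

# omega(x'; x_ring') = kappa(l'(x', x_ring')): exponential kernel over a simple distance.
distances = (X_prime != x_ring_prime).mean(axis=1)
weights = np.exp(-(distances ** 2) / (0.25 ** 2))

# Weighted linear surrogate g minimising a weighted squared loss akin to Equation (3);
# sparsity, as in Equation (2), can be encouraged by swapping Ridge for Lasso.
g = Ridge(alpha=1.0)
g.fit(X_prime, f_c(X_prime), sample_weight=weights)
print("influence of each concept on class c:", np.round(g.coef_, 3))
```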
The precise definitions of the interpretable representation transformation function $\mathrm{IR}$ and its inverse $\mathrm{IR}^{-1}$ depend on the data domain. For text, $\mathrm{IR}$ splits an excerpt into $d$ tokens, e.g., using the bag-of-words approach, whose presence (1) or absence (0) is encoded by $\mathcal{X}'$; setting a component of this domain to 0 is thus equivalent to removing a token from a text excerpt. For images, this domain transformation relies on the super-pixel partition of a picture into $d$ non-overlapping patches whose binary vector encoding indicates whether a particular segment is preserved (1) or discarded (0); since parts of an image cannot be removed directly, an occlusion proxy that replaces selected patches with a predetermined color is used. Figure 1 shows an interpretable representation of an image and its LIME explanations for the top three predictions. LIME explanations of text follow a similar pattern, with an excerpt being split into tokens (its interpretable representation) whose influence on a prediction—e.g., the positive or negative sentiment of a sentence—is quantified through the coefficients of the corresponding surrogate linear model.
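The following sketch illustrates an image interpretable representation and its inverse, assuming SLIC super-pixels and black-colour occlusion as the information-removal proxy; the segmentation parameters and the sample image are placeholders chosen for the example.

```python
import numpy as np
from skimage import data
from skimage.segmentation import slic

image = data.astronaut()                               # placeholder RGB image
segments = slic(image, n_segments=8, start_label=0)    # super-pixel partition
d = int(segments.max()) + 1                            # dimensionality of the IR

def ir_inverse(x_prime, image=image, segments=segments, colour=(0, 0, 0)):
    """IR^{-1}: map a binary concept vector back to an image by occluding every
    super-pixel whose component is 0 with a fixed colour."""
    out = image.copy()
    for segment_id, keep in enumerate(x_prime):
        if not keep:
            out[segments == segment_id] = colour
    return out

x_prime = np.ones(d, dtype=int)   # IR of the explained image: all concepts present
x_prime[0] = 0                    # "remove" one human-comprehensible concept
occluded = ir_inverse(x_prime)    # this image can now be scored by the black box f
```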
For tabular data, the $\mathrm{IR}$ function is more complex; continuous features are first discretized and then, together with any remaining categorical attributes, binarized. The latter step assigns, separately for every feature, 1 to the discrete partition where the explained instance is located, with all the other partitions merged and represented by 0. As a result, the mapping between $\mathcal{X}$ and $\mathcal{X}'$ tends to be non-deterministic, unlike the corresponding $\mathrm{IR}^{-1}$ transformation for image and text data. Further information about surrogate explainers—including their generalization and in-depth analysis of their individual building blocks in the context of text, image and tabular data domains—can be found in the literature [16,24,25,26].
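A simplified sketch of the tabular transformation described above is given below, assuming quartile-based discretization of continuous features only; categorical attributes and the exact binarization used by LIME are omitted.

```python
import numpy as np

def tabular_ir(X, x_ring):
    """Sketch of the tabular IR: quartile-discretise every feature and encode whether
    a sample falls into the same bin as the explained instance (1) or into any of
    the merged remaining bins (0)."""
    edges = np.percentile(X, [25, 50, 75], axis=0)   # per-feature bin boundaries
    sample_bins = np.column_stack(
        [np.digitize(X[:, j], edges[:, j]) for j in range(X.shape[1])])
    explained_bins = np.array(
        [np.digitize(x_ring[j], edges[:, j]) for j in range(X.shape[1])])
    return (sample_bins == explained_bins).astype(int)

X_local = np.random.default_rng(1).normal(size=(200, 4))  # local sample, original domain
x_ring = X_local[0]                                        # the explained instance
X_prime = tabular_ir(X_local, x_ring)                      # binary IR; its inverse is one-to-many
```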

3. LIME TREE

LIME fits a separate surrogate model to the probabilities of each class of interest. This makes the process of discovering the dependencies between multiple classes challenging as each explanation needs to be interpreted in isolation. A surrogate fitted to class A is implicitly a one-vs-rest explainer since it can only answer questions about the probability of this single class, with the complementary probability $p(\neg A) = 1 - p(A)$ modeling the union of all the other classes $\neg A \equiv B \cup C$. Interpreting the magnitude of the probability $p(A)$ output by a surrogate trained for class A can also be problematic when explaining multi-class black boxes. For example, if $p(A) \leq 0.5$, we cannot be certain whether there is a single class B with $p(B) > p(A)$, or alternatively the combined probability of all the complementary classes $p(\neg A)$ is greater than or equal to $p(A)$, with no single class dominating over $p(A)$.
Moreover, linear predictors—and thus such surrogates as well—are unable to model target variables that depend non-linearly on the input features [35], and linearity does not necessarily hold for high-level features such as the concepts encoded by IRs [25]. The high inter-dependence of such concepts may also have adverse effects on explanation quality. Additionally, modeling probabilities with linear regression risks confusing the explainees, who expect an output bounded between 0 and 1 but may be given a numerical prediction outside of this range.
We address the challenge of simultaneously explaining multiple predicted classes of an instance output by a probabilistic model by proposing a first-of-a-kind surrogate explainer based on binary multi-output regression trees. It facilitates multi-class modeling in a regression setting, allowing the surrogate to capture the interactions between multiple classes, hence explain them coherently. Each node of such a tree approximates the probabilities of every explained class—a level of detail that is impossible to achieve with surrogate multi-class classifiers—thus reflecting how individual interventions in the interpretable domain affect the predictions. Figure 2 shows an example of a surrogate multi-output regression tree. This is a significant improvement over training a separate regression surrogate for each explained class, which may produce diverse, inconsistent, competing or contradictory explanations—thus risk confusing the explainees and put their trust at stake—whenever these models do not share a common tree structure or split on different feature subsets. Our contributions establish a new direction in XAI research—concerned with consistent and faithful explanations of multiple classes—and offer a pioneering method to address this challenge.
Moreover, employing decision trees as surrogates overcomes the shortcomings identified when linear models are used to this end [16,24,25]. Trees neither presuppose independence of features nor existence of a linear relationship between them and the target variable [35]. While surrogate regression trees that approximate the probability of a single class are guaranteed to output a number within the [ 0 , 1 ] range—since the estimate is calculated as an average (default tree behavior)—this may not necessarily hold for multi-output trees. Approximating probabilities of multiple classes by calculating the mean of their respective predictions across a number of instances may yield averages whose sum is greater than 1; nonetheless, these values can be rescaled to avoid confusing the explainees.
While surrogates based on linear models are limited to (interpretable) feature influence explanations—see Figure 1—employing trees offers a broad selection of diverse explanation types. These include: (1) visualization of the tree structure; (2) tree-based (interpretable) feature importance (Gini importance [36]); (3) logical conditions extracted from root-to-leaf paths; (4) exemplar explanations taken from the training data assigned to the same leaf; (5) answers to what-if questions generated based on the tree structure (e.g., by querying the model); and (6) counterfactuals retrieved by comparing and applying logical reasoning to different tree paths [16]. The first two explanation types uncover the behavior of a black box in a given data subspace; the remainder targets specific predictions. Since all the six explanation types—see Section 6 for their examples—are derived from a single (surrogate) model, they are guaranteed to be coherent and their diversity should appeal to a wide range of audiences.
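The sketch below illustrates the core idea on a toy problem: a single multi-output regression tree is fitted to the probability vectors of several classes and then queried for a what-if answer and a simple counterfactual. The toy black box and the brute-force counterfactual search are illustrative assumptions; LIME TREE extracts such insights directly from the tree structure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical black box returning a probability vector over three classes for binary
# interpretable vectors (standing in for f(IR^{-1}(x')) of the explained model).
def black_box(X_prime):
    raw = np.column_stack([
        1.0 + 2.0 * X_prime[:, 0],           # class A leans on concept 0
        1.0 + 1.2 * X_prime[:, 1],           # class B leans on concept 1
        1.5 * np.ones(len(X_prime)),         # class C acts as a constant baseline
    ])
    return raw / raw.sum(axis=1, keepdims=True)

d = 4
X_prime = np.array(np.meshgrid(*[[0, 1]] * d)).reshape(d, -1).T   # all binary vectors
Y = black_box(X_prime)                                            # multi-class targets

surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X_prime, Y)               # one tree jointly models every explained class

# What-if: how do all class probabilities change when concept 0 is removed?
x_ring = np.ones((1, d))
x_what_if = x_ring.copy()
x_what_if[0, 0] = 0
print("factual :", surrogate.predict(x_ring).round(2))
print("what-if :", surrogate.predict(x_what_if).round(2))

# Naive counterfactual search: the fewest removed concepts that flip the surrogate's
# top class (LIME TREE instead reads such statements off the tree structure).
top = surrogate.predict(x_ring).argmax()
for x in sorted(X_prime, key=lambda x: d - x.sum()):
    if surrogate.predict(x.reshape(1, -1)).argmax() != top:
        print("counterfactual (concepts kept):", x)
        break
```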
To ensure low complexity and high fidelity of our multi-output regression trees, we employ the optimization objective $\mathcal{O}$ from Equation (1). Since we are using surrogate trees, we modify the model complexity function $\Omega$ to measure the depth or width (number of leaves) of the tree as given by Equation (4), where $d$ is the dimensionality of the binary interpretable domain $\mathcal{X}'$. This choice depends on the type of explanation that we want to extract from the surrogate tree, e.g., depth may be preferred when visualizing the tree structure or extracting decision rules. In some cases, such as unbalanced trees, optimizing for width or a mixture of the two may be more desirable. We also adapt the loss function $\mathcal{L}$ to account for the surrogate tree $g$ outputting multiple values in a single prediction as shown in Equation (5), where $C \subseteq [1, \ldots, n]$ are the classes to be explained by $g$, for which the $c$ subscript in $g_c(x')$ indicates the prediction of a selected class $c \in C$ for the data point $x'$.
$$\Omega(g; d) = \frac{\mathrm{depth}(g)}{d} \quad \text{or} \quad \Omega(g; d) = \frac{\mathrm{width}(g)}{2^d} \tag{4}$$
$$\mathcal{L}(f, g; X', \mathring{x}, C) = \frac{1}{\sum_{x' \in X'} \omega\big(x'; \mathrm{IR}(\mathring{x})\big)} \sum_{x' \in X'} \frac{\omega\big(x'; \mathrm{IR}(\mathring{x})\big)}{1 + \mathbb{1}\big(|C| > 1\big)} \sum_{c \in C} \Big(f_c\big(\mathrm{IR}^{-1}(x')\big) - g_c(x')\Big)^2 \tag{5}$$
Note that the inner sum over the explained classes $c \in C$ is normalized by $\big(1 + \mathbb{1}(|C| > 1)\big)^{-1}$, which is 1 when the surrogate is built for a single class and becomes $\frac{1}{2}$ for more classes. The loss given by Equation (5) is thus equivalent to the one in Equation (3) in the former case, and in the latter, the scaling factor ensures that the inner sum is between 0 and 1 since the biggest squared difference is 2, which happens when the predictions of $f$ and $g$ assign a probability of 1 to two different classes, e.g., $[1, 0, 0]$ and $[0, 0, 1]$. An additional assumption is that the sum of values predicted by each leaf of the surrogate tree is at most 1, which, as noted earlier, may in some cases require normalization. In practice, the surrogate explainer is built by iteratively adding splits to a multi-output regression tree—thus incrementally increasing its complexity $\Omega(g; d)$ but also improving its predictive power—which allows it to progressively minimize the loss $\mathcal{L}$ and optimize the objective $\mathcal{O}$. This procedure—captured by Algorithm A1 given in Appendix A—terminates when the loss $\mathcal{L}$ (calculated with Equation (5)) reaches a certain, user-defined level $\epsilon \in [0, 1]$, which corresponds to the fidelity of the local surrogate, i.e., $\mathcal{L}(f, g; X', \mathring{x}, C) \leq \epsilon$. Figure 3 provides a high-level overview of LIME TREE.
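A minimal sketch of this procedure is shown below: the multi-class loss of Equation (5) is computed over a weighted sample and progressively deeper multi-output trees are grown until the loss falls to a user-defined level epsilon. It abstracts away the sampler, the kernel and the interpretable representation, so it should be read as an illustration of Algorithm A1 rather than its reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def multi_class_loss(f_probs, g_probs, weights, n_explained):
    """Equation (5): similarity-weighted squared error summed over the explained
    classes and scaled by 1 / (1 + 1[|C| > 1]) so each inner sum stays in [0, 1]."""
    scale = 1.0 / (1.0 + float(n_explained > 1))
    per_sample = scale * ((f_probs - g_probs) ** 2).sum(axis=1)
    return float(np.average(per_sample, weights=weights))

def fit_surrogate_tree(X_prime, f_probs, weights, epsilon=0.05, max_depth=None):
    """Grow progressively deeper multi-output regression trees until the loss drops
    to the user-defined fidelity level epsilon (an illustration of Algorithm A1)."""
    max_depth = max_depth or X_prime.shape[1]
    for depth in range(1, max_depth + 1):
        g = DecisionTreeRegressor(max_depth=depth, random_state=0)
        g.fit(X_prime, f_probs, sample_weight=weights)
        loss = multi_class_loss(f_probs, g.predict(X_prime), weights, f_probs.shape[1])
        if loss <= epsilon:
            break
    return g, loss

rng = np.random.default_rng(0)
X_prime = rng.integers(0, 2, size=(256, 6))            # binary interpretable sample
f_probs = rng.dirichlet(np.ones(3), size=256)          # stand-in black-box probabilities
# With purely random targets the threshold may never be reached; the deepest tree is
# then returned together with its loss.
g, loss = fit_surrogate_tree(X_prime, f_probs, weights=np.ones(256))
```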

4. Fidelity Guarantees

The flexibility of surrogate explainers—they are post-hoc, model-agnostic and, often, data-universal—also contributes to the instability and occasional unreliability of their explanations [24,25,26,37,38]. Their subpar fidelity, i.e., predictive coherence with respect to the explained black box, is thus a major barrier to their uptake [1]. In addition to remedying the shortcomings of linear surrogates, LIME TREE comes with strong fidelity guarantees, which can be achieved in practice while preserving low explanation complexity.
To imbue LIME TREE with near-full or full fidelity, we identify the minimal IR set $X'_{\min, T} \subseteq \mathcal{X}'$. It is unique to a tree $T$ and composed of binary vectors $x'_{\min, t}$ drawn from the IR—one per leaf $t \in T$ of the surrogate tree—that have the least number of 0 components while still being assigned to the leaf $t$. The construction of this set is formalized in Definition 1 and can be understood as seeking instances with the highest number of human-interpretable concepts being present, e.g., minimal occlusion for images, for each leaf.
Definition 1 
(Minimal Representation). Assume a binary decision tree $g \in \mathcal{G}$ fitted to a binary $d$-dimensional data space $\mathcal{X}' = \{0, 1\}^d$, with $T$ denoting its set of leaves. This tree assigns a leaf $t \in T$ to a data point $x' \in \mathcal{X}'$ with the function $g_{\mathrm{id}}(x') = t$. For a given tree leaf $t$, its unique minimal data point $x'_{\min, t}$ is given by
$$x'_{\min, t} = \operatorname*{arg\,max}_{x' \in \mathcal{X}'} \sum_{i=1}^{d} x'_i \quad \text{s.t.} \quad g_{\mathrm{id}}(x') = t ,$$
where $x'_i$ is the $i$-th component of the binary vector $x'$. We can further define a minimal set of data points $X'_{\min, T} \subseteq \mathcal{X}'$—uniquely representing a tree $g$ and the set of its leaves $T$—that is composed of all the minimal data points for this tree as
$$X'_{\min, T} = \{ x'_{\min, t} : t \in T \} .$$
Next, we transform this minimal representation set $X'_{\min, T}$ from the interpretable into the original domain using the inverse of the IR transformation function: $X_{\min, T} = \{\mathrm{IR}^{-1}(x'_{\min, t}) : x'_{\min, t} \in X'_{\min, T}\}$. We then predict class probabilities for each instance in $X_{\min, T}$ with the black box $f$ and replace the values estimated by the surrogate tree with these probabilities for each leaf $t \in T$, i.e., modify the surrogate tree by overriding its predictions. Doing so is only feasible for the tree leaves as the minimal data points for some of the splitting nodes are indistinguishable, e.g., all the nodes on the root-to-leaf path that decides every interpretable feature to be 1 are non-unique and all would be represented by the unmodified explained instance. We additionally assume that the explained predictive model is deterministic, therefore it always outputs the same prediction for a given instance.
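A possible implementation of this post-processing step is sketched below for a deterministic interpretable representation. It finds the minimal data point of every leaf, queries the black box on its reconstruction and overwrites the leaf estimates; the reliance on scikit-learn tree internals (`tree_.value`) and the toy usage at the end are implementation assumptions, not the paper's Algorithm A2.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def override_leaf_predictions(surrogate, black_box, ir_inverse, d):
    """Sketch of the post-processing: for every leaf, find its minimal interpretable
    point (fewest concepts removed), query the black box on its reconstruction and
    overwrite the leaf's estimate. Writing to tree_.value relies on scikit-learn
    internals and is an implementation assumption, not the reference algorithm."""
    domain = np.array(np.meshgrid(*[[0, 1]] * d)).reshape(d, -1).T  # whole {0, 1}^d
    leaves = surrogate.apply(domain)                                # leaf id per point
    for leaf in np.unique(leaves):
        members = domain[leaves == leaf]
        x_min = members[members.sum(axis=1).argmax()]   # most concepts still present
        probs = black_box(ir_inverse(x_min))            # true black-box probabilities
        surrogate.tree_.value[leaf, :, 0] = probs       # override the leaf estimate
    return surrogate

# Toy usage with an identity IR^{-1} and a dummy two-class black box over {0, 1}^3.
toy_black_box = lambda x: np.array([x.sum() / 3.0, 1.0 - x.sum() / 3.0])
domain = np.array(np.meshgrid(*[[0, 1]] * 3)).reshape(3, -1).T
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(domain, np.vstack([toy_black_box(x) for x in domain]))
override_leaf_predictions(tree, toy_black_box, lambda x: x, d=3)
```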
This variant of LIME TREE —called TREE and codified by Algorithm A2 provided in Appendix A—guarantees full fidelity of the surrogate tree with respect to the explanations derived from the tree structure such as counterfactuals and root-to-leaf decision rules (see Section 6 for their examples). However, for this property to hold the function IR transforming data from their original domain into the interpretable representation has to be deterministic [25], which holds for image and text but not for tabular data (refer back to Section 2). The rationale behind this claim is outlined in Lemma 1 (proof in Appendix B), which follows from the subsequent discussion.
Lemma 1 
(Structural Fidelity). A surrogate tree can achieve full fidelity with respect to the explanations derived from its structure—i.e., model-driven explanations—if the interpretable representation transformation function $\mathrm{IR}$ is deterministic. Therefore, an instance $x \in \mathcal{X}$ can be translated into a unique point $\mathrm{IR}(x) = x' \in \mathcal{X}'$ and vice versa $\mathrm{IR}^{-1}(x') = x$, i.e., the mapping is one-to-one.
Lemma 1 guarantees that each leaf in the surrogate tree is associated with only one data point $x_{\min, t}$ in the original representation $\mathcal{X}$. This instance is derived from the minimal interpretable data point $x'_{\min, t}$ by applying the inverse of the interpretable representation transformation function $\mathrm{IR}^{-1}$, i.e., $x_{\min, t} = \mathrm{IR}^{-1}(x'_{\min, t})$. Therefore, $x_{\min, t}$ represents the explained instance with the smallest possible number of concepts deleted from it such that $g_{\mathrm{id}}(x'_{\min, t}) = t$. By assigning the probabilities predicted by the explained black box for each data point $x_{\min, t}$ to the corresponding leaf $t$ of the surrogate, it achieves full fidelity for the minimal representation set $X'_{\min, T}$, which is the backbone of model-driven explanations.
While such an approach ensures full fidelity of model-driven explanations, the same is not guaranteed for data-driven explanations such as answers to what-if questions, e.g., “What if concept $x'_i$ is absent?” Root-to-leaf paths that do not condition on all of the binary interpretable features allow for more than one data point to be assigned to that leaf, e.g., for three binary features $[x'_1, x'_2, x'_3] \in \{0, 1\}^3$, a root-to-leaf path with an $x'_1 < 0.5 \wedge x'_3 < 0.5$ condition assigns $[0, 0, 0]$ and $[0, 1, 0]$ to this leaf. This observation motivates the minimal interpretable representation $X'_{\min, T}$ (Definition 1), which selects a single data point to represent each leaf, thereby facilitating full fidelity of model-driven explanations without additional assumptions. However, for data-driven explanations to achieve full fidelity, the surrogate tree must faithfully model the entire interpretable feature space, i.e., have one leaf for every data point in $\mathcal{X}'$, which can be thought of as extreme overfitting. Since the cardinality of a binary $d$-dimensional space $\mathcal{B}^d = \{0, 1\}^d$ is given by $|\mathcal{B}^d| = 2^d$, and a complete and balanced binary decision tree of $2^d$ width (number of leaves) is $d$ deep, relaxing the tree complexity bound $\Omega$ accordingly guarantees full fidelity of all the explanations—a property captured by Corollary 1 (proof in Appendix B).
Corollary 1 
(Full Fidelity). If the complexity bound (width) $\Omega$ of a surrogate tree $g$ is relaxed so that the number of its leaves can equal the cardinality of the binary interpretable domain $\mathcal{X}'$, i.e., $\Omega(g; d) = \frac{\mathrm{width}(g)}{2^d} = \frac{|\mathcal{X}'|}{2^d} = 1$, then the surrogate is guaranteed to achieve full fidelity. This property applies to explanations that are both data-driven—i.e., derived from any data point in the interpretable representation—and model-driven—i.e., derived from the structure of the surrogate tree.
Therefore, a surrogate tree that guarantees faithfulness of model-driven explanations (Lemma 1) can only deliver trustworthy counterfactuals and exemplar explanations sourced from the minimal representation set. This may be an attractive alternative to more complex surrogate trees that additionally guarantee faithfulness of data-driven explanations (Corollary 1). The latter surrogate type, which usually yields deeper trees, can deliver a broader spectrum of trustworthy explanations: tree structure-based explanations, feature importance, decision rules (root-to-leaf paths), answers to what-if questions and exemplar explanations based on any data point, in addition to counterfactuals.
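The distinction can be checked empirically for a deterministic toy interpretable representation: enumerating the whole binary domain and fitting an unconstrained tree, as in the hedged sketch below, yields a surrogate that matches the (hypothetical) black box on every interpretable data point, in line with Corollary 1.

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeRegressor

d = 4
domain = np.array(list(product([0, 1], repeat=d)))     # all 2^d interpretable points

# Hypothetical deterministic black box over the interpretable domain, standing in for
# f(IR^{-1}(x')) when IR is one-to-one (e.g., super-pixel representations of images).
def black_box(X_prime):
    a = 0.2 + 0.5 * X_prime[:, 0] * X_prime[:, 3]
    return np.column_stack([a, 1.0 - a])

targets = black_box(domain)

# No complexity bound: the tree is free to grow one leaf per interpretable point.
g = DecisionTreeRegressor(random_state=0)
g.fit(domain, targets)

# Full fidelity: the surrogate reproduces the black box everywhere in {0, 1}^d.
assert np.allclose(g.predict(domain), targets)
print("leaves used:", g.get_n_leaves(), "out of", 2 ** d, "possible")
```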

5. Qualitative, Quantitative and User-Based Evaluation

Next, we assess the explanatory power of LIME TREE with a multi-tier evaluation approach that consists of an assessment guided by XAI desiderata (Section 5.1) as well as functionally grounded (Section 5.2) and human-grounded (Section 5.3) experiments [20,39]. The first judges our approach against a number of criteria important for XAI systems; the second involves a (synthetic) proxy task in which we compare the (numerical) fidelity of LIME with multiple variants of LIME TREE on image and tabular data; the third reports results of a pilot user study, which is based on image classification to enable straightforward qualitative evaluation of explanations by means of visual inspection, thus alleviating the need for technical expertise.

5.1. Desiderata

XAI systems can generally be decomposed into two operationally distinct parts, one responsible for explanation generation and another for its presentation; this separation allows us to better identify, evaluate and report the unique desiderata important at each stage [40]. Given that LIME TREE is a surrogate explainer, the insights that it generates are post-hoc, therefore they may not reflect the true behavior of the underlying black box [1]. This discrepancy—empirically measured as fidelity—is an important indicator of explanation truthfulness, which should always be communicated to the explainees, especially in high stakes applications. While LIME TREE can achieve full fidelity without sacrificing explanation comprehensibility, this desideratum is limited to IRs that are deterministic. To take advantage of this property it is therefore important to design an IR that addresses the explainability needs of a particular use case, which may require additional effort to build such a bespoke module despite the explainer itself being model-agnostic [25,26,41]. More broadly, truthfulness is a major advantage of our approach given that it allows retrofitting explainability into pre-existing black boxes. Whatever explanation type, presentation format and communication medium are chosen, this property guarantees that the explanatory insights are based on an accurate reflection of the black-box model’s behavior.
Before reviewing the desiderata of specific explanation types, we discuss a set of general properties that are expected of all explanatory insights [20]. LIME TREE excels when it comes to explanation plurality and diversity—especially so given their consistency—allowing the explainees to explore distinct aspects of the underlying black box without running into spuriously contradictory observations, further improving the trustworthiness of its explanations. While some of them are inherently static, others can be operationalized within an interactive explanatory protocol [34], enabling the explainees to customize and personalize them in a natural way—refer to Section 6 for examples. This breadth of explanatory insights and access to their source—the surrogate tree structure (see Figure 2)—enables their contextualization, which makes them particularly appealing since good explanations do not only communicate what information is used by a predictive model but also how it is used [1].
By simultaneously accounting for multiple classes, LIME TREE offers a more comprehensive picture of the explained model’s predictive behavior and facilitates user-driven exploration, which, as noted in Section 1, can mitigate automation bias, especially so for counterfactuals [11]. Also, recall that our method is compatible with hypothesis-driven XAI since the breadth of its insights allows the explainees to consider multiple congruent explanations for different predictions of a given instance instead of only receiving a justification of the top prediction [12]. Given that our method operates as a surrogate, we can freely tweak and tune the target, breadth and scope of its explanations by adjusting its configuration, which further adds to its flexibility [20,24,25,26,34].
While LIME TREE offers a broad spectrum of explanation types—whose diversity makes it appealing to a wide range of audiences—we anticipate the counterfactuals to be the most attractive given their ubiquity in XAI [5]. Notably, these insights are ante-hoc with respect to the surrogate tree, therefore their truthfulness is guaranteed in this regard [42]. Their generation procedure makes it possible to account for the plausibility and actionability of their conditional part as well as other (human-centered) properties that may be desired [20,33,34,43]. Counterfactual explanations are known to be intrinsically comprehensible given their parsimony and low complexity, making them an attractive choice across a diverse range of applications [5,20].

5.2. Synthetic Experiments

We evaluate the trustworthiness and comprehensibility of LIME TREE explanations using the two components of the optimization objective O (Equation (1))—fidelity  L and complexity  Ω —as computational proxies. The former measures the faithfulness of the surrogate with respect to the black box, i.e., its ability to mimic the black box, which is the only metric capable of reporting the reliability of all the diverse explanation types extracted from the surrogate. To this end, we employ the formulations of fidelity used by both LIME (Equation (3)) and LIME TREE (Equation (5)); we compute this property when modeling the top three classes predicted by the black box for each test instance. We additionally analyze the complexity of LIME TREE surrogates calculated as the tree depth normalized by the dimensionality of the IR (Equation (4)); we then compare it with the corresponding measure for LIME surrogates, which is computed by counting the number of non-zero coefficients of the underlying linear models, i.e., their size (Equation (2)).
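For reference, the two complexity proxies can be computed as in the short sketch below; the tolerance used to decide which linear coefficients count as non-zero is an assumption of this example.

```python
import numpy as np

def linear_complexity(coefficients, tol=1e-6):
    """Equation (2): the fraction of (significantly) non-zero surrogate coefficients."""
    return float((np.abs(np.asarray(coefficients)) > tol).mean())

def tree_complexity(tree, d):
    """Equation (4), depth variant: tree depth normalised by the IR dimensionality d,
    i.e., by the depth of a complete tree over the binary interpretable domain."""
    return tree.get_depth() / d
```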
We study three variants of LIME TREE, all of which minimize the fidelity loss but differ in their complexity constraints and post-processing:
TREE 
optimizes a surrogate tree for complexity, i.e., it determines the shallowest tree that offers the desired level of fidelity;
TREE
is a variant of TREE whose predictions are post-processed to guarantee full fidelity of model-driven explanations; and
TREE  
constructs a surrogate tree without any complexity constraints, allowing the algorithm to build a complete tree that guarantees full fidelity of both model- and data-driven explanations.
These realizations of LIME TREE are the most relevant given that each one offers a surrogate with distinct fidelity characteristics that lead to certain types of tree-based explanations achieving desired properties as explained earlier in Section 4. We compare the fidelity of these explainers to LIME with disabled feature selection, which allows it to achieve maximum fidelity at the expense of explanation size. Our study is limited to fidelity and complexity since XAI lacks metrics suitable for multi-class explainability or for cases when multiple explanation types are derived from a single source as well as for explanations that rely on probabilities rather than crisp predictions (to mitigate automation bias) [44]. LIME is our only baseline given the general lack of multi-class explainers or methods whose underlying surrogate model can be directly accessed.
Table 1, which reports the results of our evaluation, also summarizes our experimental setup. We use a collection of popular multi-class image and tabular data sets; with the former we rely on a selection of pretrained neural networks, and with the latter we split the data into stratified 80% training and 20% test sets, and fit the models ourselves. LIME and LIME TREE are implemented following best practice described in the literature [24,25,26,45,46,47]. For images, we use an IR built upon SLIC (edge-based) segmentation [48] with black color occlusion as the information removal proxy; given its deterministic transformation function, we operate directly on the binary interpretable domain and generate its full set of instances instead of their random sample to enable the surrogate to reach full fidelity. For tabular data, we sample 10,000 instances around the explained data point in the original domain—using mixup, which is an explicitly local sampler that accounts for class labels [45,49]—since the corresponding IR transformation function is non-deterministic; we use quartile-based discretization applied to the data sample followed by binarization as our interpretable domain. For images we use cosine distance measured in the IR, and for tabular data we use Euclidean distance measured in the original domain; we use the exponential kernel for both, with its optimal parameter determined experimentally for each data set. Our code is available on GitHub at https://github.com/So-Cool/bLIMEy/tree/master/ELECTRONICS_2025 (accessed on 20 February 2025).
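The sketch below illustrates a mixup-style local sampler of the kind referred to above: convex combinations of the explained instance and randomly drawn training points, concentrated near the explained instance. It is a simplified stand-in; the sampler used in the experiments (mixup) additionally accounts for class labels [45,49], which is omitted from this illustration.

```python
import numpy as np

def mixup_local_sample(x_ring, X_train, n_samples=10_000, alpha=2.0, seed=0):
    """Convex combinations of the explained instance and randomly drawn training
    points, with mixing coefficients skewed towards the explained instance so the
    sample concentrates locally around it."""
    rng = np.random.default_rng(seed)
    partners = X_train[rng.integers(0, len(X_train), size=n_samples)]
    lam = rng.beta(alpha, 1.0, size=(n_samples, 1))     # mostly close to 1, i.e., local
    return lam * x_ring + (1.0 - lam) * partners

X_train = np.random.default_rng(1).normal(size=(500, 4))   # placeholder training data
X_local = mixup_local_sample(X_train[0], X_train)           # sample around X_train[0]
```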
In our experiments, LIME produces three independent linear surrogates, one per class; each LIME TREE variant is either built as a single surrogate that models all of the classes simultaneously (n-th top), or a separate surrogate is constructed for a one-, two- and three-class problem (top n). In deployment, however, LIME TREE fits only a single multi-output tree, whereas LIME requires as many models as explained classes. As a result, since both methods follow the same steps except for the surrogate model training phase, our method tends to be faster for relatively small trees given that they are fitted to binary data with feature thresholds fixed at 1/2—up to the depth of 20 in our experiments—and becomes negligibly slower for large trees—requiring 250 milliseconds more than LIME for trees as deep as 40—but these measures will fluctuate with the number of explained classes and the IR dimensionality. Since the number of interpretable features should be kept low to improve human comprehensibility of the explanations, which directly limits the surrogate tree depth, we expect LIME TREE to be faster in practice [25].
To assess explanation quality we measure multi-class fidelity with the LIME TREE loss as well as the fidelity of each class separately with the LIME loss. The experimental results, summarized in Table 1, show that our base method—TREE—provides more faithful explanations than LIME at 2/3 of its complexity for tabular and image data. TREE—which post-processes the surrogate tree to facilitate full fidelity of model-driven explanations when the IR transformation function is deterministic—also surpasses LIME at 3/4 of its complexity for image data given their compliant IR, but its performance is degraded for tabular data even at full tree complexity (100%) due to the stochasticity of the underlying IR. TREE requires higher complexity, i.e., deeper trees, than TREE to achieve comparable fidelity since the post-processing step makes the surrogate faithful with respect to the minimal interpretable data points but at the same time suboptimal for the remainder of the interpretable space, which is especially detrimental for stochastic IRs where each minimal interpretable data point corresponds to multiple instances in the original data domain.
The version of LIME TREE without a depth bound—TREE, which is equivalent to TREE@100% (and TREE@100% for deterministic interpretable representations)—achieves full fidelity across the board for a deterministic IR (images), where it faithfully models the entire interpretable data space by constructing one leaf per instance, but fails to do so for a non-deterministic IR (tabular) because in this case each tree leaf has to model multiple distinct data points. By allowing deeper trees we reduce the impurity of their leaves, which improves the overall performance of the surrogates—an intuitive relation, and trade-off, between the complexity of the trees and their fidelity, two representative examples of which are shown in Figure 4. Appendix C provides the complete collection of plots depicting the behavior of the LIME and LIME TREE loss for all the data sets used in our experiments.

5.3. Pilot User Study

To assess the real-life usefulness of our approach, we ran a pilot user study. We recruited eight participants (six males and two females) evenly distributed across the 18–45 age range with diverse skills and backgrounds; six of them had a machine learning background and three were familiar with AI explainability. The participants were not compensated for their involvement in the user study. We exposed them to LIME (Figure 1) and LIME TREE (Figure 2) explanations in a random order without revealing the method’s name. The study consisted of two sections, one per explainer, displaying an image split into three segments, with each part enclosing a unique object, e.g., a cat, a dog and a ball. The two most pertinent black-box predictions for each object were then explained with both methods—e.g., tabby and tiger cat for the cat object, golden retriever and Labrador retriever for the dog object, and tennis ball and croquet ball for the ball object—yielding six LIME explanations and a single multi-output tree spanning all six predictions. The participants were offered a brief tutorial illustrating how to parse the tree structure to obtain a variety of explanations.
The participants were then asked about the expected behavior of the black box in relation to any two out of the three displayed objects for each explainer—six questions in total as the relations are assumed to be non-reflexive. For example, “How does the presence of the cat object affect the model’s confidence of a presence of the dog object?”, with three possible answers: confidence decreases, confidence not affected and confidence increases. This question formulation was chosen to avoid a bias towards either explainer since we could neither ask for the importance or influence of each object on a particular prediction (LIME’s domain), nor the relation between an object and a prediction, e.g., a what-if question (LIME TREE ’s domain). Before viewing the explanations, the participants were asked to answer a similar set of questions using only their intuition, which allowed us to assess whether the explainees still relied on their intuition when explicitly asked to work with the explanations. Figure 5 provides a high-level overview of the flow of our user study.
Our findings indicate a negligible overlap between the responses based on the participants’ intuition and both explainers; they also show that LIME TREE helped the participants to answer 25% more of the questions correctly as compared with LIME. All of the participants indicated that using LIME was either easy or very easy, and at the same time rated the process of manually extracting LIME TREE explanations as either difficult or very difficult, despite many of the explainees having AI background. This disparity in conjunction with subpar performance when using LIME suggests that the explainees misinterpreted its explanations and were overconfident [56,57]; good performance when working with LIME TREE despite the difficulty in using its explanations, on the other hand, is promising given that the process of extracting them can be easily automated.

6. Discussion

LIME TREE explanations are versatile and appealing but achieving their full fidelity presupposes a deterministic IR transformation function (Lemma 1) and a complete surrogate tree (Corollary 1). This is not a problem for image and text data since the corresponding IRs can be built to be deterministic and of low dimensionality (given by the number of desired human-comprehensible concepts). The IR of tabular data, however, is inherently non-deterministic [25]—due to the many-to-one mapping introduced by discretization and binarization (refer back to Section 2)—with its dimensionality equal to the size of the original feature space. Nonetheless, since uniquely for tabular data the surrogate tree can be trained directly on their original representation, thus implicitly constructing a locally faithful and meaningful IR instead of relying on an external one [24,25], the surrogate can be overfitted to maximize its fidelity. While LIME TREE offers a close approximation in both cases, full fidelity cannot be guaranteed since even a complete surrogate tree is unable to achieve full coverage for non-deterministic IRs. The consequences of this shortcoming can be seen in our experimental results (refer to Table 1), which show that a complete surrogate tree— labeled TREE—can reach full fidelity for images but not for tabular data.
In practice, full fidelity of surrogates based on deterministic IRs (Lemma 1) is achieved by adjusting the sample size $|X'|$ and relaxing the tree complexity bound $\Omega$. Recall that a $d$-dimensional binary interpretable representation $\mathcal{X}' \equiv \mathcal{B}^d = \{0, 1\}^d$ has $|\mathcal{X}'| = 2^d$ unique instances, and the width, i.e., the number of leaves, of a complete, balanced binary decision tree of depth $d$ is $2^d$ (Corollary 1). Therefore, we can use all of these data points—there is no benefit from oversampling—to easily train a local surrogate with its complexity bound $\Omega$ removed to allow complete trees of depth $d$, i.e., with one leaf per instance, guaranteeing full fidelity and access to a diverse range of faithful and comprehensible explanations. The depth bound and the sample size can be adjusted dynamically prior to training the surrogate to ensure its optimality since the size of the interpretable domain is known beforehand.
Since for images as well as text each dimension of the IR captures a human-comprehensible concept, their number is expected to be low, especially since tokens in text excerpts and segments in images do not have to be adjacent to constitute a single concept. For every additional feature in the interpretable space, the number of sampled data points doubles and the tree depth is incremented by one in order to provide the interpretable domain and the surrogate tree with enough capacity to preserve the full fidelity guarantee. While this exponential growth in the number of interpretable data points may seem overwhelming, training decision trees on binary data spaces is fast given the predetermined 1/2 split at every node. The exponential growth of the width of the surrogate tree that guarantees its full fidelity increases its complexity and can have adverse effects on the comprehensibility of some explanation types; however, as we show next, it does not affect the most important and versatile explanation kinds.
Guaranteeing full fidelity of a surrogate tree requires relaxing its complexity bound Ω , which the optimization objective O tries to minimize (Equation (1)). Since in this setting a moderate number of interpretable features may yield a relatively large tree, the increased complexity of the resulting explanations is concerning. While a complex surrogate tree may render the explanations based on its structure, e.g., model visualizations, incomprehensible, these are not the most appealing explanation types and their appreciation often requires AI expertise. The (interpretable) feature importance, what-if explanations, counterfactuals and exemplars are not affected by the tree size in any way and remain highly compact and comprehensible—see Figure 6 for some examples and Appendix D for a more comprehensive overview of diverse explanation types. Notably, a complete surrogate tree with full fidelity will produce more counterfactual explanations for every data point, making it more interpretable.
The decision rules—logical conditions extracted from root-to-leaf paths—may indeed become overwhelmingly long, in fact as long as the tree depth, however this does not impact all the data types equally and an appropriate presentation medium can alleviate this issue regardless of the tree complexity. For image and text data, such rules will always be comprehensible, no matter their length, since they cannot have more literals than the dimensionality of the underlying interpretable domain, i.e., the number of segments for images and word-based tokens for text. Presenting this rule in the former case corresponds to displaying an image with its various segments occluded—e.g., see Figure 6d—and in the latter producing a text excerpt with selected tokens removed. For tabular data, however, these rules may become relatively long and incomprehensible since this domain lacks a similar human-friendly representation; the exception is root-to-leaf paths that impose multiple logical conditions on a single feature (in the original domain), allowing for their compression. Regardless of the presentation medium, a general criticism of rule-based explanations is the difficulty of understanding how each logical condition affects the prediction, making them less appealing than other explanation types.
In view of these observations, if explanations based on the structure of the surrogate tree are not required for image and text data, and additionally rule-based explanations are not needed for tabular data, the model complexity Ω does not have to be minimized. It can therefore be removed from the optimization objective O given in Equation (1), paving the way for full explanation fidelity. LIME TREE therefore delivers a practical surrogate explainer with strong guarantees and well-understood limitations. Since it can be used with any AI model and data type—albeit with some constraints for tabular data—it promises to become an invaluable tool for inspecting, debugging and explaining black-box predictive systems.

7. Conclusions and Future Work

In this paper, we introduced the concept of multi-class explainability and proposed a surrogate explainer—called LIME TREE —based on multi-output regression trees that is compatible with this paradigm. We then analyzed its various properties and guarantees and showed how it can achieve full fidelity. Next, we demonstrated how LIME TREE improves upon LIME and discussed the benefits of using trees as surrogate models. We supported these claims with an assessment of its properties based on XAI desiderata as well as a collection of quantitative experiments and a pilot user study. At a higher level, the multi-class explainability paradigm delivers more comprehensive insights into the functioning of opaque predictive models than are otherwise available with current XAI conceptualizations. Additionally, our implementation of this paradigm in the form of a surrogate explainer enables its straightforward adoption across many application domains given our tool’s compatibility with diverse data types as well as its clear guarantees and limitations. Together, our contributions advance both the conceptual and practical frontiers of explainable artificial intelligence.
In future work, we will implement methods to algorithmically extract human-centered explanations from (surrogate) trees—looking into the interactive aspects of this process—and evaluate them with large-scale user studies. We will also investigate alternative interpretable representations of tabular data that would allow LIME TREE to achieve better fidelity guarantees for this data domain (e.g., by providing a deterministic IR transformation function). Finally, we plan to explore other techniques capable of realizing the multi-class explainability paradigm as well as look into expanding this concept to better account for prediction uncertainty output by probabilistic classifiers—a perspective that is largely neglected by the XAI literature.

Author Contributions

Conceptualization, K.S.; methodology, K.S.; software, K.S.; validation, K.S.; formal analysis, K.S.; investigation, K.S.; resources, P.F.; writing—original draft preparation, K.S.; writing—review and editing, K.S. and P.F.; visualization, K.S.; supervision, P.F.; funding acquisition, P.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the TAILOR project, funded by EU Horizon 2020 research and innovation program (grant agreement number 952215).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The following data sets were used for this research: ImageNet (https://image-net.org/, accessed on 20 February 2025); CIFAR-10 and CIFAR-100 (https://www.cs.toronto.edu/~kriz/cifar.html, accessed on 20 February 2025); Wine (https://archive.ics.uci.edu/dataset/109/wine, accessed on 20 February 2025); and Forest Covertypes (https://archive.ics.uci.edu/dataset/31/covertype, accessed on 20 February 2025). The source code needed to reproduce our experiments is available on GitHub (https://github.com/So-Cool/bLIMEy/tree/master/ELECTRONICS_2025, accessed on 20 February 2025).

Acknowledgments

We would like to acknowledge the contributions of Alexander Hepburn and Raul Santos-Rodriguez, who helped with the development of the code used for the experiments and offered insightful feedback.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

    The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
CPU: Central Processing Unit
GAM: Generalized Additive Model
GPU: Graphics Processing Unit
IR: Interpretable Representation
LIME: Local Interpretable Model-agnostic Explanations
XAI: eXplainable Artificial Intelligence

Appendix A. LIME TREE Algorithms

Algorithm A1 captures the vanilla variant of the LIME TREE explainer that operates on the interpretable representation (see Section 2)—referred to as TREE—and Algorithm A2 outlines the post-processing procedure—called TREE—applied to achieve full fidelity of the tree structure-based explanations (referred to as model-driven explanations throughout this paper). TREE can be built upon most off-the-shelf tree learning methods that allow for binary splits. While it is relatively lightweight, manipulating data points (e.g., images) via the $\mathrm{IR}$ and $\mathrm{IR}^{-1}$ functions and querying the black-box model $f$ may become a bottleneck. The explainee has no control over the computational and memory complexity of querying the black box, which is executed $s$ times, where $s \in \mathbb{N}_+$ is the number of sampled data points. Given the recent advances in dedicated AI hardware, this step should not be a burden when utilizing GPUs (Graphics Processing Units) and is manageable with just CPUs (Central Processing Units).
Transforming the interpretable representation (binary vectors) into the original data (e.g., image) domain may require a considerable amount of operational memory: the explained instance (e.g., image) has to be duplicated for every data point sampled from the interpretable domain and its feature (e.g., pixel) values need to be altered to reflect the removed concepts (e.g., segment occlusions). The efficiency of these two steps can be significantly improved with batch processing and parallelization, thereby reducing the use of memory and improving the processing time. Other steps in Algorithm A1 are relatively efficient: discretizing continuous features or segmenting an image, sampling a binary matrix from the interpretable domain and fitting a multi-output regression tree to binary data with feature thresholds fixed at 1/2. For tabular data, TREE may be adapted to operate on the original domain instead, using this representation to sample data and compute distances; the explainer is highly efficient for this data type.
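A minimal sketch of such batching is given below; the helper names and the batch size are illustrative, and an actual implementation may interleave these steps differently or parallelize them across workers.

```python
import numpy as np

def batched_black_box_probabilities(X_prime, ir_inverse, black_box, batch_size=64):
    """Convert interpretable samples back to the original domain and query the black
    box in batches so that only batch_size reconstructed instances (e.g., occluded
    images) are held in memory at any one time."""
    outputs = []
    for start in range(0, len(X_prime), batch_size):
        batch = X_prime[start:start + batch_size]
        originals = np.stack([ir_inverse(x_prime) for x_prime in batch])
        outputs.append(black_box(originals))
    return np.concatenate(outputs, axis=0)
```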
Algorithm A1: The TREE (vanilla) variant of LIME TREE .
Algorithm A2: The TREE variant of LIME TREE .

Appendix B. Proofs

Proof of Lemma 1 (Structural Fidelity). 
Full (empirical) fidelity is achieved when both the explained model f (which is assumed to be non-stochastic) and its surrogate g deliver identical predictions across all the selected classes C for a predetermined set of data points X. Structural fidelity narrows the scope of consideration down to the instances X_{min,T} (from the binary interpretable representation space) that are described by the logical conditions specified by the leaves of the surrogate tree T:
$$ f_c\bigl(\mathrm{IR}^{-1}(x)\bigr) = g_c(x) \quad \forall\, c \in C,\ \forall\, x \in X_{\min,T}\,. $$
When the interpretable representation transformation function is deterministic, i.e., $\mathrm{IR}(\mathrm{IR}^{-1}(x)) = x$ for all $x \in X$, post-processing the surrogate with Algorithm A2 guarantees that its outputs align with those of the explained model for instances from the minimal representation X_{min,T} (Definition 1), which are the backbone of model-driven explanations. Therefore, this surrogate achieves full structural fidelity:
$$ f_c\bigl(\mathrm{IR}^{-1}(x)\bigr) = g_c(x) \quad \forall\, c \in C,\ \forall\, x \in X_{\min,T}\,. $$
Proof of Corollary 1 (Full Fidelity). 
Growing a complete, d-deep binary surrogate tree T for a d-dimensional binary interpretable representation X allows each data point to be assigned a unique tree leaf:
$$ X_{\min,T} = X\,. $$
Consequently, when the interpretable representation transformation function is deterministic, the predictions of the surrogate are identical to those of the explained model (which is assumed to be non-stochastic) across the entire interpretable representation space. Therefore, this surrogate achieves full fidelity (with respect to both model- and data-driven explanations):
$$ f_c\bigl(\mathrm{IR}^{-1}(x)\bigr) = g_c(x) \quad \forall\, c \in C,\ \forall\, x \in X\,. $$
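The corollary can also be verified empirically for small interpretable representations by enumerating every binary vector and comparing the two models point by point. The sketch below assumes a complete surrogate fitted on the binary interpretable representation together with the occlusion-based inverse transformation and the black-box predictor from the sketch in Appendix A (surrogate, occlude and predict_proba are the hypothetical names used there), and a deterministic interpretable representation.

```python
import itertools
import numpy as np


def check_full_fidelity(surrogate, occlude, predict_proba, classes, d, atol=1e-6):
    """Enumerate all 2^d points of the binary interpretable representation and
    confirm that the complete surrogate reproduces the black-box outputs, i.e.,
    f_c(IR^-1(x)) == g_c(x) for every modelled class c and every point x."""
    X_all = np.array(list(itertools.product([0, 1], repeat=d)))
    g = surrogate.predict(X_all)  # surrogate predictions, shape (2^d, |C|)
    f = predict_proba(np.stack([occlude(x) for x in X_all]))[:, classes]
    return np.allclose(f, g, atol=atol)
```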

Appendix C. Loss Behavior

Table 1 reports the fidelity loss of various LIME TREE variants at fixed complexity levels (66%, 75% and 100%). To better understand the relation between the complexity of the surrogate trees and their fidelity—expanding on the results shown in Figure 4—we plot these quantities in Figure A1, Figure A2, Figure A3, Figure A4 and Figure A5, respectively for the ImageNet, CIFAR-10, CIFAR-100, Wine and Forest Covertypes data sets. This allows us to study how building surrogate trees of higher complexity influences their fidelity, and how these properties compare with the baseline given by linear surrogates (LIME), whose complexity is fixed. Since images in the ImageNet, CIFAR-10 and CIFAR-100 data sets may have different numbers of super-pixels, i.e., interpretable features, our formulation of the depth-based tree complexity Ω given by Equation (4) accounts for that by scaling the tree depth in relation to the number of segments—this metric can be interpreted as a tree completeness level.
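For reference, the depth-based complexity used throughout these plots follows the textual description of Equation (4): the depth of the fitted surrogate divided by the depth of a complete tree, i.e., the number of interpretable features. A minimal sketch, assuming a scikit-learn tree as in the earlier examples:

```python
from sklearn.tree import DecisionTreeRegressor


def tree_completeness(surrogate: DecisionTreeRegressor, n_interpretable_features: int) -> float:
    """Depth-based tree complexity (completeness level): the surrogate's depth
    divided by the depth of a complete tree, which for a binary interpretable
    representation equals the number of interpretable features (segments)."""
    return surrogate.get_depth() / n_interpretable_features


# For example, a depth-6 tree built over an 8-segment interpretable
# representation has a completeness level of 6 / 8 = 75%.
```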
Figure A1. Fidelity of a surrogate L (y-axis) built for the ImageNet data set (see Section 5 for information about the experimental setup) and plotted against its complexity Ω (x-axis) expressed as the ratio between the depth of the tree and its maximum depth determined by the number of features in the interpretable representation, which is equivalent to the depth of a complete tree (Equation (4)). The complexity of a linear surrogate is fixed and given by the number of features found in the interpretable representation, i.e., 100%. Panels (a–c) depict fidelity measured with the LIME loss (Equation (3)), and Panels (d–f) show the same property calculated with the LIME TREE loss (Equation (5)) for different configurations of the top three classes predicted by a black box. Note different scales on the y-axes. The results are shown for three surrogate models: LIME—a linear surrogate fitted to all interpretable features; TREE—a tree surrogate optimized for fidelity and complexity (Algorithm A1 in Appendix A); and TREE—a TREE surrogate post-processed to achieve full fidelity of model-driven explanations (Algorithm A2 in Appendix A).
Figure A2. Fidelity of a surrogate L (y-axis) built for the CIFAR-10 data set (see Section 5 for information about the experimental setup) and plotted against its complexity Ω (x-axis) expressed as the ratio between the depth of the tree and its maximum depth determined by the number of features in the interpretable representation, which is equivalent to the depth of a complete tree (Equation (4)). The caption of Figure A1 provides further information about the details of the plot.
Figure A3. Fidelity of a surrogate L (y-axis) built for the CIFAR-100 data set (see Section 5 for information about the experimental setup) and plotted against its complexity Ω (x-axis) expressed as the ratio between the depth of the tree and its maximum depth determined by the number of features in the interpretable representation, which is equivalent to the depth of a complete tree (Equation (4)). The caption of Figure A1 provides further information about the details of the plot.
Figure A4. Fidelity of a surrogate L (y-axis) built for the Wine data set (see Section 5 for information about the experimental setup) and plotted against its complexity Ω (x-axis) expressed as the ratio between the depth of the tree and its maximum depth determined by the number of features in the interpretable representation, which is equivalent to the depth of a complete tree (Equation (4)). The caption of Figure A1 provides further information about the details of the plot.
Figure A5. Fidelity of a surrogate L (y-axis) built for the Forest Covertypes data set (see Section 5 for information about the experimental setup) and plotted against its complexity Ω (x-axis) expressed as the ratio between the depth of the tree and its maximum depth determined by the number of features in the interpretable representation, which is equivalent to the depth of a complete tree (Equation (4)). The caption of Figure A1 provides further information about the details of the plot.

Appendix D. Examples of Diverse Explanation Types

Building upon Section 3, Section 5.1 and Section 6, this appendix offers a further discussion of various aspects of the six explanation types available for LIME TREE using a concrete case study. To better communicate our method’s explanatory power, we provide multiple examples for the top three classes predicted by a black box for the image shown in Figure 1a—tennis ball (99.28%), golden retriever (0.67%) and Labrador retriever (0.04%)—and compare them with the corresponding LIME explanations shown in Figure 1.
The LIME explanation for tennis ball—shown in Figure 1b—indicates that segment #8, which depicts the ball, has an overwhelmingly positive influence on predicting this class. Figure 1b also exemplifies the problem of high correlation between adjacent super-pixels: the next two most important segments are #2 and #7—they neighbor #8 and mostly surround it—which is likely because they include pixels that belong to the tennis ball object, e.g., the characteristic white stripe (#2). The other two LIME explanations are for golden retriever (Figure 1d) and Labrador retriever (Figure 1c). In both cases, the segment depicting the ball (#8) has a large negative influence, which is expected, and the segment capturing the dog’s face (#3) has a large positive effect. Distinguishing between these two dog breeds is determined by the positive effect of segment #1 on the golden retriever class (perhaps because it reveals the long coat) and the negative influence of segment #2 on the Labrador retriever class (possibly because it includes the white stripe of the tennis ball). Based on this evidence alone, it is difficult to determine the model’s heuristic for telling these two classes apart; in particular, the role that segment #2 plays.
When it comes to LIME TREE , we can easily calculate the importance of interpretable features (Gini importance [36])—shown in Figure 6a—which closely resembles LIME insights. Since LIME TREE models all three classes simultaneously, the importance captures the segments that help to differentiate between these classes. Comparing Figure 6a with analogous LIME explanations shown in Figure 1 reveals a reassuring overlap, with each LIME explanation sharing at least two of its top four most important segments with the LIME TREE explanation. The tree-based feature importance indicates that segment #8—depicting the ball—is the most important, owing to the dominant tennis ball prediction (99.28%), and is followed by segments #1, #3 and #6—covering most of the dog. While informative, these insights cannot be explicitly attributed to any individual class and the feature importance values can only be positive, limiting their explanatory power.
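The multi-class importance shown in Figure 6a can be read directly off the fitted multi-output surrogate; with scikit-learn trees it is exposed as the impurity-based feature_importances_ attribute, which is computed jointly over all modelled outputs and is therefore non-negative and not attributable to individual classes. A minimal sketch, assuming the surrogate from the Appendix A example and segments numbered #1 to #8:

```python
import numpy as np


def segment_importance(surrogate):
    """Impurity-based importance of each interpretable feature (super-pixel),
    computed jointly over all modelled classes; the values are non-negative
    and sum to one. Segment numbers are reported 1-based to match the figures."""
    importance = surrogate.feature_importances_
    ranking = np.argsort(importance)[::-1]  # most important segment first
    return {f"segment #{i + 1}": float(importance[i]) for i in ranking}
```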
Since all LIME TREE explanations are consistent—they are derived from the same surrogate tree—we can, with the help of another explanation type such as the tree structure visualization shown earlier in Figure 2, discover the relation between each important feature (Figure 6a) and the three explained classes. Note that these two explanations are derived from different trees, since the depth of the surrogate shown in Figure 2 was limited to two for visualization purposes; this also means that we can achieve full fidelity with respect to model-driven explanations (Lemma 1) but not data-driven explanations (Corollary 1). Comparing the two leftmost leaves with the two rightmost ones—the result of the root split on segment #8—tells us that this segment has a positive influence on the tennis ball prediction; additionally, when segment #7 is present, this prediction is strengthened, but even without it tennis ball remains the most likely prediction. On the other hand, when the ball is absent, i.e., segment #8 is occluded, both dog breeds are almost equally likely, with the presence of segment #3 being the deciding factor: the prediction is Labrador retriever if #3 is occluded, and golden retriever if #3 is present (although Labrador retriever is nearly as likely in this case).
Arriving at these conclusions required us to inspect and reason over the tree structure, which cannot be expected of a lay explainee (as demonstrated by the results of our pilot user study reported in Section 5.3) and becomes impractical when the surrogate tree is large or complex. In such cases, we can use other types of explanations, for example, what-if questions. Since the tree presented in Figure 2 is not complete (see Lemma 1), we use the black-box model instead of the surrogate to evaluate the hypothetical scenarios. Because segment #8, depicting the ball, is the most important factor, we are interested in what would happen if this segment were not there; the new prediction is 97% golden retriever—see Figure 6c. We can also ask for exemplar explanations of the golden retriever and Labrador retriever classes, which are shown in Figure A6.
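Since the incomplete tree cannot be trusted for such out-of-leaf scenarios, the what-if query is answered by the black box itself: build the hypothetical occlusion and ask the model. A minimal sketch, reusing the hypothetical occlude and predict_proba helpers from the Appendix A example:

```python
import numpy as np


def what_if(predict_proba, occlude, z, classes):
    """Evaluate a hypothetical occlusion pattern `z` (binary vector over segments)
    directly with the black box and return the probabilities of the explained classes."""
    return predict_proba(occlude(z)[np.newaxis, ...])[0, classes]


# e.g., "what if segment #8 (the ball) were occluded?" (`top3` is an assumed
# list holding the indices of the three explained classes)
# what_if(predict_proba, occlude, z=np.array([1, 1, 1, 1, 1, 1, 1, 0]), classes=top3)
```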
Figure A6. Two exemplar explanations of (a) a golden retriever and (b) a Labrador retriever generated with LIME TREE .
To take full advantage of LIME TREE explanations, we train a complete surrogate tree (see Corollary 1). We use it to retrieve the shortest possible explanation of tennis ball, i.e., the one with the highest number of occluded segments. There are three such explanations of length two—shown in Figure 6b and Figure A7—with the following pairs of super-pixels preserved: #7 and #8, #3 and #8, and #1 and #8. We can also generate rule explanations of Labrador retriever based on the root-to-leaf paths found in the tree, selecting the one with the highest probability of this class. The resulting explanation is x₁ = 0 ∧ x₂ = 0 ∧ x₃ = 1 ∧ x₄ = 1 ∧ x₅ = 1 ∧ x₆ = 1 ∧ x₇ = 0 ∧ x₈ = 0, yielding 98% confidence. Such a representation is not particularly appealing, especially to a lay audience, but we can improve its comprehensibility by transforming it into the visual domain—see Figure 6d.
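A rule of this kind can be extracted from the underlying scikit-learn tree by scanning its leaves and walking the root-to-leaf path of the leaf that maximises the selected class. The sketch below is one possible realisation under the assumptions of the earlier examples (binary features split at 0.5, classes indexed in the order they were modelled); it is not the library's own API.

```python
import numpy as np


def best_rule_for_class(surrogate, class_index):
    """Return the root-to-leaf path (a conjunction over the binary interpretable
    features) whose leaf maximises the surrogate's output for `class_index`,
    together with that leaf's predicted value."""
    t = surrogate.tree_
    leaves = np.where(t.children_left == -1)[0]  # -1 marks a leaf node
    best_leaf = leaves[np.argmax(t.value[leaves, class_index, 0])]

    rule, node = [], 0  # walk down from the root towards the chosen leaf
    while node != best_leaf:
        goes_left = _in_subtree(t, t.children_left[node], best_leaf)
        # Binary features with a 0.5 threshold: the left branch means the feature is 0.
        rule.append(f"x{t.feature[node] + 1} = {0 if goes_left else 1}")
        node = t.children_left[node] if goes_left else t.children_right[node]
    return " AND ".join(rule), float(t.value[best_leaf, class_index, 0])


def _in_subtree(t, node, target):
    """Check whether `target` is reachable from `node` in the fitted tree."""
    if node == -1:
        return False
    if node == target:
        return True
    return (_in_subtree(t, t.children_left[node], target)
            or _in_subtree(t, t.children_right[node], target))
```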
Figure A7. The two remaining shortest (highest number of occlusions) LIME TREE explanations of tennis ball; the other one—where preserving segments #8 and #7 yields tennis ball with 97% confidence—is given in Figure 6b.
The biggest advantage of LIME TREE is its ability to output counterfactual explanations using any method compatible with (regression) trees [16]. For example, we can ask the following question: “Given the presence of segment #8 (the ball), what would have to change for the image to be classified as golden retriever?” We are thus looking for an image occlusion pattern that preserves the ball segment (#8) and yields a golden retriever prediction. LIME TREE tells us that by discarding super-pixels #2, #3 and #7—the smallest viable occlusion, shown in Figure A8a—the model predicts golden retriever (54%). Since occluding segment #8, i.e., the ball, results in 97% golden retriever—see Figure 6c—another interesting question is: “Had segment #8 not been there, could the model still predict tennis ball?” LIME TREE indicates that this can be achieved by occluding segments #1 and #8, as shown in Figure A8b.
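Such customised counterfactuals can be computed with any tree-compatible search; over a small binary interpretable representation, even an exhaustive search is feasible. The sketch below is one simple, assumed realisation rather than the paper's own procedure: it scans all occlusion patterns, respects user constraints on segments that must stay present or occluded, and returns the candidate closest to the explained instance (in Hamming distance) for which the complete surrogate ranks the target class highest.

```python
import itertools
import numpy as np


def counterfactual(surrogate, x, target_class, keep=(), drop=(), d=8):
    """Find the occlusion pattern closest to `x` (Hamming distance) for which the
    surrogate assigns the highest value to `target_class`, with segments in `keep`
    forced to be present (1) and segments in `drop` forced to be occluded (0).
    Segment numbers in `keep`/`drop` are 1-based (#1 to #d)."""
    best, best_distance = None, np.inf
    for candidate in itertools.product([0, 1], repeat=d):
        z = np.asarray(candidate)
        if any(z[i - 1] != 1 for i in keep) or any(z[i - 1] != 0 for i in drop):
            continue
        prediction = surrogate.predict(z.reshape(1, -1))[0]
        if np.argmax(prediction) != target_class:
            continue
        distance = np.sum(z != x)
        if distance < best_distance:
            best, best_distance = z, distance
    return best


# e.g., keep the ball (#8) but flip the prediction to the second modelled class:
# counterfactual(surrogate, x=np.ones(8, dtype=int), target_class=1, keep=(8,))
```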
Figure A8. Customized counterfactual explanations generated by LIME TREE .

References

  1. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
  2. Sokol, K.; Flach, P. Explainability is in the mind of the beholder: Establishing the foundations of explainable artificial intelligence. arXiv 2021, arXiv:2112.14466. [Google Scholar]
  3. Longo, L.; Brcic, M.; Cabitza, F.; Choi, J.; Confalonieri, R.; Del Ser, J.; Guidotti, R.; Hayashi, Y.; Herrera, F.; Holzinger, A.; et al. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Inf. Fusion 2024, 106, 102301. [Google Scholar] [CrossRef]
  4. Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Giannotti, F.; Pedreschi, D. A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 2018, 51, 1–42. [Google Scholar] [CrossRef]
  5. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38. [Google Scholar] [CrossRef]
  6. Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. J. Law Technol. 2017, 31, 841. [Google Scholar] [CrossRef]
  7. Poyiadzi, R.; Sokol, K.; Santos-Rodriguez, R.; De Bie, T.; Flach, P. FACE: Feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–8 February 2020; pp. 344–350. [Google Scholar]
  8. Romashov, P.; Gjoreski, M.; Sokol, K.; Martinez, M.V.; Langheinrich, M. BayCon: Model-agnostic Bayesian counterfactual generator. In Proceedings of the IJCAI, Vienna, Austria, 23–29 July 2022; pp. 740–746. [Google Scholar]
  9. Waa, J.v.d.; Robeer, M.; Diggelen, J.v.; Brinkhuis, M.; Neerincx, M. Contrastive explanations with local foil trees. In Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden, 14 July 2018. [Google Scholar]
  10. Verma, S.; Boonsanong, V.; Hoang, M.; Hines, K.; Dickerson, J.; Shah, C. Counterfactual explanations and algorithmic recourses for machine learning: A review. ACM Comput. Surv. (CSUR) 2024, 56, 1–42. [Google Scholar] [CrossRef]
  11. Byrne, R.M. Good explanations in explainable artificial intelligence (XAI): Evidence from human explanatory reasoning. In Proceedings of the IJCAI, Macao, China, 19–25 August 2023; pp. 6536–6544. [Google Scholar]
  12. Miller, T. Explainable AI is dead, long live explainable AI! Hypothesis-driven decision support using evaluative AI. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA, 12–15 June 2023; pp. 333–342. [Google Scholar]
  13. Weld, D.S.; Bansal, G. The challenge of crafting intelligible intelligence. Commun. ACM 2019, 62, 70–79. [Google Scholar] [CrossRef]
  14. Craven, M.; Shavlik, J.W. Extracting tree-structured representations of trained networks. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 27–30 November 1995; pp. 24–30. [Google Scholar]
  15. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Chapman and Hall/CRC: New York, NY, USA, 1984. [Google Scholar]
  16. Sokol, K. Towards Intelligible and Robust Surrogate Explainers: A Decision Tree Perspective. Ph.D. Thesis, University of Bristol, Bristol, UK, 2021. [Google Scholar]
  17. Saeed, W.; Omlin, C. Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 2023, 263, 110273. [Google Scholar] [CrossRef]
  18. Retzlaff, C.O.; Angerschmid, A.; Saranti, A.; Schneeberger, D.; Roettger, R.; Mueller, H.; Holzinger, A. Post-hoc vs ante-hoc explanations: xAI design guidelines for data scientists. Cogn. Syst. Res. 2024, 86, 101243. [Google Scholar] [CrossRef]
  19. Karimi, A.; Schölkopf, B.; Valera, I. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event, 3–10 March 2021; pp. 353–362. [Google Scholar]
  20. Sokol, K.; Flach, P. Explainability fact sheets: A framework for systematic assessment of explainable approaches. In Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 56–67. [Google Scholar]
  21. Guidotti, R. Counterfactual explanations and how to find them: Literature review and benchmarking. Data Min. Knowl. Discov. 2024, 38, 2770–2824. [Google Scholar] [CrossRef]
  22. Meske, C.; Bunde, E.; Schneider, J.; Gersch, M. Explainable artificial intelligence: Objectives, stakeholders, and future research opportunities. Inf. Syst. Manag. 2022, 39, 53–63. [Google Scholar] [CrossRef]
  23. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  24. Sokol, K.; Hepburn, A.; Santos-Rodriguez, R.; Flach, P. bLIMEy: Surrogate prediction explanations beyond LIME. In Proceedings of the 2019 Workshop on Human-Centric Machine Learning (HCML 2019) at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 13 December 2019. [Google Scholar]
  25. Sokol, K.; Flach, P. Interpretable representations in explainable AI: From theory to practice. Data Min. Knowl. Discov. 2024, 38, 1–39. [Google Scholar] [CrossRef]
  26. Sokol, K.; Hepburn, A.; Santos-Rodriguez, R.; Flach, P. What and how of machine learning transparency: Building bespoke explainability tools with interoperable algorithmic components. J. Open Source Educ. 2022, 5, 175. [Google Scholar] [CrossRef]
  27. Carlevaro, A.; Lenatti, M.; Paglialonga, A.; Mongelli, M. Multi-class counterfactual explanations using support vector data description. IEEE Trans. Artif. Intell. 2023, 5, 3046–3056. [Google Scholar] [CrossRef]
  28. Hastie, T.; Tibshirani, R. Generalized additive models. Stat. Sci. 1986, 1, 297–310. [Google Scholar] [CrossRef]
  29. Lou, Y.; Caruana, R.; Gehrke, J. Intelligible models for classification and regression. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 150–158. [Google Scholar]
  30. Zhang, X.; Tan, S.; Koch, P.; Lou, Y.; Chajewska, U.; Caruana, R. Axiomatic interpretability for multiclass additive models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 226–234. [Google Scholar]
  31. Shi, S.; Zhang, X.; Li, H.; Fan, W. Explaining the predictions of any image classifier via decision trees. arXiv 2019, arXiv:1911.01058. [Google Scholar]
  32. Tolomei, G.; Silvestri, F.; Haines, A.; Lalmas, M. Interpretable predictions of tree-based ensembles via actionable feature tweaking. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 465–474. [Google Scholar]
  33. Sokol, K.; Flach, P.A. Glass-Box: Explaining AI decisions with counterfactual statements through conversation with a voice-enabled virtual assistant. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; pp. 5868–5870. [Google Scholar]
  34. Sokol, K.; Flach, P. One explanation does not fit all: The promise of interactive explanations for machine learning transparency. KI-Künstliche Intell. 2020, 34, 235–250. [Google Scholar] [CrossRef]
  35. Flach, P. Machine Learning: The Art and Science of Algorithms That Make Sense of Data; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
  36. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  37. Laugel, T.; Renard, X.; Lesot, M.J.; Marsala, C.; Detyniecki, M. Defining locality for surrogates in post-hoc interpretablity. In Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden, 14 July 2018. [Google Scholar]
  38. Zhang, Y.; Song, K.; Sun, Y.; Tan, S.; Udell, M. “Why should you trust my explanation?” Understanding uncertainty in LIME explanations. In Proceedings of the AI for Social Good Workshop at the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
  39. Doshi-Velez, F.; Kim, B. Considerations for evaluation and generalization in interpretable machine learning. In Explainable and Interpretable Models in Computer Vision and Machine Learning; Springer: Cham, Switzerland, 2018; pp. 3–17. [Google Scholar]
  40. Sokol, K.; Vogt, J.E. What does evaluation of explainable artificial intelligence actually tell us? A case for compositional and contextual validation of XAI building blocks. In Proceedings of the Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; pp. 1–8. [Google Scholar]
  41. Mittelstadt, B.; Russell, C.; Wachter, S. Explaining explanations in AI. In Proceedings of the 2019 ACM Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 279–288. [Google Scholar]
  42. Sokol, K.; Vogt, J.E. (Un)reasonable allure of ante-hoc interpretability for high-stakes domains: Transparency is necessary but insufficient for comprehensibility. In Proceedings of the 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH) at 2023 International Conference on Machine Learning (ICML), Honolulu, HI, USA, 28 July 2023. [Google Scholar]
  43. Keane, M.T.; Kenny, E.M.; Delaney, E.; Smyth, B. If only we had better counterfactual explanations: Five key deficits to rectify in the evaluation of counterfactual XAI techniques. In Proceedings of the IJCAI, Virtual Event, 19–26 August 2021; pp. 4466–4474. [Google Scholar]
  44. Sokol, K.; Hüllermeier, E. All you need for counterfactual explainability is principled and reliable estimate of aleatoric and epistemic uncertainty. arXiv 2025, arXiv:2502.17007. [Google Scholar]
  45. Sokol, K.; Santos-Rodriguez, R.; Flach, P. FAT Forensics: A Python toolbox for algorithmic fairness, accountability and transparency. Softw. Impacts 2022, 14, 100406. [Google Scholar] [CrossRef]
  46. Garreau, D.; Luxburg, U. Explaining the explainer: A first theoretical analysis of LIME. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Virtual Event, 26–28 August 2020; pp. 1287–1296. [Google Scholar]
  47. Sokol, K.; Hepburn, A.; Poyiadzi, R.; Clifford, M.; Santos-Rodriguez, R.; Flach, P. FAT Forensics: A Python toolbox for implementing and deploying fairness, accountability and transparency algorithms in predictive systems. J. Open Source Softw. 2020, 5, 1904. [Google Scholar] [CrossRef]
  48. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282. [Google Scholar] [CrossRef]
  49. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  50. Chen, Y. PyTorch CIFAR Models. 2021. Available online: https://github.com/chenyaofo/pytorch-cifar-models (accessed on 20 February 2025).
  51. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  52. Krizhevsky, A.; Hinton, G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, Toronto, ON, Canada, 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 20 February 2025).
  53. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  54. Aeberhard, S.; Forina, M. Wine. UCI Machine Learning Repository. 1991. Available online: https://archive.ics.uci.edu/dataset/109/wine (accessed on 20 February 2025).
  55. Blackard, J. Forest Covertypes. UCI Machine Learning Repository. 1998. Available online: https://archive.ics.uci.edu/dataset/31/covertype (accessed on 20 February 2025).
  56. Small, E.; Xuan, Y.; Hettiachchi, D.; Sokol, K. Helpful, misleading or confusing: How humans perceive fundamental building blocks of artificial intelligence explanations. In Proceedings of the ACM CHI 2023 Workshop on Human-Centered Explainable AI (HCXAI), Hamburg, Germany, 28–29 April 2023. [Google Scholar]
  57. Xuan, Y.; Small, E.; Sokol, K.; Hettiachchi, D.; Sanderson, M. Comprehension is a double-edged sword: Over-interpreting unspecified information in intelligible machine learning explanations. Int. J. Hum.-Comput. Stud. 2025, 193, 103376. [Google Scholar] [CrossRef]
Figure 1. LIME explanations for the top three classes predicted by a black-box model. Panel (a) shows the super-pixel interpretable representation of the explained image with d = 8 segments. Panels (b–d) are LIME explanations; they capture the positive or negative influence of (the presence of) interpretable features on the prediction (probability) of a selected class.
Figure 2. Surrogate multi-output binary regression tree explaining the top three classes—tennis ball, golden retriever and Labrador retriever—predicted by a black box for the image shown in Figure 1a. The segments marked in blue do not influence the explanation at a given tree node, i.e., they can either be preserved or discarded for the explanation to hold. Super-pixels whose value in the interpretable representation is 1 are preserved and those with 0 are “removed” by occluding them with black patches. The class probabilities estimated by each node of the surrogate tree may not sum up to 1 as these values capture a subset of the modeled classes and are a result of numerical regression, hence they should not be treated as probabilities per se.
Figure 3. High-level overview of LIME TREE .
Figure 4. Behavior of the LIME TREE loss (fidelity L and its standard deviation, y-axis) computed for the top three classes of the (a) CIFAR-100 and (b) Forest Covertypes data sets and plotted against surrogate complexity ( Ω , x-axis) given as the ratio between the depth of the tree and its maximum depth (complete tree) determined by the number of features of the interpretable domain. We report results for three surrogate variants: LIME, TREE and TREE; the plots are representative of the other data sets used in our experiments (see Appendix C for the remaining figures) and complement the fidelity at fixed tree complexity levels (66%, 75% and 100%) reported in Table 1. LIME complexity is constant and given by the number of features in the interpretable representation, i.e., 100% equivalent.
Figure 5. High-level overview of the user study flow.
Figure 6. Examples of four LIME TREE explanation types complementing the tree structure visualization shown in Figure 2: (a) feature importance, (b) exemplar, (c) what-if/counterfactual and (d) decision rule. These insights allow us to uncover the heuristic used by the black box to differentiate between the three explained classes, which is not feasible with the LIME explanations displayed in Figure 1. Panels (b–d) show explanations generated to maximize the predicted probability of one of the classes; they are presented here with appealing visualizations, but they can also be communicated via the underlying logical expressions, e.g., x₁ = 0 ∧ x₂ = 0 ∧ x₃ = 1 ∧ x₄ = 1 ∧ x₅ = 1 ∧ x₆ = 1 ∧ x₇ = 0 ∧ x₈ = 0 for Panel (d). Note that LIME TREE explanations can be customized to an individual explainee’s needs, as can be seen in Panels (b–d); the user can ask for certain image segments (i.e., interpretable features) to be preserved and others discarded, as well as for the smallest or biggest possible occlusion, at the same time requesting to maximize the probability of a selected class (according to the black box).
Table 1. Fidelity loss (mean ± standard deviation, smaller is better, best results in bold) computed: (n-th top) separately for each of the top three black-box predictions with the LIME loss (Equation (3)); and (top n) collectively for the top one, two and three black-box predictions with the LIME TREE loss (Equation (5)). We report results for four surrogates: LIME and three variants of LIME TREE (TREE, TREE and TREE). The percentage shown after the explainer name specifies the tree complexity Ω —i.e., its depth divided by its maximum possible depth determined by the number of features in the interpretable representation—at which loss is computed; TREE is equivalent to TREE@100% (and TREE@100% for deterministic IRs). We experiment with (top table) three image data sets, using various pretrained neural networks [50]: ImageNet [51] (1659 samples, 256 × 256 pixels, 1000 classes) + Inception v3 (77% accuracy); CIFAR-10 [52] (9714 samples, 32 × 32 pixels, 10 classes) + ResNet 56 (94% accuracy); and CIFAR-100 [52] (9665 samples, 32 × 32 pixels, 100 classes) + RepVGG (77% accuracy). We use all validation set images for which an interpretable representation can be built; however, for ImageNet we first preselect images that are square and at least 256 × 256, which we resize to these dimensions. These results are scaled up by 10². We also experiment with (bottom table) two tabular data sets, training the models ourselves with scikit-learn [53]: Wine [54] (36 samples, 13 features, 3 classes) + Logistic Regression (93% balanced accuracy); and Forest Covertypes [55] (2500 samples, 54 features, 7 classes) + Multi-layer Perceptron (86% balanced accuracy). For Wine, we use all the test set samples; given their small number, we repeated the study on the entire data set (178 samples) with comparable results. The Forest Covertypes test set has 116,203 samples, from which we draw a stratified subset of size 2500. These results are scaled up by 10¹. See Figure 4 for examples of the loss behavior.
|           |     | ImageNet + Inception v3 |             |             |       | CIFAR-10 + ResNet 56 |             |             |       | CIFAR-100 + RepVGG |             |             |       |
| Loss ×10² |     | LIME        | TREE@66%    | TREE@75%    | TREE  | LIME        | TREE@66%    | TREE@75%    | TREE  | LIME        | TREE@66%    | TREE@75%    | TREE  |
| n-th top  | 1st | 3.67 ± 2.18 | 0.60 ± 0.61 | 0.64 ± 0.73 | 0 ± 0 | 7.34 ± 2.96 | 2.17 ± 1.25 | 2.77 ± 1.66 | 0 ± 0 | 3.33 ± 1.80 | 0.59 ± 0.56 | 0.66 ± 0.63 | 0 ± 0 |
|           | 2nd | 1.14 ± 1.77 | 0.24 ± 0.42 | 0.25 ± 0.40 | 0 ± 0 | 3.91 ± 3.98 | 1.28 ± 1.31 | 1.69 ± 1.76 | 0 ± 0 | 0.97 ± 1.46 | 0.24 ± 0.36 | 0.26 ± 0.40 | 0 ± 0 |
|           | 3rd | 0.63 ± 1.36 | 0.13 ± 0.25 | 0.16 ± 0.33 | 0 ± 0 | 2.57 ± 3.37 | 0.89 ± 1.15 | 1.10 ± 1.44 | 0 ± 0 | 0.56 ± 1.13 | 0.14 ± 0.29 | 0.16 ± 0.32 | 0 ± 0 |
| top n     | 1   | 3.67 ± 2.18 | 0.60 ± 0.61 | 0.64 ± 0.73 | 0 ± 0 | 7.34 ± 2.96 | 2.17 ± 1.25 | 2.77 ± 1.66 | 0 ± 0 | 3.33 ± 1.80 | 0.59 ± 0.56 | 0.66 ± 0.63 | 0 ± 0 |
|           | 2   | 2.41 ± 1.40 | 0.42 ± 0.42 | 0.44 ± 0.45 | 0 ± 0 | 5.63 ± 2.69 | 1.73 ± 1.03 | 2.23 ± 1.42 | 0 ± 0 | 2.15 ± 1.15 | 0.41 ± 0.36 | 0.46 ± 0.40 | 0 ± 0 |
|           | 3   | 2.72 ± 1.58 | 0.48 ± 0.47 | 0.53 ± 0.50 | 0 ± 0 | 6.91 ± 3.26 | 2.17 ± 1.28 | 2.78 ± 1.73 | 0 ± 0 | 2.42 ± 1.29 | 0.48 ± 0.41 | 0.54 ± 0.45 | 0 ± 0 |

|           |     | Wine + Logistic Regression |             |             |             | Forest Covertypes + Multi-layer Perceptron |             |             |             |
| Loss ×10¹ |     | LIME        | TREE@66%    | TREE@100%   | TREE        | LIME        | TREE@66%    | TREE@100%   | TREE        |
| n-th top  | 1st | 0.29 ± 0.27 | 0.08 ± 0.11 | 5.54 ± 3.43 | 0.07 ± 0.11 | 0.59 ± 0.26 | 0.06 ± 0.06 | 4.56 ± 2.12 | 0.06 ± 0.06 |
|           | 2nd | 0.14 ± 0.16 | 0.03 ± 0.04 | 2.35 ± 3.26 | 0.03 ± 0.04 | 0.51 ± 0.29 | 0.05 ± 0.05 | 1.88 ± 1.21 | 0.05 ± 0.05 |
|           | 3rd | 0.20 ± 0.28 | 0.07 ± 0.12 | 3.73 ± 4.18 | 0.06 ± 0.11 | 0.13 ± 0.21 | 0.02 ± 0.04 | 0.57 ± 0.94 | 0.02 ± 0.04 |
| top n     | 1   | 0.29 ± 0.27 | 0.08 ± 0.11 | 5.54 ± 3.43 | 0.07 ± 0.11 | 0.59 ± 0.26 | 0.06 ± 0.06 | 4.56 ± 2.12 | 0.06 ± 0.06 |
|           | 2   | 0.22 ± 0.19 | 0.06 ± 0.07 | 3.94 ± 2.67 | 0.05 ± 0.07 | 0.55 ± 0.26 | 0.06 ± 0.05 | 3.22 ± 1.04 | 0.06 ± 0.05 |
|           | 3   | 0.32 ± 0.29 | 0.09 ± 0.12 | 5.80 ± 3.56 | 0.08 ± 0.12 | 0.62 ± 0.29 | 0.07 ± 0.06 | 3.51 ± 1.09 | 0.07 ± 0.06 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
