1. Introduction
The integration of artificial intelligence into robotic systems has revolutionized the way industries approach decision making, automation, and operational efficiency [1]. However, this integration has also raised significant challenges in terms of the trustworthiness and interpretability of the underlying algorithms [2,3]. In particular, autonomous robotic systems, which apply artificial intelligence (AI) and machine learning in uncertain physical environments, often operate as “black boxes” with decision-making processes and failure modes that may not be transparent or easily understood by human operators. This lack of transparency and interpretability hinders the adoption and reliability of intelligent systems in critical applications, where understanding the rationale behind decisions is crucial [4,5,6]. Therefore, providing clear explanations for such complex models is a significant aspect of increasing trust in machine learning (ML) models that predict operational success or failure [7,8]. For all these reasons, explainable AI techniques have emerged as a promising solution to address the lack of transparency and interpretability in autonomous systems [9]. These techniques aim to bridge the gap between complex algorithms and human understanding [10,11,12]. However, most existing explainability methods are post hoc, meaning that they attempt to explain the decisions of a model after the model has been trained. Hence, post hoc methods can lead to explanations that are inconsistent with the model’s actual decision-making process and may fail to capture the true behavior of the model [13,14].
Robot grasping is a fundamental task in robotics that involves complex interactions between the robot, its environment, and the object being grasped. One critical aspect of tasks such as robot grasping is fault diagnostics, which refers to the process of identifying and diagnosing problems or failures within a system [15]. ML-based approaches offer the potential to overcome traditional diagnostic limitations by using large datasets to learn the complex patterns and relationships inherent in the grasping process [16]. For instance, DexNet is an ML framework under continued development for identifying stable grasp poses from visual information using Convolutional Neural Networks applied to synthetic or experimental point clouds [17]. Another related approach [18] uses deep reinforcement learning for robotic grasping with visuomotor feedback. In our previous work on the comparative analysis of post hoc explanations [19], we explored the balance between accuracy and interpretability in predicting robot grasp failure by explaining black-box models with post hoc explanation generation methods, such as Shapley Additive Explanations (SHAP) [20] and LIME [21]. We subsequently introduced the concept of pre hoc explainability [22], which incorporates explanations during the training phase to ensure that the model’s predictions are inherently aligned with its explanations. Our pre hoc framework optimizes the predictor model during training to make predictions that are faithful to explanations provided by an interpretable white-box model.
While our previous work [23] focused on global explanations for robotic grasping prediction, which provide an overall interpretation of a model’s behavior, local explanations can offer better insights into the model’s decision-making process for individual instances. Local explainability is particularly important in domains such as robotic grasping, where the consequences of individual predictions can be significant and knowing the factors that affect a specific decision can be necessary. To address these challenges, this paper presents a novel approach based on a local pre hoc explainability framework [24] that enhances the transparency and interpretability of grasp failure prediction. The framework leverages the Jensen–Shannon divergence and neighborhood information to generate instance-specific explanations that are faithful to the model’s behavior. Unlike post hoc methods, our approach does not rely on input perturbation or on learning a secondary surrogate model, thus avoiding the potential pitfalls of surrogate modeling.
Our framework works in two phases. In Phase 1, the training phase, a white-box model (the global explainer) is trained on the entire training set to capture the relationship between the input features and the target variable. Then, a more powerful and complex black-box model is trained to optimize accuracy while being regularized to align its decision boundary with the explanations of the white-box model, as quantified by the fidelity within the neighboring training instances; hence, the black-box model is aligned with the explainer model locally. In Phase 2, the testing or inference phase, we fine-tune the global explainer model into a local explainer model using only the instances in a local neighborhood around the testing instance.
Research Questions and Contributions
To comprehensively evaluate our approach, we answer the following research questions (RQs).
RQ1: Explanatory Power: How well does the explainer model mimic the predictor model in Phase 1? This question examines the global explanatory power of our pre hoc framework by assessing how faithfully the explanations generated by the framework capture the nuances of the black-box model’s decision-making process.
RQ2: Trade-off between Accuracy and Explanation Fidelity: How does explainability regularization affect the accuracy and fidelity score in Phase 1? This question investigates the balance between prediction performance and explanation quality, analyzing how varying the explainability regularization parameter influences both aspects.
RQ3: Locality: How well do the explanations capture the local behavior of the model compared to LIME in Phase 2? This question evaluates the effectiveness of our local explainability approach compared to traditional post hoc methods in terms of point fidelity, neighborhood fidelity, and stability.
RQ4: Neighborhood Size: How does the neighborhood size affect the neighborhood fidelity, stability, and computational cost in Phase 2? This question explores the impact of different neighborhood sizes on the quality and efficiency of local explanations, providing insights for selecting appropriate local neighborhoods for robotic grasping applications.
By answering the above research questions, we aim to demonstrate that our approach not only produces more faithful explanations than post hoc methods but also maintains high prediction accuracy while offering insights at both global and local levels. We also show that our framework balances the trade-off between explainability and performance, which is particularly important in robotics applications where both accuracy and interpretability are crucial.
To summarize, this paper makes the following contributions.
We incorporate local explainability, enabling the generation of instance-specific explanations for robotic grasp failures. The neighborhood-based approach to local explainability takes advantage of Jensen–Shannon divergence to measure and optimize local fidelity between the predictor and explainer models.
We conduct a case study using a novel two-phase methodology that first trains the predictor model with local explainability constraints, then fine-tunes the white-box explainer model within local neighborhoods to generate instance-specific explanations.
We demonstrate through comprehensive experiments that our local pre hoc explainability framework outperforms traditional post hoc methods like LIME [21] in terms of point fidelity, neighborhood fidelity, stability, and computational efficiency, and analyze the impact of neighborhood size on explanation quality and computational cost, providing insights for selecting appropriate local neighborhoods for robotic grasping applications.
The remainder of this paper is organized as follows. Section 2 provides a comprehensive background on explainable AI, comparing global vs. local explainability approaches and post hoc vs. in-training methods, and detailing factorization machines and our pre hoc framework. Section 3 formalizes the problem of local explainability in robotic grasp failure prediction and presents our two-phase methodology. Section 4 describes our experimental setup, including dataset details, evaluation metrics, and comparison baselines. Section 5 presents comprehensive results that address our four research questions, with detailed analysis of explanatory power, accuracy–fidelity trade-offs, locality performance, and neighborhood size effects. Finally, Section 6 concludes with a summary of contributions and future research directions.
2. Background
2.1. Application Scenarios for Robotic Grasp Failure Prediction
Robotic grasp failure prediction finds critical applications across diverse domains where reliable manipulation is essential [15]. In industrial automation and manufacturing, assembly lines require robots to handle components ranging from delicate electronics to heavy automotive parts, where grasp failures can halt production and damage expensive equipment [16]. Service and assistive robotics applications, including personal care robots and healthcare assistants, must interact safely with humans and everyday objects, where failures pose safety risks and reduce user confidence [25]. Warehouse and logistics automation systems rely on robotic pick-up and packing for e-commerce fulfillment, where grasp failures directly impact operational efficiency. Medical and surgical robotics demand extremely precise manipulation capabilities, where failures can have life-threatening consequences and require explainable predictions to build surgeon confidence and enable system validation [26].
The Critical Role of Explainability: Across all these application domains, explainability serves multiple crucial functions: (1) Trust and Adoption: Human operators need to understand and trust robotic systems before fully integrating them into critical workflows [27]. (2) Debugging and Improvement: Engineers require insights into failure modes to iteratively improve system performance [28,29]. (3) Safety and Risk Assessment: Understanding the conditions that lead to failures enables better risk management and safety protocols [30]. (4) Regulatory Compliance: Many industries require transparent and interpretable automated systems for regulatory approval [31]. These diverse application scenarios demonstrate the urgent need for explainable grasp failure prediction systems that can provide instance-specific insights directly relevant to operational contexts.
2.2. Explainability in Machine Learning
Explainability in machine learning refers to the ability to understand and interpret the decisions made by machine learning models. Explainable AI techniques aim to enhance the transparency and interpretability of these models, thereby increasing trust and enabling effective human–AI interaction [3]. As ML systems become increasingly integrated into critical domains such as healthcare, finance, and robotics, the demand for transparency in automated decision-making has grown significantly [7].
2.2.1. Global vs. Local Explainability
Global explainability provides an overall interpretation of a model’s behavior across the entire dataset, offering insights into the general patterns and relationships learned by the model. It answers questions about which features are most important for the model overall and how these features influence predictions in general terms. In contrast, local explainability focuses on explaining individual predictions, providing instance-specific explanations that highlight the factors that influence a particular decision [32].
While global explanations are useful for understanding the model’s behavior in general, they may not capture the nuances of specific predictions [21]. The relationship between a feature and the prediction might be non-linear or context-dependent, varying significantly across different regions of the feature space. Local explanations address this limitation by providing insights tailored to specific instances [21], which is particularly important in contexts where understanding the factors affecting a particular decision is crucial, such as healthcare, finance, and robotics applications [7].
2.2.2. Post Hoc Explainability
Post hoc explainability methods generate explanations after a model has been trained. These methods are widely used to explain the decisions of complex black-box models, such as deep neural networks and ensemble methods. Popular post hoc explanation techniques include LIME [21] and SHAP [20].
LIME (Local Interpretable Model-agnostic Explanations) approximates a black-box model locally around a specific prediction using a simpler, interpretable model. It perturbs the input and observes the changes in the model’s output to learn how the black-box model behaves in the vicinity of the instance being explained [21]. This approach generates a local surrogate model that is inherently interpretable (typically a linear model), which approximates the behavior of the complex model in the neighborhood of the instance being explained.
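To make this baseline concrete, the following is a minimal usage sketch of LIME on tabular grasp features; the training array `X_train`, test instance `x_test`, and probability function `predict_proba` are assumed placeholders rather than artifacts of this study.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# X_train: (n_samples, n_features) array of grasp features (assumed to be loaded elsewhere)
# predict_proba: the black-box model's probability function, e.g., model.predict_proba
feature_names = [f"H1F{j}J{k}_{m}" for j in (1, 2, 3)
                 for k in (1, 2, 3) for m in ("pos", "vel", "eff")]

explainer = LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=["unstable", "stable"],
    mode="classification",
)

# Explain one test instance: LIME perturbs x_test, queries predict_proba,
# and fits a local linear surrogate whose weights serve as the explanation.
exp = explainer.explain_instance(x_test, predict_proba, num_features=10)
print(exp.as_list())  # [(feature condition, weight), ...]
```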
SHAP (SHapley Additive exPlanations) attributes the prediction of a specific instance to each of its features based on game theory principles. It calculates the contribution of each feature to the prediction relative to the average prediction for the dataset [20]. SHAP values provide a unified measure of feature importance that integrates several existing explanation methods and possesses desirable properties such as local accuracy, missingness, and consistency. While SHAP can provide local explanations for individual instances, it computes these explanations by considering all possible feature coalitions, which can be computationally expensive for high-dimensional data. The local SHAP explanations are generated post hoc by analyzing the marginal contributions of features around the specific instance, but this process does not influence the original model’s training or decision boundaries.
Other notable local explanation methods include Integrated Gradients [33], which provides local feature attributions by integrating gradients along a path from a baseline to the input, and Anchors [34], which generates rule-based local explanations that describe sufficient conditions for predictions. DeepLIFT [35] and Layer-wise Relevance Propagation (LRP) [36] focus specifically on deep neural networks, propagating relevance scores backward through the network layers to generate local explanations.
Post hoc explanation methods have several fundamental limitations that our approach addresses. They operate entirely outside the scope of the model and its training, potentially leading to explanations that are inconsistent with the model’s actual decision-making process [13]. Recent research has shown that post hoc explanations can even be manipulated to misrepresent the behavior of the model, raising concerns about their reliability for critical applications [14]. Additionally, these methods often involve generating surrogate models or perturbing inputs, which can be computationally expensive and may not scale well to large datasets or complex models.
Differences between post hoc method and our approach: Our pre hoc local explainability framework fundamentally differs from current post hoc methods in three key aspects: (1) Training Integration: Explanations are incorporated during the model training phase rather than generated afterward, ensuring inherent alignment between predictions and explanations. (2) Neighborhood Optimization: We explicitly optimize for local fidelity within neighborhoods during training, rather than post hoc approximation of local behavior. (3) Computational Efficiency: Our two-phase approach avoids the need for repeated input perturbations, resulting in more efficient explanation generation while maintaining higher fidelity to the model’s actual decision process.
2.3. Notation and Definitions
Before describing the details of our approach, we present the main definitions and mathematical notation used throughout the paper in Table 1.
2.4. In-Training Explainability
In contrast to post hoc methods, in-training explainability techniques incorporate explanations during the model training phase. Hence, they ensure that the model’s predictions are inherently aligned with its explanations, leading to more faithful and consistent explanations [37]. In-training explainability methods can be categorized into three main approaches:
Self-Explaining Models: These are inherently interpretable models that provide explanations as part of their architecture and training process. Examples include attention mechanisms in neural networks and prototype-based models [38,39,40].
Joint Training: This approach involves training the predictive model and the explanation model simultaneously, often through a multi-task learning framework that optimizes both prediction accuracy and explanation quality [41].
Regularization-Based Approaches: These methods incorporate explainability constraints into the training objective, guiding the model to learn representations that are both predictive and interpretable [42,43,44].
Our previous work introduced the concept of pre hoc explainability, a type of in-training explainability in which an interpretable white-box model guides the learning of a black-box model through regularization [22,23]. This approach ensures that the black-box model’s predictions are faithful to the explanations provided by the white-box model, without significantly compromising predictive performance. While our previous work focused on global explainability, this paper extends the pre hoc explainability framework to incorporate local explainability, enabling the generation of instance-specific explanations that capture the model’s behavior in the surroundings of a particular instance.
2.5. Factorization Machines
Factorization machines (FMs) [45] are supervised learning models that can be applied to a wide range of prediction tasks while reliably estimating model parameters in the challenging case of large quantities of sparse data, hence enabling the model to be trained with very few data points. FMs were initially developed to address the challenges of recommendation systems with sparse interaction data, but their flexibility has led to applications in various domains, including robotics.
The key innovation of FMs is their ability to model feature interactions through factorized parameters, which allows them to learn complex patterns even when specific feature combinations are rarely observed in the training data. The model equation for a factorization machine of degree $d = 2$ is defined as:
$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j, \quad (1)$$
where the model parameters that have to be estimated are $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^{n}$, and $\mathbf{V} \in \mathbb{R}^{n \times k}$, and $\langle \cdot, \cdot \rangle$ is the dot product of two vectors of size $k$:
$$\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}. \quad (2)$$
A row $\mathbf{v}_i$ within $\mathbf{V}$ represents the $i$-th variable with $k$ factors, where $k$ is a hyperparameter that defines the dimensionality of the factorization. The first term in Equation (1) represents the global bias, the second term captures the linear effects of individual features, and the third term models the pairwise feature interactions through factorized parameters. This factorization approach allows FMs to implicitly learn higher-order feature interactions with reduced complexity compared to explicit polynomial models.
In the context of robotic grasp failure prediction, FMs offer several advantages. They can effectively model complex interactions between different sensor measurements (e.g., joint positions, velocities, and efforts) while maintaining computational efficiency. Additionally, the factorized nature of the model provides a basis for generating explanations that capture both direct feature effects and feature interactions, which are crucial for understanding the complex dynamics of robotic grasping.
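To illustrate Equations (1) and (2), below is a minimal NumPy sketch of the degree-2 FM prediction using the standard O(kn) reformulation of the pairwise interaction term; the parameter values and the feature count (27 joint features) are illustrative assumptions, not the trained model from this study.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine score for a single feature vector x.

    w0: global bias (scalar), w: linear weights (n,), V: factor matrix (n, k).
    Uses the identity:
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
    """
    linear = w0 + w @ x
    xv = V.T @ x                   # shape (k,): sum_i v_if * x_i for each factor f
    x2v2 = (V ** 2).T @ (x ** 2)   # shape (k,): sum_i v_if^2 * x_i^2
    interactions = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + interactions

# Example with random (illustrative) parameters: 27 joint features, 8 factors
rng = np.random.default_rng(0)
n, k = 27, 8
score = fm_predict(rng.normal(size=n), 0.0, rng.normal(size=n), rng.normal(size=(n, k)))
prob_stable = 1.0 / (1.0 + np.exp(-score))  # sigmoid for binary grasp stability
```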
2.6. Pre Hoc Explainability Framework
The pre hoc explainability framework, introduced in our previous work [22] and illustrated in Figure 1, consists of training an interpretable white-box model, which serves as an explainer, and then using this explainer to guide the training of a more complex black-box model through regularization. This approach ensures that the black-box model’s predictions are faithful to the explanations provided by the white-box model, without significantly compromising predictive performance.
The framework follows a multistep process, described below. First, a white-box model (the explainer) is trained to capture the relationship between the input features and the target variable. This model is intentionally constrained to be interpretable, such as a linear model or a shallow decision tree, to ensure that humans can understand its decision-making process. Simultaneously, a black-box model (the predictor) is trained to optimize for prediction accuracy. This model is typically more complex and has a higher capacity to capture intricate patterns in the data, but lacks inherent interpretability. The training of the black-box model is regularized by a fidelity term that measures the alignment between its predictions and those of the white-box model, encouraging the black-box model to learn a decision boundary that is consistent with the interpretable explanation.
The pre hoc explainability framework utilizes the Jensen–Shannon (JS) divergence to measure the alignment between the predictions of the black-box model, $f_{\theta}(\mathbf{x})$, and those of the white-box model, $g(\mathbf{x})$. The JS divergence is a symmetric measure of the similarity between two probability distributions, and it is used in the pre hoc framework to ensure that the black-box model’s predictions are consistent with the white-box model’s explanations.
The objective function used in the pre hoc explainability framework is defined as
$$\min_{\theta} \; \mathcal{L}_{pred}\big(f_{\theta}(\mathbf{x}), y\big) + \lambda \, D\big(f_{\theta}(\mathbf{x}), g(\mathbf{x})\big),$$
where function $D$ is the Jensen–Shannon divergence, which is low when the explainer fidelity is high.
The loss function for training the black-box model in the pre hoc framework is a combination of the prediction loss and the JS divergence:
$$\mathcal{L}(\theta) = \mathcal{L}_{BCE}\big(f_{\theta}(\mathbf{x}), y\big) + \lambda \, D_{JS}\big(f_{\theta}(\mathbf{x}) \,\|\, g(\mathbf{x})\big) + \gamma \, \|\theta\|_2^2,$$
where $\mathcal{L}_{BCE}$ is the binary cross-entropy loss, $\lambda$ is an explainability regularization coefficient that controls the trade-off between explainability and accuracy, and $\gamma$ is a coefficient used for $L_2$ regularization of the model parameters $\theta$ to avoid overfitting.
While the pre hoc explainability framework provides global explanations that capture the overall behavior of the model, it does not address the need for local, instance-specific explanations. In the next section, we extend this framework to incorporate local explainability.
2.7. Jensen–Shannon Divergence
The Jensen–Shannon (JS) divergence serves as the core fidelity measure in our framework. Unlike simple distance metrics, it quantifies how differently two probability distributions assign confidence across outcomes. In our context, it measures whether the black-box predictor and white-box explainer make similar confidence assessments within local neighborhoods. Consider two models predicting grasp success. If both models are highly confident (e.g., 0.9 probability) that a grasp will succeed, their JS divergence is low, indicating good agreement. If one model predicts 0.9 success probability while the other predicts 0.3, the JS divergence is high, suggesting the explainer does not faithfully represent the predictor’s reasoning in that region.
The mathematical formulation builds this intuition formally:
$$D_{JS}(P \,\|\, Q) = \tfrac{1}{2} D_{KL}(P \,\|\, M) + \tfrac{1}{2} D_{KL}(Q \,\|\, M),$$
where $M = \tfrac{1}{2}(P + Q)$ is the average distribution, and $D_{KL}$ represents the Kullback–Leibler divergence. This symmetric formulation ensures that the fidelity measure treats both models equally, avoiding bias toward either the predictor or the explainer.
Local Application: We extend this concept to neighborhoods by computing the JS divergence between the predictor’s and explainer’s probability distributions across all instances within a local neighborhood $N_k(\mathbf{x})$. This captures not only whether the models agree on the central instance but also whether they exhibit consistent reasoning patterns across similar grasping scenarios.
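As a concrete reference for the formulas above, the following sketch computes the JS divergence between the binary (stable/unstable) predictive distributions of two models and averages it over a neighborhood; the function names are ours, and the inputs are assumed to be positive-class probabilities.

```python
import numpy as np

def js_divergence_binary(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two Bernoulli distributions,
    each given as the probability of the positive class (stable grasp)."""
    p = np.stack([1.0 - p, p], axis=-1) + eps  # full distributions [P(0), P(1)]
    q = np.stack([1.0 - q, q], axis=-1) + eps
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=-1)
    kl_qm = np.sum(q * np.log(q / m), axis=-1)
    return 0.5 * kl_pm + 0.5 * kl_qm

def neighborhood_js(pred_probs, expl_probs):
    """Average JS divergence between predictor and explainer outputs
    over all instances in a local neighborhood."""
    return float(np.mean(js_divergence_binary(np.asarray(pred_probs),
                                              np.asarray(expl_probs))))

# Agreeing models -> low divergence; disagreeing models -> higher divergence
print(neighborhood_js([0.9, 0.85], [0.88, 0.90]))  # small
print(neighborhood_js([0.9, 0.85], [0.30, 0.40]))  # larger
```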
3. Problem Formulation
We focus on a robot’s hand with three fingers, including information about the joints’ position, velocity, and effort (torque) for each finger, and the stability of the grasp for an object. Our aim is to predict grasp failure from the position, velocity, and effort measurements of each of the three joints in each of the three fingers. These measurements are collected into features that are named after the combination of hand (only Hand 1 is used), finger, joint, and either position, velocity, or effort, as summarized in the following nomenclature.
H1: Hand 1, indicating the only hand used in the simulation.
F1, F2, F3: Fingers on the hand, where each finger has three joints.
J1, J2, J3: Joints in each finger, with each joint having measurements for position (pos), velocity (vel), and effort (eff).
Hence, H1FjJk indicates Joint $k$ of Finger $j$ of Hand 1. The dataset for a single experiment $e$ can be represented as a matrix:
$$\mathbf{X}_e = \big[\, \text{H1F}j\text{J}k^{pos}, \; \text{H1F}j\text{J}k^{vel}, \; \text{H1F}j\text{J}k^{eff} \,\big]_{j,k \in \{1,2,3\}},$$
where $\text{H1F}j\text{J}k^{pos}$, $\text{H1F}j\text{J}k^{vel}$, and $\text{H1F}j\text{J}k^{eff}$ represent the position, velocity, and effort measurements of Joint $k$ in Finger $j$ of Hand 1, respectively.
The grasp robustness $R$ for each experiment is computed based on the variation of the distance between the palm and the ball during the shake, as shown in Figure 2, denoted as
$$R_e = f(\Delta d_e),$$
where $f$ is a function that computes the robustness based on the distance variation $\Delta d_e$ during the experiment $e$. Having collected all these features into a dataset $\mathcal{D}$, let $(\mathbf{x}, y)$ be a sample from a distribution $P$ in a domain $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ is the instance space and $\mathcal{Y}$ is the label space. We learn a differentiable predictive function $f_{\theta}$ together with a transparent explainer function $g$ defined over a functional class $\mathcal{G}_T$. We refer to the functions $f_{\theta}$, $g$, and $g_{loc}$ as the predictor, global explainer, and local explainer, respectively. $\mathcal{G}_T$ is strictly constrained to be an inherently explainable functional set, such as a set of linear functions.
Our goal is to optimize the predictor $f_{\theta}$ to provide global explanations that are faithful to the model’s behavior and consistent with the explanations generated by the global explainer $g$. Then, we fine-tune the global explainer within the local neighborhood, $N_k(\mathbf{x})$, of a new testing instance $\mathbf{x}$ to obtain the local explainer $g_{loc}$. We aim to achieve this by minimizing the divergence between the outputs of the predictor and the global explainer in the local neighborhood, while maintaining the predictor’s accuracy and generating local explanations from the local explainer model $g_{loc}$.
3.1. Local Explainability Using Neighborhoods
Local explainability refers to understanding and interpreting a model’s predictions at an individual instance level, whereas global explanations provide an overall understanding of the model’s behavior. Thus, local explanations can offer insights into the factors influencing a specific prediction, which is particularly valuable in domains where decisions have significant consequences, such as robotics.
To achieve local explainability, we leverage the concept of neighborhoods. For each instance in the dataset, we identify a set of neighboring instances, typically using a distance metric such as k-nearest neighbors (k-NN). The intuition behind considering local neighborhoods is that similar inputs are expected to have similar outputs, and by focusing on the local neighborhood of an instance, we can capture the model’s behavior near that instance.
This neighborhood-based approach to local explainability has several advantages:
Context Awareness: By considering instances that are similar to the target instance, the explanation reflects the model’s behavior in the specific region of the feature space, accounting for local patterns and interactions that may differ from the global behavior.
Stability: Explanations based on neighborhoods tend to be more stable than those based on individual instances, as they aggregate information from multiple similar data points, reducing the impact of noise or outliers.
Generalizability: The neighborhood-based approach can be applied to various types of models and data, making it a versatile framework for local explainability.
We use the Jensen–Shannon divergence to measure the local fidelity between the predictor model and the explainer model. Specifically, we compute the JS divergence between the predictions of the predictor model and the explainer model within the local neighborhood of each instance. This allows us to quantify how well the explainer captures the predictor’s behavior in the vicinity of specific instances.
The concept of local neighborhoods also aligns well with the nature of robotic grasping, where the system operates in a high-dimensional state space with complex dynamics. By focusing on local explanations, we can provide insights that are directly relevant to specific grasping scenarios, helping robotics engineers understand the factors influencing success or failure in particular contexts.
Next, we will formalize the problem of local explainability in robot grasp failure prediction and detail our proposed approach for extending the pre hoc explainability framework to incorporate local explanations.
We implement a locally explainable machine learning framework, the pre hoc local explainability framework shown in Figure 3. The framework consists of two phases that integrate local neighborhoods to achieve local explainability. Phase 1 (training for fidelity) is the training phase, which optimizes the agreement between the white-box explainer and the black-box model, quantified by the fidelity with respect to the neighboring training instances. Phase 2 (generating local explanations) fine-tunes the white-box explainer model on the neighboring training instances closest to a new test instance.
We use a nearest-neighbor algorithm, such as k-nearest neighbors (k-NN) with Euclidean distance, to identify a set of neighboring instances for each instance in the dataset. The intuition behind considering local neighborhoods is that similar inputs are expected to have similar outputs, so focusing on the local neighborhood of an instance captures the model’s behavior near that instance.
We compute the predicted probability distributions for each instance and its corresponding neighbors using the predictor model $f_{\theta}$ and the explainer model $g$. This step results in multiple probability distributions for each training instance, representing the predictions of the black-box and white-box explainer models within the local neighborhood. Then, we compute the Jensen–Shannon divergence between the predictions of the black-box predictor model and the explainer model for the local in-training neighborhoods surrounding an instance. This divergence measure quantitatively assesses the consistency between the predictor and explainer models at the local level.
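For illustration, a minimal sketch of Phase 2 (generating local explanations) might look as follows, assuming scikit-learn estimators: retrieve the k nearest training instances to a test point, label them with the black-box predictor, and fit an interpretable local explainer on that neighborhood. In the actual framework, the global white-box explainer is fine-tuned (warm-started) on this neighborhood rather than refit from scratch, so this is a simplified approximation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def explain_locally(x_test, X_train, predictor_proba, k=30):
    """Phase 2 (sketch): build the k-NN neighborhood of a test instance and fit
    an interpretable local explainer on it, labeling the neighbors with the
    black-box predictor so the explainer approximates the predictor locally."""
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(X_train)
    _, idx = nn.kneighbors(x_test.reshape(1, -1))
    X_nbr = X_train[idx[0]]                              # local neighborhood N_k(x_test)

    # predictor_proba is assumed to return positive-class probabilities, shape (k,);
    # this sketch also assumes both classes appear within the neighborhood.
    y_nbr = (predictor_proba(X_nbr) >= 0.5).astype(int)

    local = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    local.fit(X_nbr, y_nbr)                              # sparse local surrogate
    return dict(zip(range(X_train.shape[1]), local.coef_[0]))  # feature index -> weight
```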
3.2. Neighborhood Fidelity Objective Function
Algorithm 1 presents the method for integrating the Jensen–Shannon divergence for local explainability within neighborhoods during the training of the black-box model. The algorithm takes as input the black-box model $f_{\theta}$, the white-box model $g_{\phi}$, the training instances $\mathbf{X}$ with their true labels $y$, and the hyperparameter $\lambda$. It also requires the nearest neighborhood function $N_k(\cdot)$ and the number of neighborhood instances $k$.
Algorithm 1 Training phase for pre hoc: integrating local explainability with neighbors in training.
Require: input training instances $\mathbf{X}$, true labels $y$, nearest neighborhood function $N_k(\cdot)$, number of neighborhood instances $k$, and parameter $\lambda$, the coefficient for the explainability regularization term.
for each $\mathbf{x}_i$ in $\mathbf{X}$ do
    $N_k(\mathbf{x}_i) \leftarrow \text{kNN}(\mathbf{x}_i, \mathbf{X}, k)$    ▹ Get k-NN to training instance $\mathbf{x}_i$ from training set
end for
Initialize $\theta$ and $\phi$
for each $\mathbf{x}_i$ in $\mathbf{X}$ do
    Compute $f_{\theta}(\mathbf{x}_i)$    ▹ Predictions from predictor model
    Compute $g_{\phi}(\mathbf{x}_i)$    ▹ Predictions from explainer model
    Compute $\mathcal{L}_{JS} \leftarrow \text{AvgJS}(N_k(\mathbf{x}_i), f_{\theta}, g_{\phi})$
    Compute $\mathcal{L}_{WB} \leftarrow \mathcal{L}_{BCE}(g_{\phi}(\mathbf{x}_i), y_i)$    ▹ White-box model loss
    $\mathcal{L}_{BB} \leftarrow \mathcal{L}_{BCE}(f_{\theta}(\mathbf{x}_i), y_i) + \lambda \, \mathcal{L}_{JS}$    ▹ Black-box model loss
    Update $\phi$ using gradient descent on $\mathcal{L}_{WB}$
    Update $\theta$ using gradient descent on $\mathcal{L}_{BB}$
end for
Return $f_{\theta}$ and $g_{\phi}$
procedure AvgJS($N_k(\mathbf{x}_i)$, $f_{\theta}$, $g_{\phi}$)    ▹ Compute average JS divergence for input subset
    for all $\mathbf{x}_j \in N_k(\mathbf{x}_i)$ do
        $s \leftarrow s + D_{JS}\big(f_{\theta}(\mathbf{x}_j) \,\|\, g_{\phi}(\mathbf{x}_j)\big)$
    end for
    return $s / |N_k(\mathbf{x}_i)|$
end procedure
Given a global explainer model $g_{\phi}$ with parameters $\phi$, let its predictions result in a probability distribution $g_{\phi}(\mathbf{x})$. Given the predictor $f_{\theta}$ with parameters $\theta$, let its predictions result in a probability distribution $f_{\theta}(\mathbf{x})$ over the binary classes $\{0, 1\}$, where 1 represents a stable grasp and 0 represents an unstable grasp. We propose a neighborhood fidelity objective function, which measures the probability distances over the local in-training neighborhood $N_k(\mathbf{x}_i)$ of instance $i$, between $f_{\theta}(\mathbf{x}_j)$ and $g_{\phi}(\mathbf{x}_j)$, which are respectively the outputs of $f_{\theta}$ and $g_{\phi}$ for all given input training data $\mathbf{x}_j \in N_k(\mathbf{x}_i)$. The optimization problem is formulated as follows:
$$\min_{\theta} \; \mathcal{L}_{pred}\big(f_{\theta}(\mathbf{x}), y\big) + \lambda \, \frac{1}{|N_k(\mathbf{x}_i)|} \sum_{\mathbf{x}_j \in N_k(\mathbf{x}_i)} D\big(f_{\theta}(\mathbf{x}_j), g_{\phi}(\mathbf{x}_j)\big),$$
where the function $D$ is a divergence distance measurement, specifically the Jensen–Shannon divergence, used to measure the within-neighborhood deviation between the predictive distributions of $f_{\theta}$ and $g_{\phi}$.
The Jensen–Shannon divergence between the predictions of the predictor model and the explainer model within a local neighborhood is given by
$$D_{JS}\big(f_{\theta}(\mathbf{x}_j) \,\|\, g_{\phi}(\mathbf{x}_j)\big) = \tfrac{1}{2} D_{KL}\big(f_{\theta}(\mathbf{x}_j) \,\|\, M_j\big) + \tfrac{1}{2} D_{KL}\big(g_{\phi}(\mathbf{x}_j) \,\|\, M_j\big), \quad M_j = \tfrac{1}{2}\big(f_{\theta}(\mathbf{x}_j) + g_{\phi}(\mathbf{x}_j)\big).$$
We thus define our neighborhood fidelity objective function, $\mathcal{L}_{JS}$, calculated using the Jensen–Shannon divergence (JS) as follows:
$$\mathcal{L}_{JS}\big(N_k(\mathbf{x}_i)\big) = \frac{1}{|N_k(\mathbf{x}_i)|} \sum_{\mathbf{x}_j \in N_k(\mathbf{x}_i)} D_{JS}\big(f_{\theta}(\mathbf{x}_j) \,\|\, g_{\phi}(\mathbf{x}_j)\big).$$
Here, $N_k(\mathbf{x}_i)$ represents an instance and its neighbors, $f_{\theta}$ denotes the predictor’s output, and $g_{\phi}$ denotes the global explainer’s prediction. The divergence between these two outputs measures the variability in the predictions within a local neighborhood.
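To make the objective concrete, below is a minimal PyTorch sketch of one Phase-1 update for the black-box predictor that combines the binary cross-entropy loss with the neighborhood fidelity term and an L2 penalty; the model interfaces, tensor shapes, and hyperparameter values are illustrative assumptions, and the white-box explainer is treated as fixed here even though Algorithm 1 also updates it with its own loss.

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """JS divergence between Bernoulli outputs given positive-class probabilities."""
    p = torch.stack([1 - p, p], dim=-1).clamp_min(eps)
    q = torch.stack([1 - q, q], dim=-1).clamp_min(eps)
    m = 0.5 * (p + q)
    return 0.5 * (p * (p / m).log()).sum(-1) + 0.5 * (q * (q / m).log()).sum(-1)

def prehoc_step(predictor, explainer, opt, x, y, x_neighbors, lam=0.1, gamma=1e-4):
    """One training step for the black-box predictor, regularized toward the
    white-box explainer over each instance's local neighborhood.
    x: (B, n) batch, y: (B,) labels, x_neighbors: (B, k, n) k-NN per instance.
    Both models are assumed to output positive-class probabilities."""
    opt.zero_grad()
    p_x = predictor(x).squeeze(-1)
    bce = F.binary_cross_entropy(p_x, y.float())        # prediction loss

    B, k, n = x_neighbors.shape
    flat = x_neighbors.reshape(B * k, n)
    p_nbr = predictor(flat).squeeze(-1)
    with torch.no_grad():
        g_nbr = explainer(flat).squeeze(-1)             # explainer probs (no gradient)
    l_js = js_divergence(p_nbr, g_nbr).mean()           # neighborhood fidelity term

    l2 = sum((w ** 2).sum() for w in predictor.parameters())
    loss = bce + lam * l_js + gamma * l2
    loss.backward()
    opt.step()
    return loss.item()
```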
4. Experiments
4.1. Dataset and Preprocessing
We evaluate the performance of our local pre hoc explainability framework on the grasping dataset obtained from Shadow’s Smart Grasping System [46] simulation with ROS [47] and Gazebo [48] environments using the Smart Grasping Sandbox. The dataset contains data obtained from the three 3-DOF fingers, with information about the joints’ position, velocity, and effort (torque) for each finger, amounting to about 54,000 unique data points and 29 measurements for each experiment.
The classification target is the predicted grasp robustness, which is discretized into a binary label: 1 for a stable grasp and 0 for an unstable grasp. A grasp is considered stable if the robustness value exceeds 100. The dataset was normalized using standard scaling to ensure that each feature contributes equally to the prediction.
4.2. Experimental Protocol
The dataset was randomly divided into training, validation, and test sets with an 80:10:10 ratio. All experiments were repeated five times, and the results were averaged across the five runs and reported along with the standard deviation. All models were trained with regularization until the validation accuracy stabilized for at least ten epochs. For local explainability neighborhood generation, we experimented with different neighborhood sizes: we fixed the number of neighbors k during the training phase (Phase 1: training for fidelity) and explored different values of k during the testing phase (Phase 2: generating local explanations) to analyze the impact of neighborhood size on explanation quality and computational cost.
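A minimal preprocessing sketch matching this description (binarizing robustness at 100, standard scaling, and an 80:10:10 split) is given below; the variable names and the convention of fitting the scaler on the training set only are our assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X: (n_samples, 29) sensor measurements, robustness: (n_samples,) raw scores.
# (Loading these arrays from the Smart Grasping Sandbox logs is assumed to happen elsewhere.)
y = (robustness > 100).astype(int)          # 1 = stable grasp, 0 = unstable grasp

# 80:10:10 train/validation/test split
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Standard scaling fitted on the training set only
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(s) for s in (X_train, X_val, X_test))
```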
The models were implemented using the PyTorch v2.6 framework [49] and executed on an NVIDIA Tesla V100 GPU with 16 GB RAM and an Intel(R) Xeon(R) CPU with 2.20 GHz and 13 GB RAM. We also conducted experiments that compared CPU and GPU performance for both training and explanation generation.
4.3. Models and Baselines
We compared our local pre hoc explainability framework with the following baselines.
Black-Box (BB) Predictor: A non-regularized factorization machine model that serves as the baseline for prediction accuracy.
White-Box (WB) Explainer: A sparse logistic regression model that serves as the explainer.
LIME [21]: A popular post hoc explanation method that generates local explanations by training interpretable models on perturbed samples.
For our pre hoc framework, we experimented with different values of the explainability regularization parameter λ to analyze the trade-off between prediction accuracy and explanation fidelity.
4.4. Evaluation Metrics
We evaluate our approach using the following metrics for both prediction accuracy and explanation quality.
4.4.1. Prediction Accuracy
To assess the classification accuracy, we use the Area under the ROC Curve (AUC). This metric measures the model’s ability to correctly classify grasp stability across different threshold values.
4.4.2. Global Explanation Fidelity
To evaluate how well the explainer model captures the global behavior of the predictor model, we use the fidelity metric, which measures the agreement between the predictor model’s predictions and the explainer model’s predictions across the entire test set.
4.4.3. Local Explanation Metrics
To evaluate the quality of local explanations, we use the following key metrics.
Definition 1 (Point Fidelity (PF)).
Measures the agreement between the explanations and predictions for an individual instance. A higher point fidelity is better.
Definition 2 (Neighborhood Fidelity (NF)).
Extends the concept of point fidelity to consider agreement within the local neighborhood around each instance. A higher neighborhood fidelity is better.
Definition 3 (Stability).
Measures the consistency of explanations across different runs or small perturbations of the input, computed with a distance function d (Euclidean) between the explanation of an instance and the explanations generated from slightly perturbed versions of the same instance. A lower stability value is better.
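As one possible concrete reading of the stability definition, the sketch below averages the Euclidean distance between the feature-importance vector of an instance and those of slightly perturbed copies of it; the perturbation scale and the `explain_fn` interface are assumptions for illustration.

```python
import numpy as np

def stability(explain_fn, x, n_perturbations=10, sigma=0.01, seed=0):
    """Stability (sketch): average Euclidean distance between the explanation of x
    and explanations of small Gaussian perturbations of x. Lower is better.
    `explain_fn` maps an instance to a vector of feature-importance weights."""
    rng = np.random.default_rng(seed)
    base = np.asarray(explain_fn(x))
    dists = []
    for _ in range(n_perturbations):
        x_pert = x + rng.normal(scale=sigma, size=x.shape)
        dists.append(np.linalg.norm(np.asarray(explain_fn(x_pert)) - base))
    return float(np.mean(dists))
```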
5. Results and Discussion
In this section, we present and analyze the results of our experiments on the grasping dataset to evaluate the effectiveness of our local pre hoc explainability framework. We organize our findings according to the research questions outlined in the Research Questions and Contributions section.
5.1. RQ 1—Explanatory Power: How Well Does the Explainer Model Mimic the Predictor Model in Phase 1?
To assess the global explanatory power of the pre hoc framework, we examined how well the explanations generated by the framework capture the nuances of the black-box model’s decision-making process by analyzing the fidelity AUC scores for different values of the explainability regularization parameter λ.
Figure 4b presents the fidelity AUC scores for the pre hoc framework on the grasping dataset for different values of λ. As λ increases, the fidelity AUC shows a notable improvement, increasing from around 0.80 at the smallest value of λ to approximately 0.95 at the largest value. This indicates that higher values of λ strengthen the regularization effect, encouraging the Pre Hoc Predictor to align more closely with the WB Explainer. Since the WB Explainer is used to guide the learning of the Pre Hoc Predictor, the high fidelity scores suggest that the explanations generated by the pre hoc framework effectively capture the nuances of the black-box model’s decision-making process.
The high fidelity scores achieved by the proposed framework demonstrate the effectiveness of using the WB Explainer to guide the learning of the black-box predictor and generate explanations that align closely with the model’s decisions. The results indicate that our pre hoc framework can produce explanations that faithfully represent the model’s decision-making process, which is essential for building trust in the predictions of grasp failure in robotic systems.
5.2. RQ 2—Trade-Off Between Accuracy and Explanation Fidelity: How Does Explainability Regularization Affect the Accuracy and Fidelity Score in Phase 1?
To analyze the effect of the explainability regularization parameter λ on the accuracy and fidelity scores, we examined how varying λ affects both prediction performance and explanation quality.
Figure 4a displays the accuracy based on AUC scores for the Pre Hoc Predictor, BB Predictor, and WB Explainer models, evaluated on the grasping test set for different values of λ. As λ increases from 0.01 to 1, the AUC of the Pre Hoc Predictor remains relatively stable, with a slight decrease from around 0.83 to 0.81. The AUC of the Pre Hoc Predictor is consistently higher than that of the WB Explainer, demonstrating that the pre hoc framework maintains a good balance between accuracy and explainability. The BB Predictor, which is the unregularized black-box model, has a slightly higher accuracy than the Pre Hoc Predictor, but the difference is minimal.
These results demonstrate the effectiveness of the explainability regularization parameter λ in controlling the trade-off between accuracy and explainability in the pre hoc framework. Lower values of λ prioritize prediction accuracy, whereas higher values emphasize explainability. The choice of λ depends on the specific requirements of the application and the desired balance between accuracy and interpretability. Even with high values of λ, the accuracy of the Pre Hoc Predictor remains competitive with that of the BB Predictor on the grasping dataset, indicating that the pre hoc framework effectively incorporates explainability without significantly compromising predictive performance.
Table 2 provides a comparison of the different models in terms of prediction accuracy (AUC) and explanation fidelity. The results show that as λ increases from 0.01 to 0.1, the fidelity improves significantly from 0.786 to 0.956, at the cost of a slight decrease in accuracy from 0.833 to 0.812. This represents a trade-off between prediction performance and explanation quality, with a modest 2.5% decrease in accuracy leading to a substantial 21.6% increase in fidelity. The pre hoc framework with an intermediate value of λ provides a good balance, with an accuracy of 0.824 and a fidelity of 0.872.
5.3. RQ 3—Locality: How Well Do the Explanations Capture the Local Behavior of the Model Compared to LIME in Phase 2?
To assess the local explainability of our proposed framework, we compared its performance with the LIME post hoc explainability method in terms of point fidelity, neighborhood fidelity, and stability.
Table 3 shows the comparison results between our pre hoc framework and LIME. Our pre hoc framework significantly outperforms LIME in terms of point fidelity (0.917 vs. 0.700), neighborhood fidelity (0.960 vs. 0.741), and stability (0.022 vs. 0.215). These results indicate that our pre hoc framework generates explanations that are more faithful to the model’s predictions and more consistent across different runs compared to LIME.
The high point fidelity and neighborhood fidelity scores of our pre hoc framework indicate that the generated explanations are consistent with the model’s predictions for individual instances and within local neighborhoods. The low stability score suggests that our framework produces more consistent explanations across different runs compared to LIME. This is particularly important in robotics applications, where the reliability and consistency of explanations are crucial for building trust in the system.
The superior performance of our pre hoc framework can be attributed to the fact that it incorporates explainability during the training phase, ensuring that the model’s predictions are inherently aligned with its explanations. In contrast, LIME, being a post hoc method, attempts to explain the model’s decisions after the fact, which can lead to less faithful and less consistent explanations.
5.4. RQ 4—Neighborhood Size: How Does Neighborhood Size Affect Neighborhood Fidelity, Stability, and Computational Cost in Phase 2?
To understand the impact of neighborhood size on explanation quality and computational cost, we conducted experiments with different values of k in Testing Phase 2.
Table 4 presents the results of our experiments with different neighborhood sizes during Testing Phase 2. As the neighborhood size increases from 3 to 100, we observe the following trends.
Neighborhood fidelity increases from 0.883 for k = 3 to 0.967 for k = 100. This indicates that larger neighborhoods better capture the local patterns and provide more accurate explanations.
Stability decreases significantly from 0.215 for k = 3 to 0.002 for k = 100. A lower stability value indicates more consistent explanations across different instances within the neighborhood. This suggests that larger neighborhoods provide more stable explanations.
Computation time increases only slightly from 0.012 s for k = 3 to 0.014 s for k = 100. The difference is minimal, indicating that the computational cost is not significantly impacted by the choice of neighborhood size, at least within the range of values considered.
These results provide valuable insights for selecting the appropriate neighborhood size for generating local explanations in robotic grasp failure prediction. If the primary concern is explanation quality and stability, larger neighborhood sizes are preferable. If computational efficiency is a priority, smaller neighborhood sizes can be used without significantly compromising explanation quality. In our experiments, a moderate neighborhood size provides a good balance between explanation quality and computational cost.
5.5. Local Explanation Example for Grasping Dataset
To provide a concrete example of the local explanations generated by our pre hoc framework, we present a case study of an individual test instance from the grasping dataset.
Figure 5 illustrates the local feature importance scores for a specific test instance from the grasping dataset. The explanation highlights the most influential features for this particular prediction, providing insights into the factors contributing to the model’s decision.
For this specific instance, the explanation reveals that both effort and velocity features have significant impacts on the prediction, with varying directions of influence. This level of granularity in the explanation provides valuable insights for robotics engineers to understand the specific factors affecting grasp stability for individual scenarios, which can guide improvements in grasping strategies.
5.6. Global Explainability Insights
While our focus in this paper is on local explainability, it is worth noting the global patterns identified by our pre hoc framework in the grasping dataset.
Figure 6 shows the top 10 global feature importance scores for the grasping dataset. The most influential feature is H1F3J2eff (effort exerted in Joint 2 of Finger 3), which increases the likelihood of grasp failure. In contrast, the efforts exerted on Joint 2 of Fingers 1 and 2 and Joint 1 of Finger 3 reduce the probability of grasp failure. These global patterns provide general information on the factors that affect grasp stability across all instances.
Comparing the local explanation (Figure 5) with the global explanation (Figure 6), we observe both similarities and differences. Although the global explanation emphasizes the importance of effort features over velocity features, the local explanation shows a more mixed impact of both types of features. This confirms the value of local explanations in capturing instance-specific patterns that can deviate from global trends.
Our pre hoc framework provides both global and local explainability, offering a comprehensive understanding of the model’s behavior at different levels of granularity. This multilevel explainability is particularly valuable in robotics applications, where both general patterns and instance-specific insights are essential for designing robust and reliable systems.
5.7. Limitations and Future Directions
While our experimental evaluation demonstrates the effectiveness of the proposed local pre hoc explainability framework, it is important to acknowledge the limitations of the dataset and discuss the generalization potential of our results to broader robotic grasping scenarios. The dataset is limited to grasping tasks performed in a static Gazebo simulation environment with consistent lighting and no external disturbances. Real-world grasping scenarios present additional challenges, including sensor noise, varying lighting conditions, occlusions, and dynamic environments. Although simulation provides safe, low-cost, and controlled conditions for systematic evaluation, there is an inherent gap between simulated and real-world robotic systems. Factors such as mechanical compliance, sensor accuracy, actuator dynamics, and physical wear are simplified or idealized in simulation. The transition from simulation to real robotic platforms may introduce additional failure modes and sources of uncertainty that are not captured in the current dataset.
Sparse neighborhoods could also cause failures: when a local neighborhood contains fewer than 3–5 training instances, explanation quality may degrade, and unusual sensor combinations (e.g., extremely high joint efforts combined with very low velocities) are precisely the cases likely to yield such sparse local neighborhoods. Moreover, our method assumes that local neighborhoods contain instances with similar underlying causal relationships. This assumption may be violated in complex robotic environments where subtle changes in object properties or environmental conditions create non-obvious feature interactions.
Our implementation uses the scikit-learn k-nearest neighbors model with Euclidean distance as the default metric for neighborhood construction. Although the Euclidean distance is computationally efficient and appropriate for continuous sensor data, a systematic evaluation of alternative distance metrics (such as the Mahalanobis, Manhattan, or cosine distance) and their impact on neighborhood quality and explanation fidelity would provide valuable insights into optimal distance metric selection for different types of robotic sensor data, and represents an important direction for future research.
Furthermore, our current analysis focuses primarily on an ablation study and a comparison with LIME as the main baseline for local explainability. A more comprehensive evaluation comparing computational performance with additional explainability methods such as SHAP, Integrated Gradients, and other local explanation techniques would provide a broader perspective on our framework’s efficiency advantages and represent an important direction for future work.
Although we acknowledge these limitations, the controlled evaluation demonstrates the fundamental viability of our local explainability approach for robotic grasping applications. Significant improvements in explanation fidelity, stability, and computational efficiency suggest that the framework provides a solid foundation for deployment in practical scenarios, with appropriate domain-specific adaptations and additional validation as systems transition from controlled to operational environments.
6. Conclusions
In this paper, we extended our recently proposed pre hoc explainability framework [23] to incorporate local explainability for robot grasp failure prediction. By leveraging neighborhood information and the Jensen–Shannon divergence, our approach generates instance-specific explanations that capture the model’s decision-making process for individual grasp scenarios. The two-phase methodology first trains the predictor model with local explainability constraints, then fine-tunes the pre-trained explainer for each test instance within its local neighborhood, resulting in explanations with superior point fidelity, neighborhood fidelity, and stability compared to traditional post hoc methods like LIME.
Future work could extend this framework to multiclass classification or regression problems for predicting continuous measures of grasp quality, explore alternative explanation formats beyond feature importance scores, and incorporate temporal data to explain dynamic grasping processes. Additional validation on different robotic platforms and integration of domain knowledge from robotics experts would further enhance the framework’s applicability. Despite these opportunities for improvement, our local pre hoc explainability framework represents a significant advancement in transparent and interpretable robot grasp failure prediction, enhancing trust in autonomous robotic systems while providing valuable insights for improving grasping strategies.