1. Introduction
When using machine learning (ML) algorithms for diagnostic or classification tasks, it is essential to understand how the model reaches its conclusions [1, 2]. Thus, explainable artificial intelligence (XAI) plays a crucial role in this domain [1, 3]. The necessity of XAI depends strongly on the application [1, 3, 4]. In fields such as medicine and biomechanics, model transparency is imperative [2, 3, 5]. Furthermore, AI systems must translate their outputs into domain-specific language that is comprehensible to experts while taking into account their varying levels of expertise [2, 3, 5].
One widely used method for interpreting model decisions is Shapley Additive Explanations (SHAP) [6, 7, 8]. SHAP attributes feature-wise contributions to individual predictions, enabling a detailed understanding of model behavior. Its model-agnostic nature allows SHAP to be applied across a broad spectrum of ML models and datasets [3, 6, 7, 8]. In addition to numerical attributions, SHAP also provides graphical visualizations such as decision plots, which illustrate how features accumulate toward a final prediction [6].
However, interpreting outliers, i.e., instances that deviate substantially from the expected feature attribution pattern, remains a challenge [1, 2]. Outliers are often dismissed because they are difficult to interpret or may stem from measurement errors. Yet, in the absence of evidence for systematic errors, atypical instances may contain meaningful and novel insights [9, 10]. In medicine, such outliers may reveal rare or borderline manifestations of diseases [9, 10], and the same holds true for movement disorders such as cerebral palsy or Parkinson’s disease [11, 12].
Existing outlier detection approaches typically rely on statistical distances in the feature space, such as z-score filtering, Mahalanobis distance, or density-based methods like Local Outlier Factor (LOF) [13, 14, 15]. However, these approaches generally ignore the underlying model behavior and do not incorporate localized explanations from XAI tools [8].
Recent work has shown that local feature attributions can vary across instances and modeling conditions, motivating the analysis of explanation patterns beyond aggregate summaries [16, 17]. In parallel, explanations have increasingly been treated as objects of analysis in their own right, for example, through systematic evaluation and comparison of explanation outputs across models and datasets [18]. However, existing approaches typically do not focus on systematically characterizing atypical explanation patterns for correctly classified instances or providing a quantitative score to assess such deviations.
To address this gap, we introduce the Classification Outlier Variability Score (COVAS), a method that leverages SHAP values to identify instances whose decision paths deviate substantially from the class-specific mean. Since COVAS is based on SHAP values, it is likewise model- and data-agnostic and can, in principle, be extended to XAI methods with characteristics similar to SHAP. This study demonstrates the applicability of COVAS in two domains: medical diagnosis, using the Breast Cancer Wisconsin Diagnostic dataset [19], and sports science, a field in which large-scale biomechanical datasets remain scarce [21], illustrated with the FIFA 2018 Man of the Match dataset [20]. Since SHAP explanations are available for multiclass settings, COVAS is not inherently limited to binary classification [6]. In this study, however, we focus on binary case studies.
Unlike classical outlier detection methods, COVAS is not designed to identify mislabeled samples, noise, or anomalies in the input space. Instead, its purpose is to highlight atypical but correctly classified instances whose SHAP-based decision paths deviate from typical class behavior. Such cases may reveal non-obvious patterns that can support further analysis or motivate hypothesis generation beyond simplified assumptions about feature relevance. COVAS, therefore, serves as an explanation-driven knowledge-discovery tool rather than a performance-oriented anomaly detection algorithm.
This work, therefore, focuses on explanation-level variability, not on anomaly detection performance, and evaluates COVAS through qualitative, domain-oriented case analyses.
2. Materials and Methods
All experiments in this work focus on binary classification tasks and serve as illustrative case studies for the proposed framework.
2.1. Hardware and Software
All experiments were conducted on an Apple M3 Max system with 36 GB of unified memory using Python 3.9.15. Key libraries included numpy, pandas, tensorflow-keras, scikit-learn, shap, and matplotlib. A fixed random seed of 100 was used throughout all experiments to ensure reproducibility. A detailed list of software versions and hardware specifications is provided in Appendix A.
2.2. Datasets
Both datasets were processed using the same machine learning (ML) model and similar preprocessing steps. For both datasets, feature and target names were extracted. Since neither dataset included unique instance identifiers, these were created manually to allow detected outliers to be traced back to the original data.
In both cases, a train–test split was performed using the train_test_split function from scikit-learn with a random state of 100 and a test set size of 30% of the full dataset. All features were standardized using the StandardScaler from scikit-learn to improve network performance.
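The split sizes reported in the Discussion (398/171 instances for BC and 89/39 for MotM) are consistent with a 30% test fraction that is rounded up, which is how scikit-learn treats a fractional test size. The following is a minimal NumPy-only sketch of such a split and of z-score standardization. It is a stand-in for, not a reproduction of, `train_test_split` and `StandardScaler`: the helper names, the synthetic data, and the choice to estimate the scaling statistics on the training portion only are assumptions made for illustration.

```python
import math
import numpy as np

def split_sizes(n_samples, test_fraction=0.3):
    """Return (n_train, n_test) with the test size rounded up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test

def split_and_standardize(X, y, test_fraction=0.3, seed=100):
    """Shuffle, split, and z-score the features (illustrative helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train, _ = split_sizes(len(X), test_fraction)
    train_idx, test_idx = order[:n_train], order[n_train:]
    # Assumption: scaling statistics come from the training portion only.
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    X_train = (X[train_idx] - mean) / std
    X_test = (X[test_idx] - mean) / std
    return X_train, X_test, y[train_idx], y[test_idx], train_idx, test_idx

# Synthetic stand-in with the shape of the BC dataset (569 x 30).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(569, 30))
y_demo = rng.integers(0, 2, size=569)
X_tr, X_te, y_tr, y_te, idx_tr, idx_te = split_and_standardize(X_demo, y_demo)
```

Returning the index arrays alongside the data keeps every row traceable to its original position, which mirrors the role of the manually created instance identifiers.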
2.2.1. Breast Cancer (BC) Dataset
The breast cancer dataset from Wolberg et al. [19] comprises a total of 569 observations, each represented by 30 quantitative features extracted from digitized images of fine needle aspiration (FNA) biopsies of breast tissue. These features quantify the morphological characteristics of cell nuclei, including parameters such as radius, texture, perimeter, area, and smoothness, calculated using statistical metrics such as the mean, standard error, and extreme values. The dataset supports binary classification, with each sample classified as either malignant (212 cases) or benign (357 cases), and it is widely used for developing and evaluating ML models for breast cancer diagnosis. In this study, it was used to demonstrate the application of COVAS in a medical classification setting.
2.2.2. FIFA 2018 Man of the Match (MotM) Dataset
The FIFA 2018 World Cup match statistics dataset from Kaggle [20] contains information on 64 matches (one row per team and match, i.e., 128 instances), including a total of 26 features describing team performance (e.g., goals, shots, ball possession, fouls), player performance (e.g., goals, assists, cards), and the label “Man of the Match”. It also includes advanced match statistics such as shots on target, corner kicks, and yellow/red cards. The dataset supports detailed analyses of individual and team performance and enables statistical exploration of factors influencing match outcomes and Man of the Match selections. It was selected because its variables can be readily interpreted using SHAP, thereby facilitating an intuitive demonstration of how COVAS utilizes SHAP values for outlier detection in the sports domain.
2.3. Neural Network Architecture
A compact feedforward neural network was used to demonstrate the COVAS method rather than to optimize predictive performance for a specific dataset. The network consists of three hidden layers with 64, 32, and 32 neurons, respectively, each using rectified linear unit (ReLU) activation. The output layer contains a single neuron with sigmoid activation for binary classification.
Training was performed using the Adam optimizer and binary cross-entropy loss for ten epochs with a batch size of 16 on the standardized input features. No hyperparameter tuning or architecture search was conducted. A simple model was deliberately chosen to facilitate the analysis of the decision-making process. Other architectures or parameter settings might achieve higher accuracy, but performance optimization is outside the scope of this work.
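To make the layer structure concrete, the sketch below runs a forward pass through a network of the stated shape (64, 32, and 32 ReLU units followed by one sigmoid unit) in plain NumPy and counts its trainable parameters. It is not the Keras code used in the study: the weights are random and untrained, and the input width of 30 (the BC feature count) as well as all function names are illustrative assumptions.

```python
import numpy as np

def init_layers(n_inputs, hidden=(64, 32, 32), seed=100):
    """Create random (untrained) weights and zero biases for all dense layers."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden, 1]  # final layer: single output neuron
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    """Apply ReLU on the hidden layers and a sigmoid on the output layer."""
    h = X
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = layers[-1]
    logits = h @ W + b
    return (1.0 / (1.0 + np.exp(-logits))).ravel()

def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(W.size + b.size for W, b in layers)

layers = init_layers(n_inputs=30)
probs = forward(np.zeros((4, 30)), layers)  # one probability per input row
n_params = count_parameters(layers)
```

With 30 input features, this layout has 5153 trainable parameters, which illustrates how compact the model is.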
While the core experiments are conducted using a feedforward neural network, COVAS is model-agnostic by construction through SHAP explanations. An additional validation using a tree-based model is provided in Appendix B.
2.4. SHAP Implementation
SHAP values were computed using the shap library [6]. First, an appropriate model explainer was selected and fitted on the scaled training data and the trained neural network. The SHAP values and the corresponding base values were then extracted, where the base value represents the average model prediction. For the binary classification tasks considered here, there is only one base value.
SHAP values quantify the contribution of each feature to the prediction for a specific instance based on cooperative game theory [6]. They indicate how much each feature adds to or subtracts from the base value to yield the final prediction, thereby making the model output more interpretable. For visualization, decision plots were generated using the built-in SHAP functions.
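The additive relationship between base value, SHAP values, and prediction can be written compactly. For an instance $x$ with $M$ features, SHAP satisfies the local accuracy property in the output space being explained; the symbols $\phi_0$ (base value) and $\phi_m(x)$ (SHAP value of feature $m$) are notation introduced here for illustration.

```latex
f(x) = \phi_0 + \sum_{m=1}^{M} \phi_m(x), \qquad \phi_0 = \mathbb{E}\left[ f(X) \right]
```

A decision plot traces the partial sums of this expression, starting at $\phi_0$ and adding one feature contribution at a time.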
For outlier identification, only correctly classified instances were considered. Model outputs ranged from 0 to 1; values greater than or equal to 0.5 were interpreted as predictions for class label 1, and values below 0.5 as predictions for class label 0. SHAP values, base values, and instance IDs were stored in a class-keyed dictionary based on the indices of the original data. These data structures served as the basis for the subsequent COVAS computation and visualization.
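The selection step described above can be sketched as follows. The function name, the dictionary layout, and the toy values are assumptions for illustration; the single base value is omitted for brevity.

```python
import numpy as np

def collect_correct_by_class(probs, y_true, shap_values, ids, cutoff=0.5):
    """Group SHAP rows and IDs of correctly classified instances by class."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    y_pred = (probs >= cutoff).astype(int)  # >= 0.5 -> class 1, else class 0
    correct = y_pred == y_true
    store = {}
    for label in (0, 1):
        mask = correct & (y_true == label)
        store[label] = {
            "shap": np.asarray(shap_values)[mask],
            "ids": [i for i, keep in zip(ids, mask) if keep],
        }
    return store

# Toy data: five instances, two features (values are made up).
probs = [0.9, 0.5, 0.2, 0.7, 0.4]
y_true = [1, 1, 0, 0, 1]
shap_values = np.arange(10, dtype=float).reshape(5, 2)
ids = ["a", "b", "c", "d", "e"]
store = collect_correct_by_class(probs, y_true, shap_values, ids)
```

In this toy example, instances "d" and "e" are misclassified and therefore excluded, while "b" (output exactly 0.5) counts as a class 1 prediction.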
2.5. COVAS Framework
COVAS provides a systematic procedure for detecting anomalous instances by combining SHAP-based explanations with class-specific statistical descriptors of feature contributions. For each class, the distribution of SHAP values is analyzed by computing the mean and standard deviation of each feature’s SHAP values. These statistics are stored and later used to quantify the degree to which a given instance deviates from typical class behavior. Since COVAS operates in the explanation space, in principle, it can be applied to any predictive model for which SHAP explanations can be computed.
We consider two modes for applying COVAS:
Continuous mode: The strength of an outlier is quantified by the absolute deviation from the mean, measured in units of standard deviations.
Threshold mode: A binary matrix is derived, indicating whether the deviation of a feature exceeds a predefined standard deviation threshold.
Before constructing the COVA (Classification Outlier Variability) matrix $\mathbf{C}$, all SHAP values are z-transformed. Because SHAP values are feature-specific in scale and therefore not directly comparable across features, this normalization renders the deviations dimensionless and enables contributions from different features to be evaluated on a common scale. Furthermore, COVAS is designed to quantify atypicality in explanation patterns rather than the direction of individual feature contributions. Accordingly, we consider the absolute values of the standardized SHAP deviations, treating unusually strong positive and negative contributions as equally indicative of atypical behavior. The resulting matrix $\mathbf{C}$ contains $N$ instances and $M$ features, with each entry defined as

$$C_{n,m} = \left| \frac{\phi_{n,m} - \mu_m}{\sigma_m} \right| \tag{1}$$

where $\mu_m$ and $\sigma_m$ denote the mean and standard deviation of the SHAP values associated with feature $m$ for the corresponding class. The term $\phi_{n,m}$ refers to the SHAP value of feature $m$ for the $n$-th instance. The absolute value reflects the magnitude of the deviation irrespective of direction and thus captures the overall “outlier strength” of each feature for a given instance.
The continuous COVA matrix $\mathbf{C}^{\mathrm{cont}}$ is defined as

$$\mathbf{C}^{\mathrm{cont}} = \begin{pmatrix} C_{1,1} & \cdots & C_{1,M} \\ \vdots & \ddots & \vdots \\ C_{N,1} & \cdots & C_{N,M} \end{pmatrix}$$

To compute class-specific COVAS representations, the SHAP values corresponding to each class are used. Since the SHAP values are stored as matrices, feature names are assigned to each column to ensure correct alignment with their respective statistical distributions.
A scalar COVA score for each instance is obtained by averaging the absolute z-scores across all features:

$$\mathrm{COVAS}_n = \frac{1}{M} \sum_{m=1}^{M} C_{n,m} \tag{2}$$

This average score reflects the overall degree of abnormality of an instance relative to the class-specific feature distributions.
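The computation described above (class-wise standardization of SHAP values, absolute values, and averaging across features) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the published implementation: the function names, the use of the population standard deviation, and the guard for zero-variance features are assumptions.

```python
import numpy as np

def cova_matrix(shap_class):
    """Absolute z-scores of SHAP values within one class (N x M)."""
    shap_class = np.asarray(shap_class, dtype=float)
    mu = shap_class.mean(axis=0)      # per-feature mean SHAP value
    sigma = shap_class.std(axis=0)    # per-feature standard deviation
    sigma = np.where(sigma == 0, 1.0, sigma)  # assumption: avoid division by zero
    return np.abs((shap_class - mu) / sigma)

def cova_scores(shap_class):
    """Mean absolute z-score per instance."""
    return cova_matrix(shap_class).mean(axis=1)

# Toy SHAP matrix for one class: 4 instances, 2 features (made-up values).
shap_toy = np.array([[0.1, 0.0],
                     [0.1, 0.0],
                     [0.1, 0.0],
                     [0.5, 0.4]])
scores = cova_scores(shap_toy)
most_atypical = int(np.argmax(scores))
```

In the toy example, the fourth instance deviates from the class mean in both features and receives the highest score, whereas the three identical instances share the same low score.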
It is important to emphasize that the aim of COVAS is not to outperform existing anomaly detection algorithms in terms of accuracy. COVAS quantifies deviations in the explanation space rather than in the raw feature space. This distinction reflects its role as a model interpretability tool: the goal is to surface unexpected, meaningful explanation patterns that warrant closer inspection and may support hypothesis formation, rather than to flag erroneous or noisy samples.
2.5.1. Continuous Mode
In the continuous mode, the z-transformed SHAP values of each instance (Equation (1)) are stored in the class-specific matrix $\mathbf{C}^{\mathrm{cont}}$. Each entry represents the standardized deviation of a feature’s SHAP value from its expected distribution. A scalar outlier score for an instance is then obtained by averaging its absolute z-scores across all features (Equation (2)). This continuous score quantifies the overall atypicality of an instance with respect to its class-specific SHAP behavior.
2.5.2. Threshold Mode
In the threshold mode, the COVA matrix $\mathbf{C}^{\mathrm{thr}}$ is constructed by applying a fixed cut-off to the z-transformed SHAP values. For each instance–feature pair, a binary value is assigned based on whether the absolute z-score exceeds a predefined threshold $\tau$:

$$C^{\mathrm{thr}}_{n,m} = \begin{cases} 1, & \left| \dfrac{\phi_{n,m} - \mu_m}{\sigma_m} \right| > \tau \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

Here, $\phi_{n,m}$ is the SHAP value of feature $m$ for instance $n$, and $\mu_m$ and $\sigma_m$ are the class-specific mean and standard deviation of that feature's SHAP values. The resulting matrix contains a value of 1 if the SHAP value of a feature for a given instance lies outside the threshold range, and a value of 0 if it lies within the range. The threshold $\tau$ is typically set based on the desired sensitivity to deviations (e.g., $\tau = 2$ for values beyond two standard deviations). The final COVA score for each instance is computed as the mean of the binary indicators across all features, as in Equation (2), and reflects the proportion of features for which the instance is considered an outlier. In addition to the default setting of $\tau$, a sensitivity analysis across multiple $\tau$ values was performed to assess the robustness of the proposed framework (see Section 3.3.1).
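The threshold mode can be sketched analogously to the continuous mode. In the snippet below, the parameter name `tau`, the strict inequality, the population standard deviation, and the toy data are assumptions for illustration.

```python
import numpy as np

def cova_threshold_scores(shap_class, tau=2.0):
    """Fraction of features per instance whose |z-score| exceeds tau."""
    shap_class = np.asarray(shap_class, dtype=float)
    mu = shap_class.mean(axis=0)
    sigma = shap_class.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # assumption: guard constant columns
    z_abs = np.abs((shap_class - mu) / sigma)
    binary = (z_abs > tau).astype(int)        # 1 = outside the threshold range
    return binary, binary.mean(axis=1)

# Toy class with 10 instances and 2 features (made-up values):
# only the last instance is extreme, and only in feature 0.
shap_toy = np.zeros((10, 2))
shap_toy[-1, 0] = 1.0
shap_toy[:, 1] = np.tile([-0.1, 0.1], 5)
binary, scores = cova_threshold_scores(shap_toy, tau=2.0)
```

Here the last instance exceeds the threshold in one of its two features and therefore obtains a score of 0.5, while all other instances score 0. Raising `tau` far enough removes this flag as well, which is the strictness effect examined in the sensitivity analysis.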
2.6. COVAS Workflow
To provide a clearer overview of the COVAS workflow, Figure 1 illustrates the main steps of the COVAS computation pipeline. The corresponding computational procedure is formalized in Algorithm 1, which summarizes the individual processing steps in a concise and reproducible manner.
| Algorithm 1 COVAS Computation Framework |
| Require: Feature set X, model predictions y, SHAP explainer S |
| Ensure: Outlier score vector $\mathbf{s}$ |
| 1: | Compute SHAP values for correctly classified instances: $\Phi \leftarrow S(X)$ |
| 2: | for each class c do |
| 3: | Extract SHAP values for class: $\Phi_c$ |
| 4: | Compute $\mu_c$, $\sigma_c$ |
| 5: | end for |
| 6: | for each instance i do |
| 7: | Compute deviation vector for class c: $\mathbf{z}_i \leftarrow \lvert (\phi_i - \mu_c) / \sigma_c \rvert$ |
| 8: | Compute outlier score: $s_i \leftarrow \operatorname{mean}(\mathbf{z}_i)$ |
| 9: | end for |
| 10: | return $\mathbf{s}$ |
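A possible Python rendering of Algorithm 1 is sketched below. It assumes that SHAP values have already been computed and restricted to correctly classified instances; the function signature, the returned ranking structure, and the demo data are hypothetical and not taken from the published code.

```python
import numpy as np

def covas(shap_values, labels, ids, mode="continuous", tau=2.0):
    """Sketch of Algorithm 1 for already-filtered (correct) instances.

    shap_values: (N, M) array, labels: (N,) class labels, ids: N identifiers.
    Returns {class: [(id, score), ...]} sorted from most to least atypical.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        phi_c = shap_values[idx]                    # step 3: class SHAP values
        mu_c = phi_c.mean(axis=0)                   # step 4: class statistics
        sigma_c = phi_c.std(axis=0)
        sigma_c = np.where(sigma_c == 0, 1.0, sigma_c)
        z = np.abs((phi_c - mu_c) / sigma_c)        # step 7: deviation vectors
        if mode == "threshold":
            z = (z > tau).astype(float)
        scores = z.mean(axis=1)                     # step 8: outlier scores
        order = np.argsort(-scores, kind="stable")
        result[int(c)] = [(ids[idx[j]], float(scores[j])) for j in order]
    return result

# Made-up SHAP values: class 0 has one atypical instance ("p3").
shap_demo = np.array([[0.2, 0.1], [0.2, 0.1], [0.2, 0.1], [0.9, -0.5],
                      [-0.3, 0.0], [-0.1, 0.2]])
labels_demo = np.array([0, 0, 0, 0, 1, 1])
ids_demo = ["p0", "p1", "p2", "p3", "p4", "p5"]
ranking = covas(shap_demo, labels_demo, ids_demo)
```

Returning identifiers together with scores keeps each flagged instance traceable to the original data, in line with the manually created instance IDs.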
2.7. Custom SHAP Decision Plots
SHAP decision plots were used to visualize how individual features contribute to the model output for correctly classified instances within each class, as illustrated in Figure 2. The y-axis lists input features ordered by their average importance, and the x-axis shows the model output (e.g., class probability) progressing from the base value (gray vertical line) toward the final prediction. Each colored line represents a single instance and traces how feature contributions accumulate.
A custom extension of the standard SHAP decision plot includes the mean SHAP path (solid black line) and the standard deviation bounds (dashed green lines), highlighting the central tendency and variability of feature contributions across instances. These additions contextualize individual paths with respect to overall model behavior and facilitate the identification of outliers and atypical decision paths. This visualization format is used consistently in the subsequent figures to interpret model behavior across different classes.
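The quantities behind such a plot can be computed without any plotting library. The sketch below derives the cumulative decision paths, their mean, and a band of one standard deviation around it. How the bounds were computed for the published figures is not stated here, so taking the standard deviation of the cumulative paths at each step is an assumption, as are the feature ordering by mean absolute SHAP value and all names.

```python
import numpy as np

def decision_paths(shap_class, base_value):
    """Cumulative decision paths plus their mean and +/- 1 SD envelope."""
    shap_class = np.asarray(shap_class, dtype=float)
    # Assumption: order features by mean |SHAP|, least important first,
    # so the path is read from the base value toward the prediction.
    order = np.argsort(np.abs(shap_class).mean(axis=0), kind="stable")
    steps = np.cumsum(shap_class[:, order], axis=1)
    # Prepend the base value as the common starting point of every path.
    start = np.full((shap_class.shape[0], 1), base_value)
    paths = np.hstack([start, base_value + steps])
    mean_path = paths.mean(axis=0)
    sd_path = paths.std(axis=0)
    return order, paths, mean_path, mean_path - sd_path, mean_path + sd_path

# Made-up SHAP values for three instances and three features.
shap_toy = np.array([[0.10, 0.30, -0.05],
                     [0.20, 0.30, 0.05],
                     [0.00, 0.30, 0.00]])
order, paths, mean_path, lower, upper = decision_paths(shap_toy, base_value=0.4)
```

Each row of `paths` ends at the base value plus the sum of that instance's SHAP values, so the final column corresponds to the explained model output.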
3. Results
3.1. Classification Performance
For the presented experiments, the full COVAS analysis, including SHAP computation, completed in under one minute per dataset. On the breast cancer (BC) dataset, the model achieved a test accuracy of 97.08%, resulting in 64 correctly classified malignant and 102 correctly classified benign instances.
On the Man of the Match (MotM) dataset, the model reached a test accuracy of 66.67%, with 15 correctly classified instances for the Not MotM class and 11 for the MotM class.
3.2. COVAS Case Studies
COVAS is used to analyze deviations in SHAP-based explanation patterns among correctly classified instances. We report class-wise COVAS scores and SHAP decision plots for both datasets; full COVA matrices are omitted due to size and are available in the public repository (see Code and Data Availability).
3.2.1. Results on the Breast Cancer (BC) Dataset
Figure 2 and Figure 3 visualize class-specific SHAP decision paths for correctly classified instances. The malignant class shows higher dispersion around the mean SHAP path, with several instances deviating noticeably for features such as worst concave points and mean perimeter. In contrast, benign instances cluster more tightly around the mean path, indicating more homogeneous explanation patterns within this class.
Table 1 reports representative COVAS scores for both classes and highlights instances with atypical explanation patterns despite correct classification.
Instance-Level Illustration
To illustrate how COVAS supports the analysis of individual samples, we consider a representative instance with a high COVAS score from the malignant class of the breast cancer dataset. As shown in Table 1, this instance (patient 3) is correctly classified but exhibits a markedly atypical explanation pattern, reflected by its elevated COVAS score.
Figure 4 visualizes the SHAP-based decision paths for all correctly classified malignant instances, with patient 3 highlighted. While most instances follow a compact, class-specific mean trajectory, the highlighted instance shows pronounced deviations from the mean SHAP path across several high-impact features, including mean perimeter and worst area.
This example demonstrates how COVAS enables the systematic identification of atypical yet correctly classified instances and supports targeted, instance-level inspection of explanation patterns. Such cases may serve as starting points for further domain-specific investigation or hypothesis formulation by experts, without implying any clinical conclusions.
3.2.2. Results on the Man of the Match (MotM) Dataset
For the MotM dataset, the SHAP decision plots (Figure 5 and Figure 6) show larger within-class variability than in the BC case study, particularly for the MotM class. This indicates that correctly classified matches can still exhibit diverse attribution patterns across the available match statistics.
Table 2 lists representative high- and low-scoring matches per class, illustrating that COVAS distinguishes typical from atypical explanation patterns within both MotM and Not MotM.
3.3. Robustness Analyses
To strengthen the reliability of the empirical findings, we conducted robustness analyses evaluating the sensitivity of COVAS to key design choices and parameter settings. The following subsections address the threshold parameter used in threshold-mode COVAS and an ablation of the normalization choices.
3.3.1. Threshold Sensitivity
We evaluated the robustness of threshold-mode COVAS across a range of threshold values $\tau$ on both datasets. As summarized in Table 3, increasing $\tau$ leads to a smooth decrease in the mean and standard deviation of COVAS scores, reflecting increasingly strict deviation criteria.
The stability of the most atypical instances was assessed using the top-10 overlap between consecutive threshold ($\tau$) values (Table 4). Overlaps are generally high (0.7–1.0), indicating stable identification of atypical instances under moderate threshold variations. A lower overlap (0.40) is observed for the malignant class of the BC dataset when increasing $\tau$ from 2.5 to 3.0, which reflects the conservative nature of very high thresholds that suppress borderline deviations rather than an instability of the method. For the MotM dataset, at a sufficiently large threshold, no feature-level deviations exceed the threshold, resulting in zero COVAS scores and further illustrating the strictness of large threshold values.
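The top-k overlap used as a stability measure can be sketched as follows. How ties were resolved in the study is not stated; the stable tie-breaking by input order, the function names, and the scores below are assumptions. Ties matter in threshold mode, where scores take only a few discrete values.

```python
import numpy as np

def top_k_ids(scores, ids, k=10):
    """IDs of the k highest-scoring instances (ties broken by input order)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return [ids[i] for i in order[:k]]

def top_k_overlap(scores_a, scores_b, ids, k=10):
    """Fraction of shared IDs between two top-k selections."""
    top_a = set(top_k_ids(scores_a, ids, k))
    top_b = set(top_k_ids(scores_b, ids, k))
    return len(top_a & top_b) / k

# Made-up threshold-mode scores for 6 instances at two thresholds.
ids = ["i0", "i1", "i2", "i3", "i4", "i5"]
scores_tau_20 = [0.30, 0.10, 0.25, 0.00, 0.20, 0.05]
scores_tau_25 = [0.20, 0.00, 0.05, 0.00, 0.15, 0.10]
overlap = top_k_overlap(scores_tau_20, scores_tau_25, ids, k=3)
```

In this toy case, two of the three top-ranked instances are shared between the two thresholds, giving an overlap of 2/3.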
3.3.2. Ablation Study
We tested the sensitivity of COVAS to key design choices, including (i) using raw instead of z-standardized SHAP values, and (ii) using global instead of class-conditional feature distributions. Across both datasets, these variants produced identical top-10 selections and near-perfect rank agreement, indicating that the identification of atypical instances is not driven by a specific normalization choice in the studied settings.
3.4. Comparison with Feature-Space Methods
We compared COVAS to Local Outlier Factor (LOF) and a feature-wise z-score baseline by selecting the top-10 outliers among correctly classified instances. For both datasets, the overlap between the COVAS top-10 and the feature-space baselines is zero (Table 5), indicating that COVAS highlights instances that are not classical feature-space outliers. Consistently, LOF and feature-wise z-scores identify samples with higher feature-space extremeness, whereas COVAS outliers show lower mean absolute feature z-scores.
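The contrast between feature-space and explanation-space outliers can be illustrated with a small constructed example. The baseline below (mean absolute feature-wise z-score) is one plausible reading of the z-score baseline, not necessarily the exact definition used in the study; LOF is not reproduced here, and all data are made up.

```python
import numpy as np

def mean_abs_z(values):
    """Mean absolute column-wise z-score per row."""
    values = np.asarray(values, dtype=float)
    sigma = values.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)
    return np.abs((values - values.mean(axis=0)) / sigma).mean(axis=1)

def top_k(scores, k):
    """Indices of the k largest scores."""
    return set(np.argsort(-scores, kind="stable")[:k].tolist())

# Made-up example: instance 0 is extreme in feature space,
# instance 5 is extreme only in explanation (SHAP) space.
features = np.array([[9.0, 9.0], [1.0, 1.1], [1.1, 0.9],
                     [0.9, 1.0], [1.0, 0.9], [1.1, 1.0]])
shap_vals = np.array([[0.2, 0.1], [0.2, 0.1], [0.2, 0.1],
                      [0.2, 0.1], [0.2, 0.1], [-0.6, 0.7]])
feature_top = top_k(mean_abs_z(features), k=1)
covas_top = top_k(mean_abs_z(shap_vals), k=1)
overlap = len(feature_top & covas_top) / 1
```

The same scoring function is applied once to raw features and once to SHAP values, so any difference in the selected instances stems from the space in which deviations are measured.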
3.5. Impact of Class Imbalance
To assess the impact of class imbalance, we retrained the neural network on the Breast Cancer dataset using a class-weighted loss while keeping the test set unchanged. Because the set of correctly classified instances differs between the balanced and unbalanced models, all comparisons were restricted to the intersection of samples correctly classified in both settings and evaluated separately for each class.
As shown in Table 6, class weighting induces only minor shifts in the COVAS score distributions for both classes. For benign instances, the mean COVAS score increases slightly from 0.62 to 0.66 with a marginal reduction in variability, while for malignant instances, the mean increases from 0.73 to 0.74 with a small increase in standard deviation.
Importantly, the relative ranking of instances remains largely preserved. Spearman rank correlations between balanced and unbalanced COVAS scores are high for both the benign and malignant classes, and the overlap among the top-10 most atypical samples remains substantial (0.8 and 0.7, respectively). Overall, these results suggest that class imbalance has a limited effect on explanation-space outlier identification in the evaluated setting.
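For reference, the Spearman rank correlation between two score vectors over the same instances can be computed as sketched below. The closed-form expression used here is valid only when no scores are tied, which is an assumption of this sketch; the function name and the scores are illustrative.

```python
import numpy as np

def spearman_no_ties(a, b):
    """Spearman rank correlation for score vectors without tied values."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rank_a = np.argsort(np.argsort(a))  # 0-based ranks
    rank_b = np.argsort(np.argsort(b))
    n = len(a)
    d2 = np.sum((rank_a - rank_b) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Made-up COVAS scores of the same five instances under two training setups.
scores_unweighted = [0.62, 0.40, 0.95, 0.71, 0.55]
scores_weighted = [0.66, 0.45, 0.90, 0.60, 0.70]
rho = spearman_no_ties(scores_unweighted, scores_weighted)
```

In the toy data, two instances swap ranks, which lowers the correlation to 0.6; identical rankings would give 1.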
4. Discussion
4.1. Network Performance
The neural network achieved a final accuracy of 66.67% on the MotM dataset. It was trained on a total of 89 instances and evaluated on 39 test instances. In contrast, on the breast cancer (BC) dataset, the model reached an accuracy of 97.08%, based on 398 training instances and 171 test instances.
The substantially higher performance on the BC dataset suggests that either the larger amount of data or the more clearly separable class structure supported more stable learning, whereas the smaller and noisier MotM dataset limited the achievable accuracy. Since the aim of this study was to provide a proof of concept for COVAS rather than to develop a highly optimized classifier for each dataset, we deliberately used the same simple network architecture for both tasks to ensure comparability and transparency of the decision process.
4.2. Interpretation of COVAS Results
To clarify the relationship between COVAS scores and predictive uncertainty, we provide an additional analysis in Appendix C, showing that COVAS captures aspects of model behavior that are largely independent of standard uncertainty measures.
COVAS is computed on correctly classified instances by design, as its goal is to identify atypical explanation patterns within the set of successful model decisions. This avoids conflating explanation-space deviations with misclassification effects and ensures that detected outliers reflect unusual but still valid decision paths. A consequence is that the number of analyzable instances depends on predictive accuracy; when accuracy is lower, fewer test samples remain for COVAS analysis, which we report explicitly and consider a practical limitation.
4.2.1. Breast Cancer Dataset
As shown in Table 1, patient 3 exhibits the highest average deviation from the mean classification path, with a COVA score of 1.994. This instance is therefore a promising candidate for further investigation. A closer analysis of the features that contributed most strongly to the classification, particularly those falling outside the standard deviation bounds in the SHAP decision plots (Figure 2 and Figure 3), could support the formulation of new hypotheses for breast cancer research.
By systematically highlighting such borderline cases, COVAS may contribute to a more comprehensive understanding of breast cancer characteristics [8]. Rather than focusing solely on typical presentations, COVAS directs attention towards atypical but correctly classified instances, which may reveal rare constellations of features or alternative decision pathways that remain invisible in aggregate analyses [2].
4.2.2. Man of the Match Dataset
For the MotM dataset, Table 2 shows that the selection of the Man of the Match in Sweden’s game on 3 July 2018 exhibits the greatest deviation from the average MotM decision. This match represents a strong outlier in terms of feature attributions and is therefore a suitable candidate for more detailed analysis. In this context, COVAS can support the formulation of new hypotheses about the relevance of individual performance statistics in player evaluation.
By examining such borderline games, where the statistical profile of the MotM differs markedly from the typical pattern, COVAS may help to uncover alternative performance constellations that lead to recognition; for example, roles that are less focused on scoring but highly influential in other metrics. This could advance research in performance analysis and support more nuanced discussions about what constitutes an “outstanding” performance in football.
4.3. Potential Applications of COVAS
COVAS is a versatile algorithm that can be applied across multiple domains. Each domain offers distinct opportunities for leveraging outlier-aware explanations to generate new insights. Below, we outline three areas in which COVAS shows particular potential.
4.3.1. Medical Domain
In the medical domain, COVAS can be used to focus on unique or specific symptom patterns in patients. In Figure 3, for example, there is at least one instance that exceeds the standard deviation band yet ends around the mean model output. This instance clearly follows a decision path that differs from the average path, but the patient is still correctly classified as benign.
Such cases may motivate new hypotheses; for instance, high values of features such as worst radius do not consistently indicate malignancy in breast cancer. More generally, COVAS has the potential to yield new insights into disease manifestations by emphasizing atypical but valid clinical presentations. By focusing on abnormal patient profiles that are nonetheless correctly classified, COVAS may facilitate the identification of additional symptoms, variant patterns, or subgroups, thereby supporting diagnosis across a broader patient spectrum.
4.3.2. Sports Domain
A further potential application lies in athlete performance and movement analysis. A common misconception in sports is that a single, idealized movement pattern exists and should be replicated by all athletes [22]. In contrast, modern approaches such as nonlinear pedagogy encourage the exploration of individualized, functional movement solutions [23].
With COVAS, movement patterns can be analyzed in the context of performance outcomes; for example, the run-up and jump phases in the high jump in relation to the height cleared. Athletes achieving exceptional results, such as world-class performances, may exhibit movement patterns that systematically differ from the average. A comprehensive COVAS analysis could identify key movement components in such outlier athletes and inform new training approaches. Similarly, this concept could be applied to general movement analysis in rehabilitation or talent development.
4.3.3. COVAS for Model Enhancement
Beyond domain-specific interpretation, COVAS also offers potential for model development in ML. By focusing on edge cases, it can reveal previously overlooked aspects of the dataset and highlight regions of the input space where model behavior differs from the majority. Because of its model-agnostic design through the use of SHAP values, COVAS can, in principle, be applied to a wide variety of ML models, including tree-based methods, neural networks, and linear models. For linear models, SHAP attributions are closely related to weighted feature values, and explanation patterns may therefore reduce to simpler statistical deviations that can be computed directly from the data. In this sense, COVAS can be viewed as a generalization that becomes particularly informative for non-linear models, where explanation patterns are no longer trivially linked to the input space.
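The reduction for linear models can be made explicit. The following is a sketch under stated assumptions: a linear model $f(x) = b + \sum_m w_m x_m$ with $w_m \neq 0$, features treated as independent so that the SHAP value takes its closed form, and notation introduced here for illustration. Let $\bar{x}_m^{(c)}$ and $s_m^{(c)}$ denote the mean and standard deviation of feature $m$ within class $c$, and let $\mu_m^{(c)}$ and $\sigma_m^{(c)}$ denote the corresponding statistics of the SHAP values.

```latex
\begin{aligned}
\phi_{n,m} &= w_m \left( x_{n,m} - \mathbb{E}[x_m] \right), \\
\mu_m^{(c)} &= w_m \left( \bar{x}_m^{(c)} - \mathbb{E}[x_m] \right), \qquad
\sigma_m^{(c)} = |w_m| \, s_m^{(c)}, \\
\left| \frac{\phi_{n,m} - \mu_m^{(c)}}{\sigma_m^{(c)}} \right|
&= \left| \frac{x_{n,m} - \bar{x}_m^{(c)}}{s_m^{(c)}} \right| .
\end{aligned}
```

Under these assumptions, the absolute standardized SHAP deviation equals the absolute class-conditional z-score of the raw feature, so the explanation-space score coincides with a statistic that can be computed directly from the data. For non-linear models no such identity holds in general.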
When examining a confusion matrix, COVAS can be used to analyze false positives (FP) and false negatives (FN) in more detail by identifying which misclassified instances are particularly atypical compared with the average FP or FN case. Combined with SHAP, this enables a feature-level decomposition of why the model failed for specific edge cases. Such insights can guide architectural modifications (e.g., additional layers, regularization, alternative activation functions) or data-centric interventions (e.g., targeted data collection or augmentation) to improve model robustness and fairness [2].
Crucially, COVAS is not intended as a competitive outlier detection method. Its strength lies in identifying explanation-level outliers—instances whose SHAP trajectories diverge from class-typical decision paths while still being classified correctly. Such borderline cases are often the most informative for domain experts, as they may reveal alternative mechanisms, rare manifestations, or overlooked feature interactions. Consequently, COVAS functions primarily as a tool for explanation-based hypothesis generation rather than as a diagnostic or anomaly-screening system.
4.4. Limitations and Future Directions
In the current formulation, COVAS depends on SHAP values as input, and only through this does it gain its model- and data-agnostic characteristics. The current formulation implicitly assumes that standardized SHAP value distributions provide a meaningful measure of deviation; this assumption may be less appropriate in domains with highly skewed or multi-modal attribution distributions. In addition, the present study focuses on correctly classified instances and therefore does not directly address model calibration or misclassification patterns. Extending COVAS to jointly analyze correctly and incorrectly classified cases could provide a richer perspective on model behavior.
While the computational effort required to obtain SHAP values was moderate for the tabular case studies analyzed here, extending the evaluation to high-dimensional data or temporally ordered inputs such as time series or sequence models would substantially increase computational demands. As a consequence, obtaining explanations in real-time or low-latency application scenarios may not always be feasible. Prior work has demonstrated that applying SHAP-based explanations to temporal models requires specialized extensions, such as TimeSHAP, which computes feature-, timestep-, and cell-level attributions for ordered inputs [24], or WindowSHAP, which improves efficiency by partitioning sequences into time windows [25]. These approaches highlight both the feasibility and the increased computational cost of explainability in temporal domains, which was beyond the scope of the present study.
A seed stability analysis further indicates that the identified explanation-space outliers remain consistent across different random initializations, suggesting that the reported patterns are not driven by stochastic effects associated with model initialization or network size (see Appendix C, Seed Stability Analysis on the MotM Dataset).
Future work includes extending the empirical evaluation of COVAS to multiclass classification settings, as well as to additional model classes and data domains. While SHAP was chosen due to its wide use and model-agnostic properties, the general idea of COVAS could also be applied to other explainability methods with similar characteristics, such as Integrated Gradients. Furthermore, investigating the integration of COVAS into interactive analysis workflows for domain experts represents an important direction for future research.
We emphasize that the presented examples are intended to illustrate the type of deviations identified by COVAS rather than to provide validated clinical or domain-specific conclusions. Assessing the practical relevance of such deviations requires expert evaluation and is beyond the scope of this work.
5. Conclusions
In this study, we introduced COVAS, a framework designed to highlight atypical but correctly classified instances by leveraging SHAP-based explanations, which makes it model- and data-agnostic. By quantifying the deviation of individual instances from class-specific SHAP distributions, COVAS provides a structured approach for identifying cases that follow uncommon decision paths and therefore represent candidates for deeper investigation. Across two distinct domains (medical diagnosis and sports analytics), COVAS demonstrated its potential to reveal instance-level variability that is not captured by standard performance metrics or aggregate interpretability methods. These findings suggest that COVAS can support hypothesis generation, enhance model inspection, and complement existing XAI techniques by directing attention toward informative outliers.
The core contribution of COVAS lies in using explanation-based deviations to support scientific hypothesis generation rather than in detecting mislabeled or erroneous data points.
Future work includes extending COVAS to additional data modalities, such as sequential or high-dimensional settings, where explanation methods pose additional computational challenges. Moreover, the current formulation focuses on correctly classified instances and does not directly address misclassification patterns or model calibration.
Looking ahead, the broader impact of COVAS will depend on its application across a wider range of datasets, model architectures, and domains. We strongly encourage other researchers to apply and evaluate COVAS within their respective fields to further test, refine, and extend the method. Future work should investigate alternative attribution backends, assess sensitivity across model types, and explore adaptations of COVAS for more complex data modalities. Such efforts will help establish the generality of COVAS and support its integration into practical machine learning workflows.