Article

Trust Triangle: A Reliability-Validity-Generation Framework for Explainable Credit Card Fraud Detection with RAG-Enhanced LLMs Reasoning

1
College of Artificial Intelligence, National Yang Ming Chiao Tung University, No. 301, Sec. 2, Gaofa 3rd Rd., Guiren Dist., Tainan City 711, Taiwan
2
Department of Project Management and Industrial Engineering, Shandong University, 27 Shanda Nanlu, Jinan 250100, China
*
Author to whom correspondence should be addressed.
AI 2026, 7(3), 114; https://doi.org/10.3390/ai7030114
Submission received: 8 February 2026 / Revised: 4 March 2026 / Accepted: 5 March 2026 / Published: 19 March 2026
(This article belongs to the Section AI Systems: Theory and Applications)

Abstract

We propose Trust Triangle, a Bridging Methodology that establishes evidential reliability through multi-attribution consensus, ensures external validity via statistical hypothesis testing, and enables controlled generation with RAG-anchored LLMs to transform black-box predictions into trustworthy, auditable explanations. This framework is instantiated for credit card fraud detection by integrating multi-method feature attributions with rigorous statistical validation. The resulting reliability-validity-verified insights are synthesized with high-relevance domain knowledge (relevance score > 0.7) retrieved from a real-world corpus via Retrieval-Augmented Generation (RAG). A structured Chain-of-Thought (CoT) prompt then guides an LLM to produce coherent, audit-ready case reports. Our contributions are threefold: (1) a verifiable framework for quantifying attribution reliability and validity, (2) a demonstrated end-to-end pipeline from robust prediction to semantically grounded explanation, and (3) a generalizable paradigm for Trustworthy ML in high-stakes domains. Experiments on a highly imbalanced dataset (fraud rate: 8.74%) demonstrate robust performance (PR-AUC = 0.7867), successfully identify statistically significant predictive features, and generate audit-ready reports, thereby advancing a rigorous, evidence-based pathway from model output to decision-ready support.

1. Introduction

Machine learning (ML) models have demonstrated strong efficacy in critical tasks such as credit card fraud detection. However, their “Prediction black box” nature lacks transparency in decision-making, severely eroding user trust and hindering deployment in practice [1]. Existing approaches to explainability face a dual challenge. First, mainstream anomaly detection models, such as Bayesian Autoencoders (BAEs), focus on predictive performance but often fail to provide quantitative uncertainty estimates and statistically validated reasoning for their decisions [2,3]. Second, post hoc feature attribution frequently relies on a single method, yielding results that lack reliability (internal consistency across methods) and validity (empirical association with ground-truth labels), effectively creating an “Explanation black-box” [4].
To systematically address these limitations, we propose a novel “Trust Triangle” framework and demonstrate its application in credit card fraud detection. The framework, illustrated in Figure 1, establishes a trustworthy explanation pipeline by structurally bridging three pillars: evidential reliability, external validity, and controlled generation.
The pipeline begins at the Machine Learning Module (ML), which enhances a BAE for robust, uncertainty-aware fraud identification based on reconstruction errors (REs), using Monte Carlo Dropout [5] and Huber loss. The module outputs feature-level REs, instance-level REs, and an optimal threshold. This is followed by the reliability–validity Bridging Module (B), whose core innovation lies in a parallel processing stage that operationalizes our first two pillars: (1) establishing evidential reliability through multi-attribution consensus on feature importance, integrating three theoretically distinct methods (Integrated Gradients [6], SHAP [7], and Feature Perturbation [1]); and (2) grounding decisions in external validity through statistical validation against actual fraud outcomes (e.g., the Mann–Whitney U test with FDR correction [8]). After aggregating the results of the reliability and validity evaluations, the Bridging Module outputs two primary indicators: the Feature Importance Score and the Feature Confidence Score. These indicators are combined with the optimized threshold and the feature-level and instance-level reconstruction errors produced by the Machine Learning Module, and are then fed into the LLM module to enable controllable generation. We adopt Retrieval-Augmented Generation (RAG) [9] with domain knowledge to explicitly restrict the generation context of the LLM, enhancing controllability and mitigating hallucination. The synthesized evidence from these streams is then structured via a Chain-of-Thought (CoT) prompt [10] to guide a Large Language Model (LLM) [11] in producing an auditable, semantic report. This end-to-end process transforms an opaque anomaly probability into a transparent, evidence-anchored insight.
Crucially, the Trust Triangle is designed as a generalizable architectural blueprint. To adapt it to a new domain (e.g., healthcare diagnostics or judicial risk assessment), the core process remains unchanged: practitioners would (1) supply the relevant quantitative data and train a suitable predictive model; (2) curate a domain-specific knowledge base (e.g., medical journals, legal statutes); (3) define corresponding rule templates and semantic descriptors (e.g., symptom triggers, legal precedent descriptions); and (4) customize the prompt template by substituting domain-specific variables (e.g., feature names, reconstruction errors, impact scores, and retrieved knowledge) to ensure the final explanation remains coherent and factually anchored [10]. The RAG pipeline [9] then retrieves high-quality context from this new knowledge base using the defined keywords, and a structured CoT prompt [10] integrates this with the model’s validated evidence to guide the LLM [11] in generating rigorous, domain-appropriate reports. This demonstrates the framework’s extensibility beyond the fraud detection exemplar detailed herein. Notably, the modular design acknowledges the importance of cost-efficient LLM deployment strategies [12] and is compatible with principles of interpretable representation learning from other modalities [13].
Our contributions are threefold: (1) We propose the Trust Triangle, a verifiable bridging framework that sets quantitative standards for assessing the reliability and validity of feature attributions. (2) We construct and demonstrate an end-to-end instantiation for credit card fraud detection, from robust prediction to trustworthy explanation, anchoring LLMs [11] in rigorous statistical evidence. (3) We provide a generalizable methodological paradigm that enhances transparency and auditability for Trustworthy ML in high-risk domains, complete with a clear pathway for adaptation to other fields.

2. Related Works

2.1. Challenges in Trustworthy Anomaly Explanation

Building trustworthy machine learning for high-stakes domains like fraud detection requires reconciling three pillars, external validity, evidential reliability, and controlled generation, each with significant limitations. First, while Bayesian Autoencoders (BAEs) provide principled uncertainty estimation for anomaly detection by combining the generative modeling capabilities of variational autoencoders [2] with the flexibility of deep Gaussian mixture models for capturing complex data distributions [3], and further enhanced by Monte Carlo Dropout to approximate Bayesian inference in deep neural networks [5], they remain “black-box predictors”, outputting a scalar score without transparent justification for why a transaction is flagged. Second, post hoc attribution methods—such as gradient-based (Integrated Gradients [6]), game-theoretic (SHAP [7]), and perturbation-based approaches [1]—offer insights but produce inconsistent results when used in isolation. Reliance on any single method lacks both reliability (cross-method agreement) and validity (statistical grounding), creating an “explanation black box” [4,14]. Third, while LLMs [11,15] excel at fluent generation, deploying them for untethered explanation risks “hallucinations” and misalignment with the underlying model’s logic [16], undermining trust in critical applications.

2.2. The Trust Triangle Framework

To address these gaps, we introduce the Trust Triangle Framework. It integrates a robust BAE detector, a multi-attribution consensus module with statistical validation, and a RAG [9] system. This framework systematically bridges the quantitative evidence from the ML model with domain knowledge, ensuring that the final explanations are reliable (via method consensus), valid (via statistical testing), and semantically grounded (via controlled generation). It transforms opaque predictions into auditable, decision-ready insights.
High-stakes applications like fraud detection demand AI systems that are not only accurate but also trustworthy—their predictions must come with comprehensible, evidence-based explanations. A fundamental tension exists between the dominant paradigms capable of delivering these components. On one hand, complex ML models excel at deriving quantitative predictions from raw data but often function as inscrutable “black boxes,” providing limited intuitive justification for their outputs [1,17]. On the other hand, LLMs demonstrate powerful semantic reasoning and fluent generation [11,18], capabilities fundamentally enabled by attention mechanisms [15], yet they lack grounded, quantitative judgment and are prone to “hallucinating” plausible but unsubstantiated assertions [16].
A naive integration—feeding an ML model’s raw output to an LLM for post hoc narration—fails to resolve this tension and can exacerbate it. The uncertainty inherent in the ML prediction can be unwittingly amplified by the LLM’s generation process, risking explanations that are semantically fluent but statistically ungrounded, thereby eroding trust [4].
To achieve truly trustworthy and actionable interpretations, we must construct a principled bridge that transforms the ML model’s evidence into a verified, structured form before it is articulated by the LLM. This is the purpose of our Trust Triangle, built upon three interdependent pillars:
1. Evidential Reliability: To combat the opacity and method-specific bias of post hoc explanations [4,19], we transform the ML model’s internal state into consensus-verified feature attributions. This involves aggregating multiple theoretically distinct attribution methods, each with a unique axiomatic foundation: Integrated Gradients [6] satisfies the sensitivity and implementation invariance axioms; SHAP [7] provides a unified framework based on cooperative game theory with desirable fairness properties; and Feature Perturbation [1] offers an intuitive, model-agnostic approach that directly measures the impact of input changes on model output. We then rigorously measure their agreement, ensuring that the derived importance is robust and not an artifact of a single technique [20].
2. External Validity: To ensure explanations are relevant to the real-world task, attributed feature importance must be statistically anchored to actual outcomes. This requires rigorous hypothesis testing against ground truth [21] and the evaluation of practical effect sizes, moving beyond visual saliency to establish a defensible link between model evidence and business impact [8].
3. Controlled, Grounded Generation: To mitigate LLM hallucination [16], the generative process must be constrained and enriched. This is achieved by providing the LLM with structured prompts that integrate the validated quantitative scores with retrieved context from authoritative, domain-specific knowledge bases [9], guided by chain-of-thought reasoning [10].
Our framework instantiates this Trust Triangle: The Bridging Module (B) fulfills the first two pillars (evidential reliability and external validity), producing rigorously verified quantitative evidence. This evidence then directs the LLM module (LLM) to execute the third pillar (controlled generation). The final explanation is thus a traceable synthesis of reliability-tested evidence and validity-anchored knowledge—a closed loop from quantitative verification to trustworthy semantic articulation.

2.3. Dataset Characteristics

We employ a real-world credit card fraud dataset [22] of 1 million transactions with severe class imbalance (labels are “normal” and “fraud”; fraud rate: 8.74%). It provides seven domain-relevant raw features: two geographical (distance from home and distance from the last transaction), three behavioral (transaction amount ratio, same merchant flag, online order flag), and two security-related (chip card usage, PIN verification). The large sample ensures statistical robustness for reliability-validity analysis, while the semantically rich feature space facilitates the RAG process, enabling the precise grounding of generated explanations in relevant domain knowledge [9].

2.4. Comparison with Existing Approaches: The Need for a Bridging Framework

While the methods reviewed above have advanced the field of explainable AI, each exhibits inherent limitations that our Trust Triangle framework systematically addresses. Table 1 summarizes the key comparisons.
Key Insights from Comparison:
  • Beyond Single-Method Attribution: While LIME [1], Integrated Gradients [6], and SHAP [7] each provide valuable perspectives, reliance on any single method risks method-specific bias [4,19]. Our framework transforms attribution from a point estimate into a consensus-verified distribution across methods, with explicit consistency metrics.
  • From Statistical Significance to Practical Validity: Traditional hypothesis testing [21,23] identifies statistically significant differences, but without effect size [8] or multiple testing correction, features may be declared “important” despite negligible practical impact. Our external validity pillar integrates p-values, FDR correction [8], and effect sizes into a composite validity weight.
  • Uncertainty-Aware Detection: Variational autoencoders [2] and their extensions [3,26] excel at learning normal patterns, but their point estimates ignore epistemic uncertainty. By incorporating Monte Carlo Dropout [5] and bootstrap validation [24], our BAE backbone provides calibrated uncertainty estimates essential for reliable threshold selection.
  • Grounded Generation with RAG: While LLMs demonstrate remarkable fluency [11,18], their tendency to hallucinate [16] makes them unsuitable for direct explanation of high-stakes predictions. Our RAG pipeline [9,25] retrieves authoritative context, and CoT prompting [10] ensures that generated narratives remain faithful to the verified evidence.
  • The Missing Bridge: Existing work either stops at explanation (post hoc methods) or generation (LLMs), but none systematically bridges quantitative verification with semantic articulation. The Trust Triangle fills this gap by introducing a dedicated Bridging Module that transforms raw model outputs into validated evidence before generation—a distinction that is both novel and essential for trustworthy AI in high-risk domains.
This comparative analysis demonstrates that our framework is not merely an aggregation of existing techniques but a principled integration that addresses their individual weaknesses while preserving their strengths. The result is an end-to-end pipeline that meets the rigorous demands of real-world fraud detection: reliable, valid, and auditable explanations.

3. Method

3.1. A Robust Predictive Backbone for Imbalanced Fraud Detection

Our framework begins with a robust, uncertainty-aware predictive backbone. The pipeline operates in two stages to ensure reliability: Unsupervised Pre-training (UPT, Stage 1) followed by Supervised Fine-tuning (SFT, Stage 2). The core detector is a BAE [2], trained on a real-world credit card transaction dataset of 1 million instances (fraud rate: 8.74%) with 7 domain-specific features defined by foundational principles [23]. The encoder $f_{\theta_e}: \mathbb{R}^{7} \to \mathbb{R}^{16}$ and decoder $g_{\theta_d}$ form symmetric fully connected blocks (64-48-32-24-16-24-32-48-64) with ReLU, batch normalization, and dropout.

3.1.1. Two-Stage BAE Training

Stage 1: Unsupervised Pre-training. We pre-train the BAE using only normal transactions ($X_N$), e.g., 90% of normal transactions, while explicitly excluding fraudulent ones ($X_F$). A key enhancement is uncertainty quantification via Monte Carlo Dropout at inference [5], which yields a posterior distribution over the reconstruction error for each feature, providing the mean $\mathbb{E}[\mathrm{RE}]$ and variance $\mathbb{V}[\mathrm{RE}]$. For robust optimization, we employ Huber loss instead of MSE to mitigate the influence of outlier reconstruction errors during training [27]. The model’s fraud detection performance is then evaluated, and its reliability is quantified by applying bootstrap resampling (200 iterations) to estimate confidence intervals (CIs) for performance metrics (e.g., PR-AUC).
Stage 2: Supervised Fine-tuning. The pre-trained BAE is fine-tuned on a mixed dataset after class imbalance mitigation (e.g., containing 10% of normal transactions and 100% of fraudulent ones). Crucially, we apply the Mann–Whitney U test to verify that the training and validation sets are drawn from statistically equivalent distributions, mitigating data leakage risks [28]. From this stage, we obtain feature-wise reconstruction errors for both normal and fraudulent transactions. An instance-level optimal decision threshold is selected via bootstrap resampling (200 iterations), whose reliability is likewise validated through this process.
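The two robustness devices used in both training stages, Huber loss and bootstrap resampling, can be illustrated with a minimal pure-Python sketch. The Huber transition point `delta = 1.0`, the percentile method for the interval, and the function names are assumptions made for illustration; the text above specifies only the use of Huber loss and 200 bootstrap iterations.

```python
import random

def huber_loss(residual, delta=1.0):
    """Huber loss for one residual: quadratic near zero, linear in the tails.
    delta is an assumed value; the paper does not report it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def bootstrap_ci(values, statistic, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(values).
    The percentile method is an assumption; n_boot=200 follows the text."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(sample))
    stats.sort()
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]
```

The same `bootstrap_ci` helper could wrap any scalar statistic, for example a PR-AUC estimate or a candidate decision threshold, to check how much it moves across resamples.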

3.1.2. BAE Performance Evaluation

Model performance focuses on the fraud class. We report Precision–Recall AUC (PR-AUC), which is appropriate for imbalanced data [29], and the maximum F1-score ($F1_{\max}$). A composite score $S_{\mathrm{comb}} = 0.7\,\text{PR-AUC} + 0.3\,F1_{\max}$ [29] balances ranking quality with threshold-specific performance, aligning with the business goal of maximizing fraud discovery while controlling false alarms. This backbone provides not only accurate anomaly scores but also a quantifiable measure of uncertainty, forming the first pillar of our Trust Triangle.
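As a small illustration of the composite score, the sketch below computes the maximum F1 over candidate thresholds and combines it with a PR-AUC value using the 0.7/0.3 weights stated above. The threshold sweep over observed scores and the function names are assumptions; PR-AUC is taken as a precomputed input.

```python
def max_f1(scores, labels):
    """Maximum fraud-class F1 over thresholds taken from the observed scores.
    An instance is predicted as fraud when its score is >= the threshold."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp:
            best = max(best, 2 * tp / (2 * tp + fp + fn))
    return best

def composite_score(pr_auc, f1_max, w_pr=0.7, w_f1=0.3):
    """S_comb = 0.7 * PR-AUC + 0.3 * F1_max."""
    return w_pr * pr_auc + w_f1 * f1_max
```

For instance, if the F1-score of 0.75 reported in Section 5.1 is taken to be $F1_{\max}$ (an assumption), the reported PR-AUC of 0.7867 gives $S_{\mathrm{comb}} = 0.7 \times 0.7867 + 0.3 \times 0.75 \approx 0.776$.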

3.1.3. Workflow Description of Evidential Reliability

Figure 2 illustrates our machine learning training and deployment process, which enhances the reliability of BAE predictions and establishes the reliability of threshold determination based on instance-level reconstruction errors.
BAE Training: The module takes normal instance data as input (90% of $X_N$), allowing the BAE to learn the underlying structure of normal transactions in Stage 1 (UPT). Subsequently, mixed data, checked for distributional differences via the Mann–Whitney U test, is introduced to train the BAE to identify fraudulent patterns in Stage 2 (SFT). On one hand, the performance of the BAE is validated through bootstrap resampling (n = 200) to assess the stability of its fraud detection capability, providing users with trustworthy performance metrics. On the other hand, the BAE outputs the reconstruction error (RE) for each instance at both the feature level and the instance level. We use bootstrap validation (n = 200 resamples) to examine the stability of the optimal threshold derived from the error distributions for distinguishing between normal and fraudulent transactions. Both instance-level and feature-level reconstruction errors, along with the statistically validated optimal threshold, are then passed to the subsequent Bridging Module to serve as the foundation for feature attribution.
BAE Deployment: We input 20 new unlabeled instances into the pre-trained BAE and compute both feature-level and instance-level reconstruction errors for each instance. A risk score is calculated from the instance-level reconstruction error. For each feature, an impact score is derived from the standardized position (z-score) of its feature-level reconstruction error relative to the distribution of that feature’s reconstruction error across normal transactions. The instance’s risk score and the corresponding feature impact scores are then fed into the LLM Module, where they are mapped, respectively, to the instance’s risk level and to the fraud types triggered by the feature impact scores.

3.2. Bridging Evidence and Validity for Trustworthy Attributions

To transform opaque model predictions into trustworthy explanations, we introduce the Reliability-Validity Bridging Module. This component ingests instance-level and feature-level reconstruction errors and outputs two rigorously verified metrics per feature: a Feature Importance Score and a Feature Confidence Score. The framework performs two core, sequential tasks to establish this trust: first, it quantifies evidential reliability through multi-method consensus; second, it assesses external validity through statistical grounding in real outcomes.

3.2.1. Evidential Reliability via Multi-Method Consensus

We establish reliability by aggregating evidence from three theoretically distinct attribution methods applied in parallel: Integrated Gradients (IG) for its axiomatic foundation [6], SHAP for its game-theoretic fairness [7], and Perturbation for its causal intuitiveness [1]. For feature $j$, the raw score $s_j^m$ from method $m$ is min-max-normalized to $\tilde{s}_j^m$. Reliability is quantified at two levels:
  • Micro-Consistency (Feature-Level): We compute $\mathrm{consistency}_j = 1 - \min\big(1.0,\ 2\,\mathrm{std}(\{\tilde{s}_j^m\}_{m=1}^{3})\big)$, measuring the agreement across methods for that specific feature.
  • Macro-Consensus (System-Level): We calculate the global consistency $C_{\mathrm{global}} = (\rho + \tau)/2$, i.e., the mean of Spearman’s $\rho$ and Kendall’s $\tau$ rank correlations computed pairwise between the three methods’ rankings [20,21]. $C_{\mathrm{global}}$ dynamically determines the fusion weight $w_{\mathrm{rel}}$ for each method, favoring more stable methods (e.g., SHAP) when consensus is low.
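The two reliability levels can be sketched in pure Python as follows. Several details are assumptions made only for illustration: the population standard deviation is used for `std`, rankings are assumed to have no ties (so the simple Spearman formula and Kendall's tau-a apply), and the correlations are averaged over the three method pairs before being combined.

```python
from itertools import combinations
from statistics import pstdev

def min_max(scores):
    """Min-max normalize one method's raw attribution scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def micro_consistency(normalized_by_method, j):
    """consistency_j = 1 - min(1.0, 2 * std of feature j's scores across methods)."""
    vals = [scores[j] for scores in normalized_by_method]
    return 1.0 - min(1.0, 2.0 * pstdev(vals))

def ranks(scores):
    """Rank 1 = largest score; assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman_rho(a, b):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (a[i] - a[j]) * (b[i] - b[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def global_consensus(scores_by_method):
    """C_global = (mean pairwise rho + mean pairwise tau) / 2."""
    pairs = list(combinations(scores_by_method, 2))
    rho = sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)
    tau = sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)
    return (rho + tau) / 2.0
```

With real attribution outputs, ties are likely (for example, several features with zero attribution), in which case a tie-aware rank correlation from a statistics library would be the safer choice.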

3.2.2. External Validity via Statistical Association

We ground the attributions in empirical reality by testing their association with the target variable. For each feature $j$, we apply the Mann–Whitney U test [21] to the distributions of reconstruction error for positive (e.g., fraud) versus negative (e.g., normal) instances, obtaining a p-value. The Benjamini–Hochberg procedure [8] controls the False Discovery Rate (FDR), yielding corrected p-values $p_{\mathrm{corr}}^{j}$. We also compute the effect size $d_j$ (Cohen’s d). These metrics are fused into a composite validity weight:
$$w_{\mathrm{val}}^{j} = 0.6\,\mathbb{I}\big(p_{\mathrm{corr}}^{j} < 0.05\big) + 0.4\,\min(1,\ 2 d_j),$$
where $\mathbb{I}(\cdot)$ is the indicator function. This ensures features must be both statistically significant and practically meaningful to receive high validity weighting [8].
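A minimal sketch of the validity weight is given below, taking raw per-feature p-values (for example, from a Mann–Whitney U test) as input. The pooled-standard-deviation form of Cohen's d and the use of the magnitude of $d_j$ inside the weight are assumptions; the function names are illustrative.

```python
import math
from statistics import mean, variance

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def cohens_d(xs, ys):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    nx, ny = len(xs), len(ys)
    pooled = math.sqrt(
        ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    )
    return (mean(xs) - mean(ys)) / pooled

def validity_weight(p_corr, d, alpha=0.05):
    """w_val = 0.6 * I(p_corr < alpha) + 0.4 * min(1, 2 * |d|).
    Using |d| is an assumption so that the sign of the effect does not matter."""
    significant = 1.0 if p_corr < alpha else 0.0
    return 0.6 * significant + 0.4 * min(1.0, 2.0 * abs(d))
```

Under this weighting, a significant feature with an effect size of at least 0.5 in magnitude receives the maximum weight of 1.0, while a non-significant feature can receive at most 0.4.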

3.2.3. Fusion and Output

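The final, verified Feature Importance Score for feature $j$ is
$$\mathrm{Importance}_j = \frac{\sum_{m=1}^{3} w_{\mathrm{rel}} \cdot \tilde{s}_j^m \cdot w_{\mathrm{val}}^{j}}{\sum_{k=1}^{n} \sum_{m=1}^{3} w_{\mathrm{rel}} \cdot \tilde{s}_k^m \cdot w_{\mathrm{val}}^{k}},$$
where $n$ is the number of features and stability is confirmed via 200 bootstrap resamples. The Feature Confidence Score, a meta-evaluation of trustworthiness, is independently computed as
$$\mathrm{Confidence}_j = 0.4\,\mathrm{consistency}_j + 0.4\,w_{\mathrm{val}}^{j} + 0.2\,C_{\mathrm{global}}.$$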
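The fusion step can be sketched as below. Treating $w_{\mathrm{rel}}$ as one weight per attribution method is an assumption drawn from the description of the fusion weight in Section 3.2.1; how that weight is derived from $C_{\mathrm{global}}$ is not specified, so it is passed in as an input here.

```python
def fuse_importance(normalized_by_method, w_rel, w_val):
    """Normalized importance per feature.
    normalized_by_method[m][j]: min-max-normalized score of feature j from method m.
    w_rel[m]: reliability weight of method m (assumed per-method).
    w_val[j]: validity weight of feature j."""
    n = len(w_val)
    raw = [
        sum(w_rel[m] * normalized_by_method[m][j] for m in range(len(w_rel))) * w_val[j]
        for j in range(n)
    ]
    total = sum(raw)
    if total == 0:
        return [0.0] * n
    return [r / total for r in raw]

def confidence(consistency_j, w_val_j, c_global):
    """Confidence_j = 0.4 * consistency_j + 0.4 * w_val_j + 0.2 * C_global."""
    return 0.4 * consistency_j + 0.4 * w_val_j + 0.2 * c_global
```

Because the importance scores are normalized by their sum, they can be read as shares of the total verified attribution, whereas the confidence score stays on an absolute scale and is reported separately.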

3.2.4. Workflow Description of External Validity

As shown in Figure 3, the Bridging Module (B) systematically processes the ML model’s output. First, it extracts the feature-level reconstruction errors to assess evidential reliability. It does this by executing the three attribution methods (IG, SHAP, Perturbation) in parallel, calculating both micro-consistency and macro-consensus [21,22], and deriving the adaptive reliability weight $w_{\mathrm{rel}}$. This process ensures the internal robustness of the generated attributions.
Second, it extracts the instance-level reconstruction errors and the optimal threshold to assess external validity. This involves statistically testing the association between each feature’s reconstruction error and the ground-truth labels using the Mann–Whitney U test [21], applying FDR correction [8] for rigor, and validating the reliability of feature attributions through bootstrap resampling (200 iterations). This anchors the explanations in observable outcomes.
Finally, the results of these two tasks are consolidated. Reliability and validity weights are fused into the feature attribution evidence to produce the final normalized Feature Importance Score. Concurrently, the Feature Confidence Score is computed as a separate, holistic trust indicator and is presented to the user for reference, as shown in Figure 3. The output of the Bridging Module is thus a rigorously validated quantitative evidence set, which, along with the optimized threshold and the reconstruction errors at both the feature level and instance level, is then used to guide an LLM in a controlled generation pipeline (LLM Module) for producing faithful, natural-language explanations.

3.3. Controlled Generation for Actionable Explanations

The Bridging Module provides the verified quantitative evidence: features weighted by both evidential reliability (multi-method consensus) and external validity (statistical association). This section describes the final pillar of the Trust Triangle, controlled generation (LLM Module, LLM). We translate the static, validated scores into dynamic, domain-specific narratives by strictly controlling the LLM’s reasoning with this multi-source evidence.

3.3.1. Risk Quantification and Rule Mapping

We first operationalize the Bridging Module’s output into concrete, instance-specific risk metrics. For a new instance $i$, an anomaly score $e_i$ is derived from the detector (e.g., instance-level reconstruction error [2]). This is normalized to produce an instance risk score:
$$\mathrm{Risk}_i = e_i / T,$$
where $T$ is a threshold from normal samples. Concurrently, we compute a feature impact score for feature $j$:
$$\mathrm{Impact}_j^i = w_j^i \cdot \frac{x_j^i - \mu_j}{\sigma_j} \cdot \mathrm{Importance}_j.$$
Here, $\mathrm{Importance}_j$ is the reliability- and validity-verified importance from the Bridging Module. The term $w_j^i$ is the instance-specific attribution weight (e.g., from SHAP [7]), and $x_j^i$ is the reconstruction error of feature $j$ for instance $i$. The standardized deviation $(x_j^i - \mu_j)/\sigma_j$ captures the feature’s atypicality, where $\mu_j$ denotes the mean reconstruction error of feature $j$ for normal transactions and $\sigma_j$ denotes the corresponding standard deviation. A high $\mathrm{Impact}_j^i$ indicates a feature that is both globally important and locally anomalous. These scores are mapped to predefined, interpretable fraud rule templates (e.g., “high-value transaction from a new country”), ensuring alerts are grounded in statistically verified feature importance [8].
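The two instance-level scores can be written as a short sketch. The function names are illustrative, and the use of the population standard deviation for the normal-transaction statistics is an assumption.

```python
from statistics import mean, pstdev

def risk_score(instance_re, threshold):
    """Risk_i = e_i / T; values above 1 mean the error exceeds the threshold."""
    return instance_re / threshold

def normal_stats(normal_feature_res):
    """Mean and standard deviation of one feature's reconstruction error
    over normal transactions (population standard deviation assumed)."""
    return mean(normal_feature_res), pstdev(normal_feature_res)

def impact_score(attr_weight, feature_re, mu, sigma, importance):
    """Impact = attribution weight * z-score of the feature RE * verified importance."""
    z = (feature_re - mu) / sigma
    return attr_weight * z * importance
```

For example, a feature whose reconstruction error lies three standard deviations above the normal mean, with an attribution weight of 0.5 and a verified importance of 0.4, would receive an impact score of 0.5 × 3 × 0.4 = 0.6.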

3.3.2. Controlled Generation with RAG

To generate trustworthy, actionable narratives from the quantitative evidence ($\mathrm{Risk}_i$, $\mathrm{Impact}_j^i$, $\mathrm{Importance}_j$), we employ a RAG pipeline [9] for controlled generation. When fraud rules are triggered, the system uses the rule type and the high-impact features as keys to retrieve relevant explanatory passages from a curated Domain Knowledge Base (DKB). A structured prompt is then constructed for an LLM (e.g., [11,18]), which integrates three core components: (1) the verified quantitative evidence (including FDR-corrected p-values), (2) the retrieved qualitative knowledge, and (3) explicit instructions for causal, evidence-anchored synthesis. This controlled input ensures that every claim in the generated explanation is explicitly constrained by and traceable to the Bridging Module’s validated data and domain literature [1], effectively mitigating hallucination and providing an auditable justification for the alert.
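The three-component prompt described above might be assembled along the following lines. This is only a sketch: the section headings, the instruction wording, and the feature names used in the usage note are hypothetical and are not the prompt template used in the paper.

```python
def build_prompt(risk, impacts, p_corr, passages):
    """Assemble a structured prompt from (1) verified quantitative evidence,
    (2) retrieved passages, and (3) explicit synthesis instructions.
    impacts and p_corr map feature names to impact scores and corrected p-values."""
    evidence = [f"Instance risk score: {risk:.2f}"]
    # List features from highest to lowest impact.
    for name in sorted(impacts, key=impacts.get, reverse=True):
        evidence.append(
            f"- {name}: impact={impacts[name]:.2f}, FDR-corrected p={p_corr[name]:.3g}"
        )
    knowledge = [f"[{k}] {text}" for k, text in enumerate(passages, start=1)]
    instructions = (
        "Reason step by step. Use only the evidence and passages above, "
        "cite passage numbers, and state when the evidence is insufficient."
    )
    return "\n\n".join([
        "## Verified quantitative evidence\n" + "\n".join(evidence),
        "## Retrieved domain knowledge\n" + "\n".join(knowledge),
        "## Instructions\n" + instructions,
    ])
```

A call such as `build_prompt(1.8, {"distance_from_home": 2.4, "used_pin": 0.3}, {"distance_from_home": 0.001, "used_pin": 0.2}, ["Passage text ..."])` yields a single string in which every number the LLM may cite appears verbatim, which is what makes the final report traceable.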

3.3.3. Workflow Description of Controlled, Grounded Generation

This workflow fulfills the ultimate objective of the Trust Triangle: enabling users to confidently trust the predictions of ML, as illustrated in Figure 4 and Table 2. First, the raw outputs of the Bridging Module (B), including reconstruction errors at both the feature level and instance level (testing set), the optimal threshold, the Feature Importance Score, and the Feature Confidence Score, are fused into Feature Importance Fusion (Step ➊), while the data of a new instance are fed into the pre-trained BAE to compute its risk score and personalized feature impact scores from its reconstruction errors (Steps ➋ and ➌).
The generation process begins by extracting canonical credit card fraud patterns from a domain knowledge base (Step ➍). These patterns are encoded into an LLM Interpretation Guide, detailing seven major fraud types, their typical modus operandi, and five designated keywords per pattern. For a new instance, we match its feature impact scores against these patterns to identify triggered fraud types. The associated keywords are used to retrieve relevant contextual passages from the knowledge base (Step ➎), with the top three results by relevance score retained for quality (Step ➏). All components (triggered patterns, quantitative impact scores, and retrieved text) are integrated into a structured, multi-source evidence set (Step ➐). This set then guides an LLM via a CoT prompt in a controlled generation process to produce a faithful, natural-language report for the user (Step ➑).
This end-to-end process ensures every claim in the final Trustworthy Report is explicitly anchored in statistically verified evidence and domain expertise, fulfilling the promise of a reliable explanatory system.
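The retrieval and filtering steps (Steps ➎ and ➏) can be sketched as a top-k selection with a relevance cutoff. Cosine similarity between embedding vectors is assumed as the relevance score; the cutoff of 0.7 follows the abstract and the top-three rule follows the workflow above, while the function names and data layout are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec, passages, k=3, min_relevance=0.7):
    """Return up to k (score, text) pairs with score > min_relevance, best first.
    passages is a list of (text, embedding) pairs from the knowledge base."""
    scored = [(cosine(query_vec, emb), text) for text, emb in passages]
    kept = [item for item in scored if item[0] > min_relevance]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:k]
```

Applying the cutoff before the top-k selection means that fewer than three passages may be returned when the knowledge base holds little relevant material, which avoids padding the prompt with weakly related text.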

4. Implementation

Our implementation is rigorously structured to operationalize the three pillars of the Trust Triangle: evidential reliability, external validity, and controlled generation. Each subsection details the methods that precisely align with and fulfill the requirements of its corresponding pillar.

4.1. Evidential Reliability: Building a Statistically Stable Detection Foundation

This phase establishes a trustworthy evidentiary base for detection. We begin with a public credit card fraud dataset [22], performing a sequential split to preserve temporal dynamics. The distributional equivalence between the resulting sets is verified using the Mann–Whitney U test [21], ensuring no data leakage biases our foundation. The core detector is a BAE [2], whose architecture and Monte Carlo Dropout inference [5] are designed to quantify predictive uncertainty intrinsically. This model is trained with the robust Huber loss [27] and the Adam optimizer. To quantify the stability of our primary performance metrics (PR-AUC, F1-score) [29] and all subsequent analyses, we employ bootstrap resampling (n = 200) [24]. Furthermore, the reliability of our feature attribution, a critical form of evidence, is quantified by measuring micro-consistency and macro-consensus ($C_{\mathrm{global}}$) across multiple, independent attribution methods (Integrated Gradients [6], SHAP [7], and Perturbation [1]) using rank correlation [20]. This comprehensive approach ensures that the core detection evidence is robust, repeatable, and resistant to methodological variance, directly fulfilling the evidential reliability criterion.

4.2. External Validity: Establishing Causal Plausibility for Detected Features

This phase bridges the reliable detector’s outputs to real-world, generalizable phenomena. To ensure that features identified as important are not mere artifacts of the specific model or dataset, we subject them to rigorous statistical validation. For each attributed feature, we perform a Mann–Whitney U test [21] between the normal and fraudulent transaction populations in the data. We then apply False Discovery Rate (FDR) correction [8] to control for multiple comparisons, obtaining corrected p-values ($p_{\mathrm{corr}}^{j}$) and effect sizes ($d_j$). This process provides a statistically grounded, per-feature validity assessment. The Feature Importance Scores and confidence scores integrate this weighted statistical evidence, ensuring that the attributed importance aligns with statistically significant, real-world differences between the two classes. This step directly operationalizes the external validity pillar by tethering model inferences to empirically verifiable, population-level differences.

4.3. Controlled Generation: Orchestrating Auditable and Context-Aware Reporting

This phase translates detection evidence into actionable, trustworthy narratives through a constrained generative process. We implement a RAG pipeline [9] to ground the generation in external knowledge. For a new transaction instance $i$, a risk score ($Risk_i = e_i / T$, the instance's reconstruction error relative to the decision threshold) and feature impact scores ($Impact_j^i$) are computed. These scores trigger predefined, auditable rule templates that map feature subsets to semantic fraud concepts. The triggered rules guide the RAG system [9,25] to retrieve the top three relevant pieces of contextual evidence from an external crime modus operandi database. This retrieved evidence is then structured into a sophisticated prompt for an LLM [18]. We implement a controlled, three-stage pipeline: qwen3-embedding:8b for retrieval; the lightweight qwen2.5:1.5b-instruct [12] for intermediate description; and qwen2.5vl:7b [10] for final report generation, with temperature = 0.3 [11] for stability. This multi-stage, evidence-anchored pipeline ensures the generation of a coherent, audit-ready report that is directly controlled by and traceable to the initial detection evidence, thereby fulfilling the controlled generation pillar.
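The retrieval step can be pictured with a small sketch: score every knowledge-base entry against the query embedding, discard entries at or below the relevance cutoff, and keep the best three. Cosine similarity as the relevance score, the three-dimensional toy vectors, and all identifiers below are assumptions made for illustration; they are not the embedding model or database described in the text.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, corpus, top_k=3, min_relevance=0.7):
    """Return up to top_k (doc_id, score) pairs with score > min_relevance."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items()]
    kept = [pair for pair in scored if pair[0] > min_relevance]
    kept.sort(reverse=True)  # highest relevance first
    return [(doc_id, score) for score, doc_id in kept[:top_k]]


# Hypothetical toy embeddings standing in for modus operandi entries.
toy_corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.8, 0.6, 0.0],
    "doc_d": [0.75, 0.66, 0.0],
    "doc_e": [0.0, 1.0, 0.0],
}
hits = retrieve([1.0, 0.0, 0.0], toy_corpus)
```

Applying the threshold before the top-k cut means that fewer than three entries may be returned when the knowledge base has little relevant material, which keeps weakly related text out of the prompt.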

5. Results

5.1. Overall Predictive Performance and Robustness

Our BAE establishes a statistically robust and reliable foundation. It achieves a fraud-class F1-score of 0.75 and a PR-AUC of 0.7867 on the validation set. Bootstrap analysis (n = 200) confirms the stability of these estimates (e.g., PR-AUC 95% CI: [0.7812, 0.7925]). These performance metrics faithfully reflect the inherent classification ambiguity present in the data, as visualized by the overlapping reconstruction error distributions in Figure A1, which aligns with our goal of establishing a trustworthy foundation. The model demonstrates strong, stable discriminative power: the average reconstruction error of fraudulent transactions is 5.52 times that of normal transactions, a separation confirmed as statistically significant (Mann–Whitney U test, p < 0.001, Cliff’s δ = 0.612). The training process converged stably (final validation loss = 0.2960). This validates a reliable predictive backbone for subsequent interpretability analysis.
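A percentile bootstrap of the kind reported above can be sketched in a few lines. The example below uses the mean of a toy score list as the statistic; in the setting described here the statistic would be PR-AUC computed on each resample. The function name, seed, and toy data are hypothetical, and the percentile method is an assumption, since the text does not say how the interval was formed.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.fmean, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on n_resamples samples drawn with replacement.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2.0) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1.0 - alpha / 2.0) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]


# Toy scores (hypothetical) standing in for per-sample model outputs.
toy_scores = [i / 10 for i in range(50)]
ci_low, ci_high = bootstrap_ci(toy_scores)
```

A narrow interval relative to the point estimate, as with the PR-AUC interval quoted above, indicates that the metric changes little when the evaluation set is perturbed.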

5.2. Evidential Reliability (Multi-Method Consensus)

Three attribution methods (IG, SHAP, Perturbation) achieve strong consensus (global consistency = 0.7024). The highest agreement is on ratio_to_median_purchase_price (0.7005). The lowest agreement (0.2605) is for distance_from_last_transaction, where its negative impact under perturbation contrasts with its first-place rankings under IG and SHAP, revealing complementary signals rather than contradiction (see Table A1 and Figure A2 in Appendix A.1).
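To make the idea of rank agreement concrete, the sketch below computes Spearman's rank correlation between the IG and SHAP rankings listed in Table A1. This is a generic rank-correlation measure, not the consistency formula behind the values quoted above, which is not spelled out in this section; the function name is hypothetical. Note that the perturbation ranks in Table A1 follow the signed score, so any comparison involving them depends on how sign is handled.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(ranks_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


# Ranks from Table A1, in the table's row order:
# distance_from_last_transaction, online_order, ratio_to_median_purchase_price,
# distance_from_home, repeat_retailer, used_chip, pin_number.
ig_ranks = [1, 2, 3, 4, 5, 6, 7]
shap_ranks = [1, 5, 3, 2, 4, 6, 7]

rho_ig_shap = spearman_from_ranks(ig_ranks, shap_ranks)
```

The shortcut formula used here is only valid when neither ranking contains ties, which holds for the ranks in Table A1.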

5.3. External Validity (Statistical Grounding)

A core contribution is feature-level validation. While overall model discriminative validity is significant (p = 0.003), only the feature ratio_to_median_purchase_price achieves statistical significance after FDR correction (p < 0.05, details in Appendix A.2). This demonstrates our framework’s ability to distinguish statistically grounded evidence from merely important signals.
To further assess the robustness of our statistical tests under the imbalanced setting, we performed a post hoc power analysis for each feature using the Mann–Whitney U test (normal approximation). As shown in Table A2, six out of seven features achieved a power of 1.000 at α = 0.05, indicating that the sample size (N ≈ 1 M, fraud rate 8.74%) provides exceptional sensitivity to detect true differences. Critically, the only feature that remained significant after FDR correction—ratio_to_median_purchase_price—exhibited the largest effect size (Cliff’s δ = −0.7009) and perfect power. Features with small effect sizes (|δ| < 0.2), such as used_chip and used_pin_number, failed to survive correction despite perfect power, underscoring our framework’s distinction between statistical significance and practical validity. The non-significant feature repeat_retailer showed low power (0.1183), consistent with its negligible role in the model. These results confirm that our conclusions are not compromised by insufficient power due to class imbalance, and they reinforce the necessity of combining effect size, significance testing, and power analysis for trustworthy feature selection.
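The effect sizes in Table A2 are tied to the AUC column by a simple identity: since AUC = U / (n1 · n2) and Cliff's δ = 2U / (n1 · n2) − 1, it follows that δ = 2 · AUC − 1. The short check below applies this identity to the AUC values printed in Table A2; the function name is hypothetical, and small discrepancies in the last digit come from rounding in the table.

```python
def cliffs_delta_from_auc(auc):
    """Cliff's delta implied by an AUC value (delta = 2 * AUC - 1)."""
    return 2.0 * auc - 1.0


# (feature, AUC, reported Cliff's delta) as printed in Table A2.
table_a2 = [
    ("distance_from_home", 0.4029, -0.1943),
    ("distance_from_last_transaction", 0.4646, -0.0709),
    ("ratio_to_median_purchase_price", 0.1496, -0.7009),
    ("repeat_retailer", 0.5008, 0.0016),
    ("used_chip", 0.5515, 0.1030),
    ("used_pin_number", 0.5534, 0.1068),
    ("online_order", 0.3380, -0.3241),
]

# Largest absolute gap between the implied and the reported delta.
max_gap = max(abs(cliffs_delta_from_auc(auc) - delta) for _, auc, delta in table_a2)
```

An AUC of 0.5 corresponds to δ = 0 (no separation), which is why repeat_retailer, with AUC = 0.5008, has a negligible effect size.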

5.4. Controlled Generation (Synthesis of Evidence)

The final outputs are synthesized through a controlled pipeline. Feature importance ($Importance_j$) and confidence scores ($Confidence_j$), informed by reliability and validity weights, are integrated with predefined fraud rules. This structured evidence guides a RAG-enhanced LLM to generate audit-ready, narrative explanations, completing the bridge from quantitative evidence to actionable insights. (See Figure 5. The statistical testing results are provided in Table A3 of Appendix A.2, and the full workflow is detailed in Section 5.4 and demonstrated in Section 6, Deployment.)

5.5. Stability of Key Evidence

Bootstrap analysis (n = 200) confirms the exceptional stability of the top-ranked features: ratio_to_median_purchase_price and distance_from_last_transaction show near-perfect rank stability (>0.98) and appear in the top three positions in 98.5% and 88.5% of resamples, respectively (see Table A4 in Appendix A.3). This underscores their reliable and central role in the model’s decision-making process.
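The Top-3 frequency reported in Table A4 is a simple count over resamples: the share of bootstrap rankings in which a feature lands among the first three positions. A minimal sketch follows, with a hypothetical function name and four made-up rankings in place of the 200 real ones.

```python
def top_k_frequency(rankings, feature, k=3):
    """Fraction of rankings in which `feature` appears in the first k places."""
    hits = sum(1 for order in rankings if feature in order[:k])
    return hits / len(rankings)


# Hypothetical rankings from four resamples (most important feature first).
toy_rankings = [
    ["price", "transaction", "home", "online"],
    ["transaction", "price", "online", "home"],
    ["home", "online", "price", "transaction"],
    ["online", "home", "transaction", "price"],
]
price_top3 = top_k_frequency(toy_rankings, "price")
```

Because each resample contributes exactly three top-3 slots, the frequencies of all features sum to 300%, as they do in Table A4.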

6. Deployment

We concretely demonstrate the Bridging framework by analyzing an extreme-risk transaction (Instance_13; all 20 new instances are provided in Appendix B.1 Table A5), showcasing the operationalization of the Trust Triangle’s three pillars: evidential reliability, external validity, and controlled generation.

6.1. Case Analysis

This case study focuses on Instance_13, one of the 20 new instances whose raw feature values are listed in Appendix B.1, Table A5.
Instance_13 was assigned a risk score of 7.082 (see Appendix B.1, Table A6), indicating that its reconstruction error was over seven times the statistically derived decision threshold. Figure A3 presents the distribution of risk scores for the 20 new instances, and Figure A4 visualizes the linear relationship between reconstruction error and risk score, with Instance_13 highlighted as an extreme outlier. Our framework identified ratio_to_median_purchase_price (value: 10.124, i.e., over ten times the median; see Appendix B.1, Table A5) as the decisive factor. Crucially, all three attribution methods converged, assigning it aligned, high importance scores (IG: 4.142; SHAP: 1.368; Perturbation: 2.231; see Appendix B.1, Table A6). The feature importance of all features for the 20 new instances is given in Appendix B.1, Table A7. This instance-level multi-method consensus directly demonstrates evidential reliability.
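The risk score is the reconstruction error divided by the decision threshold, so it can be reproduced directly from reported values. The sketch below uses the threshold quoted in Appendix A.1 (0.2974) and the reconstruction error of Instance_13 from Table A6; the function name is hypothetical, and the small gap from the tabulated 7.082 reflects rounding of the threshold.

```python
def risk_score(reconstruction_error, threshold):
    """Risk_i = e_i / T: reconstruction error relative to the decision threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return reconstruction_error / threshold


THRESHOLD = 0.2974           # threshold quoted in Appendix A.1 (rounded)
INSTANCE_13_ERROR = 2.106589  # reconstruction error from Table A6

risk_13 = risk_score(INSTANCE_13_ERROR, THRESHOLD)
```

A score of 1 marks an instance sitting exactly on the threshold, so values above 1 can be read as multiples of the threshold.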

6.2. Grounding the Evidence

The feature’s (ratio_to_median_purchase_price) reconstruction error deviated from the normal mean by 5.3 standard deviations (see Appendix B.2, Table A8), providing powerful instance-level corroboration of its global statistical significance established in Section 5.3. The calculated Feature Impact Score was 1.264 (Figure 6; see Appendix B.2, Table A8), accounting for 95.3% of the total impact for this instance. This overwhelming signal triggered predefined “High-Value Transaction” rules (see Appendix B.3, Figure A5), instantiating external validity. Furthermore, the RAG system retrieved seven distinct fraud patterns from an authoritative database (e.g., Counterfeit Card Fraud; see Appendix B.3, Figure A6), all explicitly linking “abnormally high transaction amounts” to criminal activity (relevance scores > 0.7, see Appendix B.3, Figure A6). This aligns the quantitative attribution with external, verifiable domain knowledge.

6.3. Synthesis and Audit Trail

The final step exemplifies controlled generation. The structured quantitative evidence (risk score, impact scores, triggered rules) and retrieved qualitative context were integrated into a prompt. An LLM, guided by chain-of-thought reasoning, synthesized this into a coherent, audit-ready report (see Appendix B.4). This completes a fully traceable loop from a black-box anomaly score to a semantically grounded, actionable narrative, validating the framework’s core promise of trustworthy explanation.

7. Conclusions, Limitations, and Future Work

7.1. Conclusions

We propose the Trust Triangle framework, a novel paradigm that systematically bridges evidential reliability, external validity, and controlled generation to transform black-box predictions into trustworthy, auditable explanations for high-stakes credit card fraud detection. Empirically, our framework delivers robust performance (PR-AUC = 0.7867) on highly imbalanced data (8.74% fraud rate) and uniquely identifies statistically grounded predictive drivers (ratio_to_median_purchase_price). Through case-based deployment, we demonstrate its ability to generate semantically coherent, evidence-anchored reports, completing a closed-loop verification from model output to decision support. We are excited to extend this paradigm to other high-stakes domains (e.g., healthcare, justice) and to investigate more efficient consensus mechanisms. To foster reproducible research, we will release our code and model components, helping move trustworthy AI from principle to practice.

7.2. Limitations

Despite its contributions, this study has several limitations that should be acknowledged:
  • Single Validated Feature: As reported in Section 5.3, only one feature (ratio_to_median_purchase_price) achieved statistical significance after FDR correction. While this finding underscores the rigor of our validity assessment—demonstrating that our framework successfully distinguishes statistically grounded signals from noise—it also raises the question of whether other features carry meaningful predictive signals that are masked by their small individual effect sizes or by correlations with other features.
  • Univariate Statistical Testing: Our external validity pillar relies on the Mann–Whitney U test, a univariate non-parametric method. This approach does not account for feature interactions or nonlinear relationships, which may be crucial for understanding complex fraud patterns.
  • Static Knowledge Base: The RAG pipeline currently depends on a fixed, pre-curated domain knowledge base. While we ensured high relevance scores (>0.7) for retrieved content, the knowledge base is not automatically updated as new fraud patterns emerge, limiting the system’s adaptability to concept drift.
  • LLM Dependence and Computational Cost: The controlled generation module employs a multi-stage LLM pipeline (embedding, intermediate description, final generation). Although we selected lightweight models (e.g., qwen2.5:1.5b-instruct) to mitigate cost [12], the approach still requires significant computational resources and relies on the inherent capabilities of the chosen LLMs, which may introduce biases or inconsistencies [16]. This trade-off between explainability and computational efficiency is a common challenge in deploying LLM-based systems for real-time applications.
  • Human Feedback Not Yet Implemented: While we propose a human-in-the-loop feedback mechanism, this component remains conceptual and has not been implemented or validated empirically. The effectiveness of expert feedback for model improvement and knowledge base updating requires future investigation.
  • Single Dataset and Domain: The framework has been demonstrated on a single credit card fraud dataset [22]. Its generalizability to other high-stakes domains (e.g., healthcare diagnostics, financial auditing, or cybersecurity) remains to be tested.

7.3. Future Work

Building on the limitations identified above, we outline several directions for future research:
  • Alternative Statistical Methods: While our framework successfully identifies one statistically validated feature, future work will explore alternative statistical methods—such as LASSO-regularized logistic regression, permutation importance with significance testing [14,26], Bayesian feature selection, and the Boruta algorithm—to uncover potential predictive signals in features that did not survive FDR correction. These methods offer complementary strengths: LASSO handles multicollinearity and feature selection jointly; permutation importance provides model-specific significance testing; Bayesian approaches incorporate prior knowledge; and the Boruta algorithm explicitly compares features to random probes. Such multivariate approaches may reveal interactions or nonlinear effects masked by univariate tests.
  • Feature Interaction Modeling: We plan to investigate methods that capture feature interactions, such as tree-based models with built-in interaction detection or neural attention mechanisms [15], to provide a more holistic understanding of fraud patterns.
  • Dynamic Knowledge Base Updating: Future iterations of the framework will incorporate mechanisms for semi-automated knowledge base updating, potentially using online learning or periodic retraining of the retrieval model [25] to adapt to emerging fraud modus operandi.
  • Human-in-the-Loop Feedback Mechanism: We envision a human-in-the-loop feedback mechanism to continuously improve system trustworthiness. Domain experts could review generated reports, flag errors, and update the knowledge base; their feedback would be logged and used to recalibrate attribution weights, refine rule templates, and periodically retrain the predictive model. This closed-loop learning from expert input would enable the Trust Triangle to adapt to evolving fraud patterns and reduce mistakes over time, moving toward truly adaptive and auditable AI in high-stakes domains.
  • Cross-Domain Validation: We aim to apply the Trust Triangle framework to other high-stakes domains, such as healthcare diagnostics (e.g., detecting anomalous patient records) and financial auditing (e.g., identifying irregular transactions), to assess its generalizability and adaptability. We plan to collaborate with domain experts in these fields to adapt the framework’s components—particularly the knowledge base and rule templates—to their specific contexts.
  • Efficiency Optimization: To address computational costs, we will explore more efficient consensus mechanisms, model distillation techniques, and lighter-weight LLM architectures [12] suitable for real-time deployment in production environments.
  • User Studies and Explainability Evaluation: Beyond quantitative validation, future work should include user studies with domain experts (e.g., fraud analysts) to evaluate the usefulness, interpretability, and actionability of the generated reports, providing qualitative evidence of the framework’s practical value.
By addressing these limitations and pursuing these future directions, we believe the Trust Triangle can evolve into a mature, deployable solution for trustworthy AI in high-risk applications, bridging the gap between rigorous statistical validation and human-interpretable explanations.

Author Contributions

Conceptualization, J.-C.S. and Y.-B.L.; methodology, J.-C.S. and N.-C.S.; software, J.-C.S.; validation, J.-C.S. and N.-C.S.; formal analysis, J.-C.S.; investigation, J.-C.S.; resources, Y.-B.L.; data curation, J.-C.S.; writing—original draft preparation, J.-C.S.; writing—review and editing, N.-C.S. and Y.-B.L.; visualization, J.-C.S.; supervision, Y.-B.L.; project administration, N.-C.S.; funding acquisition, J.-C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by the authors.

Institutional Review Board Statement

Not applicable. The study utilized a publicly available, anonymized credit card fraud dataset and did not involve humans or animals.

Informed Consent Statement

Not applicable. The study did not involve humans.

Data Availability Statement

The credit card fraud dataset used in this study is publicly available on Kaggle and can be accessed via the reference [22] (https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud, accessed on 1 February 2026). The code and model components developed for the Trust Triangle framework are available from the corresponding author upon reasonable request to foster reproducible research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Quantifying Reliability—Raw Multi-Method Attribution Consensus

Figure A1 illustrates the distribution of reconstruction errors (MSE) for normal versus fraudulent transaction samples in an anomaly detection framework. The optimal classification threshold (0.2974, indicated by the vertical dashed line) was determined by optimizing the PR-AUC metric. A key observation from this distribution is the significant overlap between the reconstruction errors of normal and fraudulent samples, leading to a substantial number of False Positive (FP) and False Negative (FN) instances. This indicates an inherent classification ambiguity that cannot be fully resolved by static models relying solely on feature representations learned from raw transactional data (e.g., spending behavior, spatial distance, security features). Many fraudulent transactions are highly similar in low-dimensional feature space to legitimate yet rare normal behaviors (e.g., a large purchase in a distant location, which could be a gift bought during a business trip). This observation explains the theoretical upper bound on any model’s performance on this dataset and clarifies why pursuing near-perfect static classification metrics may lead to overfitting to dataset-specific artifacts and result in unacceptably high false-positive rates in real-world deployment.
Figure A1. Distribution of Transaction Reconstruction Errors with an Optimal Threshold.
Our research directly acknowledges and leverages this challenge. The proposed dynamic modus operandi description framework (Steps 4 and 5 in Table 2) does not aim to forcibly eliminate this overlap—an often-unrealistic goal. Instead, it allows the system to incorporate iteratively updatable business rules and contextual knowledge (Step 6 in Table 2) when such ambiguous samples are detected, thereby enabling adaptation to concept drift (Steps 7 and 8 in Table 2). Consequently, the overlapping region in this figure underscores the core motivation and advantage of our approach: to construct a continuously evolving, robust, and practical fraud detection system that operates effectively within the acknowledged constraints of data ambiguity.
This appendix presents the raw scores and ranking comparisons (Table A1), along with visualizations (Figure A2), for the three feature attribution methods (IG, SHAP, Perturbation). This data is the source for calculating the micro-level method consistency ($method\_consistency_j$) and the macro-level consensus coefficient ($C_{global}$). It not only intuitively reveals the high degree of consensus across methods (global consistency = 0.7024) but also concretely presents complementary perspectives, such as the differing scores that the methods assign to distance_from_last_transaction. This serves as the core raw material for assessing evidential reliability.
Table A1. Comparison of Feature Importance Scores and rankings across three attribution methods.
| Feature | IG Score | IG Rank | SHAP Score | SHAP Rank | Pert Score | Pert Rank | Consistency Among Methods |
|---|---|---|---|---|---|---|---|
| distance_from_last_transaction | 0.673093 | 1 | 0.747015 | 1 | −0.678799 | 7 | 0.2605 |
| online_order | 0.653953 | 2 | 0.039471 | 5 | 0.107362 | 2 | 0.5005 |
| ratio_to_median_purchase_price | 0.525062 | 3 | 0.452624 | 3 | −0.413608 | 6 | 0.7005 |
| distance_from_home | 0.307944 | 4 | 0.531014 | 2 | 1.177105 | 1 | 0.3805 |
| repeat_retailer | 0.121161 | 5 | 0.201827 | 4 | −0.058288 | 5 | 0.5005 |
| used_chip | 0.000641 | 6 | 0.014732 | 6 | −0.008623 | 4 | 0.5005 |
| pin_number | 0.000000 | 7 | 0.014301 | 7 | 0.022428 | 3 | 0.5005 |
Figure A2. Comparison of Feature Importance Scores and rankings across three attribution methods.
Table A2. Power Analysis Results (α = 0.05, Two-Tailed Test).
| Feature | U_Statistic | p_Value | Cliff_Delta | AUC | Power |
|---|---|---|---|---|---|
| distance_from_home | 3.213396 × 10^10 | 0.0000 | −0.1943 | 0.4029 | 1.0000 |
| distance_from_last_transaction | 3.705598 × 10^10 | 0.0000 | −0.0709 | 0.4646 | 1.0000 |
| ratio_to_median_purchase_price | 1.193061 × 10^10 | 0.0000 | −0.7009 | 0.1496 | 1.0000 |
| repeat_retailer | 3.994380 × 10^10 | 0.1746 | 0.0016 | 0.5008 | 0.1183 |
| used_chip | 4.398982 × 10^10 | 0.0000 | 0.1030 | 0.5515 | 1.0000 |
| used_pin_number | 4.414208 × 10^10 | 0.0000 | 0.1068 | 0.5534 | 1.0000 |
| online_order | 2.695646 × 10^10 | 0.0000 | −0.3241 | 0.3380 | 1.0000 |

Appendix A.2. Integrated Feature Importance Ranking with Reliability-Validity Verification

Table A3 presents the core output of our Bridging framework. The consensus-verified confidence scores embody evidential reliability, synthesizing multi-method agreement and stability. Statistical significance and effect size columns establish external validity, identifying ratio_to_median_purchase_price as the sole statistically grounded, high-confidence driver. This integrated and validated ranking provides the essential, structured input for controlled generation, enabling the RAG-enhanced LLM to produce an explanation firmly anchored in trustworthy, auditable evidence, thus completing the transformation from a black-box score to a white-box insight.
Table A3. Final Integrated Feature Importance with Statistical Validity.
| Rank | Feature | Importance | Confidence | Stat. Sig. | Effect Size | Rank |
|---|---|---|---|---|---|---|
| 1 | ratio_to_median_purchase_price | 0.3196 | 0.726 | | 0.5473 | Medium effect |
| 2 | distance_from_last_transaction | 0.2493 | 0.395 | | 0.4992 | Small effect |
| 3 | online_order | 0.1685 | 0.465 | | 0.9261 | Large effect |
| 4 | distance_from_home | 0.1564 | 0.524 | | 0.2764 | Small effect |
| 5 | repeat_retailer | 0.0463 | 0.622 | | 0.1863 | Very small effect |
| 6 | used_chip | 0.0302 | 0.637 | | 0.7714 | Medium effect |
| 7 | used_pin_number | 0.0297 | 0.617 | | 0.4602 | Small effect |

Appendix A.3. Substantiating Stability—Full Bootstrap Analysis for Key Features

This appendix lists the complete feature importance stability assessment results based on 200 bootstrap resamples (Table A4). It provides all details for the key features mentioned in the main text (e.g., ratio_to_median_purchase_price), including the mean importance score, standard deviation, 95% confidence interval, rank stability, and Top3 frequency. This data powerfully substantiates the reliability of the evidence, demonstrating that the core findings are not accidental results of data fluctuation. Simultaneously, it enhances the generalizability of the validity conclusions through large-scale repeated measurement.
Table A4. Feature importance stability evaluation based on 200 bootstrap resampling.
| Rank | Feature | Mean Importance (± Standard Deviation) | 95% Confidence Interval | Ranking Stability | Top-3 Frequency | Explanation |
|---|---|---|---|---|---|---|
| 1 | price | 0.2770 ± 0.0942 | [0.1179, 0.4753] | 0.992 | 98.5% | Key Feature |
| 2 | transaction | 0.2589 ± 0.1090 | [0.0716, 0.4859] | 0.988 | 88.5% | Key Feature |
| 3 | home | 0.1746 ± 0.0861 | [0.0640, 0.4082] | 0.987 | 71.0% | Important Feature |
| 4 | online | 0.1025 ± 0.0406 | [0.0330, 0.1887] | 0.990 | 30.5% | Important Feature |
| 5 | repeat | 0.0751 ± 0.0371 | [0.0219, 0.1582] | 0.988 | 8.5% | Auxiliary Feature |
| 6 | chip | 0.0610 ± 0.0332 | [0.0151, 0.1183] | 0.989 | 2.5% | Auxiliary Feature |
| 7 | pin | 0.0509 ± 0.0302 | [0.0081, 0.1085] | 0.991 | 0.5% | Auxiliary Feature |
Feature: price = ratio_to_median_purchase_price, transaction = distance_from_last_transaction, home = distance_from_home, online = online_order, repeat = repeat_retailer, chip = used_chip, pin = used_pin_number.

Appendix B

Appendix B.1. Twenty New Instances and Their Corresponding Analyses

Table A5. Raw data of 20 new instances.
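| Instance_ID | Home | Transaction | Price | Repeat | Chip | Pin | Online_Order | Fraud |
|---|---|---|---|---|---|---|---|---|
| Instance_1 | 57.87785658 | 0.311140008 | 1.945939978 | 1 | 1 | 0 | 0 | |
| Instance_2 | 10.8299427 | 0.175591502 | 1.294218811 | 1 | 0 | 0 | 0 | |
| Instance_3 | 5.091079491 | 0.805152595 | 0.427714563 | 1 | 0 | 0 | 1 | |
| Instance_4 | 2.247564328 | 5.600043547 | 0.362662578 | 1 | 1 | 0 | 1 | |
| Instance_5 | 44.190936 | 0.566486268 | 2.222767293 | 1 | 1 | 0 | 1 | |
| Instance_6 | 5.586407674 | 13.26107327 | 0.064768465 | 1 | 0 | 0 | 0 | |
| Instance_7 | 3.724019125 | 0.956837928 | 0.278464933 | 1 | 0 | 0 | 1 | |
| Instance_8 | 4.848246572 | 0.320735427 | 1.273049534 | 1 | 0 | 1 | 0 | |
| Instance_9 | 0.876632256 | 2.503608927 | 1.516999333 | 0 | 0 | 0 | 0 | |
| Instance_10 | 8.839046704 | 2.970512276 | 2.361682533 | 1 | 0 | 0 | 1 | |
| Instance_11 | 14.26352874 | 0.158758086 | 1.136101943 | 1 | 1 | 0 | 1 | |
| Instance_12 | 13.59238757 | 0.240539813 | 1.370329863 | 1 | 1 | 0 | 1 | |
| Instance_13 | 5.282558261 | 0.371561962 | 10.12447336 | 1 | 0 | 0 | 1 | |
| Instance_14 | 13.95587237 | 0.271523528 | 2.798901123 | 1 | 0 | 0 | 1 | |
| Instance_15 | 179.6651877 | 0.120919634 | 0.535640483 | 1 | 1 | 1 | 1 | |
| Instance_16 | 114.5187894 | 0.707003353 | 0.516989925 | 1 | 0 | 0 | 0 | |
| Instance_17 | 3.589688598 | 6.247457543 | 1.846450527 | 1 | 0 | 0 | 0 | |
| Instance_18 | 11.08585248 | 34.66135143 | 2.530758449 | 1 | 0 | 0 | 1 | |
| Instance_19 | 2.131955666 | 56.37240053 | 6.358667334 | 1 | 0 | 0 | 1 | |
| Instance_20 | 3.803057351 | 67.24108053 | 1.872949614 | 1 | 0 | 0 | 1 | |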
Feature: price = ratio_to_median_purchase_price, transaction = distance_from_last_transaction, home = distance_from_home, online = online_order, repeat = repeat_retailer, chip = used_chip, pin = used_pin_number.
Table A6. Risk scores and Feature attribution scores of 20 new instances.
| Instance_ID | Reconstruction Error | Risk Score | Risk_Level | IG_Value | SHAP_Value | Perturbation_Value |
|---|---|---|---|---|---|---|
| Instance_1 | 0.096563 | 0.324644 | Normal | 0.332549 | 0.590415 | 0.382106 |
| Instance_2 | 0.017343 | 0.058308 | Normal | 0.347724 | 4.04E−05 | 0.349023 |
| Instance_3 | 0.23296 | 0.783207 | Normal | 0.277649 | 0.450088 | 0.34465 |
| Instance_4 | 0.312947 | 1.052123 | Low_Risk | 1.538347 | 0.989904 | 0.240123 |
| Instance_5 | 0.064874 | 0.218105 | Normal | 1.084932 | 1.214681 | 0.378436 |
| Instance_6 | 0.120097 | 0.403763 | Normal | 0.45078 | 0.000568 | 0.45247 |
| Instance_7 | 0.262711 | 0.88323 | Normal | 0.305198 | 0.43524 | 0.359591 |
| Instance_8 | 0.049949 | 0.167927 | Normal | 0.462337 | 1.435201 | 0.171437 |
| Instance_9 | 1.243194 | 4.179594 | Medium_Risk | 1.572173 | 0.000104 | 1.536995 |
| Instance_10 | 0.028579 | 0.09608 | Normal | 0.197178 | 0.636449 | 0.442014 |
| Instance_11 | 0.160518 | 0.539658 | Normal | 0.449272 | 1.084443 | 0.184187 |
| Instance_12 | 0.127472 | 0.428557 | Normal | 0.420437 | 1.114343 | 0.18204 |
| Instance_13 | 2.106589 | 7.08231 | Extreme_Risk | 4.142073 | 1.368475 | 2.2313 |
| Instance_14 | 0.019202 | 0.064558 | Normal | 0.190981 | 0.679516 | 0.48421 |
| Instance_15 | 1.463372 | 4.919827 | Medium_Risk | 7.358275 | 2.510896 | 1.714236 |
| Instance_16 | 0.463765 | 1.559168 | Low_Risk | 0.794006 | 0.000273 | 0.793759 |
| Instance_17 | 0.032802 | 0.11028 | Normal | 0.363011 | 0.000211 | 0.363865 |
| Instance_18 | 0.277533 | 0.933059 | Normal | 0.452468 | 0.668768 | 0.728295 |
| Instance_19 | 1.301491 | 4.375586 | Medium_Risk | 1.759496 | 1.036506 | 1.426202 |
| Instance_20 | 1.175546 | 3.952162 | Medium_Risk | 1.309625 | 0.617724 | 1.308714 |
Figure A3. Risk Score Distribution of new instances.
Figure A4. Correlation between reconstruction error and risk score: 1.0000. Regression equation: Risk Score = 3.3620 × Reconstruction Error − 0.0000.
Table A7. Feature importance of all features for the 20 new instances.
| Instance_ID | W-Home | W-Transaction | W-Price | W-Retailer | W-Chip | W-Pin | W-Online |
|---|---|---|---|---|---|---|---|
| Instance_1 | 0.237267139 | 0.010103283 | 0.007877149 | 0.000146618 | 0.007112523 | 0.01005182 | 0.727441469 |
| Instance_2 | 0.028305366 | 0.018029417 | 0.006613606 | 0.0004834 | 0.050477914 | 0.011348816 | 0.884741482 |
| Instance_3 | 0.080796448 | 0.017816731 | 0.758909309 | 0.005076726 | 0.047685947 | 0.010416248 | 0.07929859 |
| Instance_4 | 0.00148074 | 0.003380702 | 0.196502697 | 0.004808384 | 0.06757851 | 0.002727031 | 0.723521936 |
| Instance_5 | 0.054408818 | 0.012614053 | 0.341553393 | 0.00275872 | 0.460102583 | 0.014345476 | 0.114216956 |
| Instance_6 | 0.034136039 | 0.085371033 | 0.307591742 | 0.000293307 | 0.030516904 | 0.006860236 | 0.535230739 |
| Instance_7 | 0.078358473 | 0.014046829 | 0.785137777 | 0.004364813 | 0.040997123 | 0.008955218 | 0.068139767 |
| Instance_8 | 0.034920621 | 0.023374237 | 0.040445422 | 0.001720534 | 0.046433054 | 0.083857952 | 0.769248179 |
| Instance_9 | 0.03978803 | 0.00062748 | 0.000305463 | 0.540732547 | 0.022326325 | 0.005019335 | 0.39120082 |
| Instance_10 | 0.110279153 | 0.005628721 | 0.563695902 | 0.0098242 | 0.094632542 | 0.020636935 | 0.195302547 |
| Instance_11 | 0.02252188 | 0.034650133 | 0.391721133 | 0.002748312 | 0.393938029 | 0.023913809 | 0.130506704 |
| Instance_12 | 0.038514127 | 0.048962087 | 0.09936341 | 0.004088747 | 0.580565782 | 0.035329299 | 0.193176548 |
| Instance_13 | 0.000820054 | 0.00151252 | 0.745757354 | 0.001736528 | 0.0003937 | 0.000310623 | 0.249469221 |
| Instance_14 | 0.057653496 | 0.048524927 | 0.562164565 | 0.010151496 | 0.097811478 | 0.021329667 | 0.202364371 |
| Instance_15 | 0.555578632 | 0.02351932 | 0.002014832 | 0.002673775 | 0.003209305 | 0.346585902 | 0.066418235 |
| Instance_16 | 0.547957038 | 0.004843872 | 0.087897626 | 0.000183716 | 0.019149422 | 0.004305005 | 0.335663321 |
| Instance_17 | 0.065404246 | 0.009072595 | 0.037325634 | 0.000452796 | 0.047356019 | 0.010647526 | 0.829741184 |
| Instance_18 | 0.024102204 | 0.719180111 | 0.165032308 | 0.002807489 | 0.027043632 | 0.005897326 | 0.055936929 |
| Instance_19 | 0.009547065 | 0.46215441 | 0.401369918 | 0.001405106 | 0.004662301 | 0.001136057 | 0.119725143 |
| Instance_20 | 0.015337583 | 0.925522783 | 0.031828858 | 0.000838015 | 0.008068402 | 0.001759479 | 0.016644881 |

Appendix B.2. Detailed Feature-Level Impact Analysis for Case Study

Table A8. Feature impact analysis for Instance 13. Integrated computation based on reconstruction error deviation and feature attributions.
| Feature | Impact_Score | Importance | Deviation_Score | Deviation | Instance_Error | N_Mean | N_Std | N_Importance |
|---|---|---|---|---|---|---|---|---|
| price | 1.264097 | 0.745757 | 1.695051 | 5.303835 | 14.49372 | 2.041977 | 2.347686 | 0.31959 |
| online | 0.055845 | 0.249469 | 0.223857 | 1.328263 | 7.33E−07 | 0.65 | 0.48936 | 0.168534 |
| repeat | 0.000336 | 0.001737 | 0.193629 | 4.184126 | 0.014401 | 0.95 | 0.223607 | 0.046277 |
| transaction | 0.000186 | 0.001513 | 0.122953 | 0.493284 | 0.033636 | 9.693189 | 19.58212 | 0.249255 |
| home | 7.11 × 10^−5 | 0.00082 | 0.086688 | 0.554225 | 0.204363 | 25.30003 | 45.28064 | 0.156414 |
| chip | 7.59 × 10^−6 | 0.000394 | 0.01929 | 0.638075 | 1.38 × 10^−6 | 0.3 | 0.470162 | 0.030231 |
| pin | 3 × 10^−6 | 0.000311 | 0.009649 | 0.324882 | 3.33 × 10^−6 | 0.1 | 0.307794 | 0.0297 |
N_Importance: Attribution-based importance of each feature with respect to the original dataset (see Table A3). N_Mean: Mean reconstruction error of each feature for normal transactions in the original dataset. N_Std: Standard deviation of reconstruction error of each feature for normal transactions in the original dataset. Instance_Error: Reconstruction error of each feature for Instance 13. Deviation: Feature-wise deviation for Instance 13, defined as Deviation = abs((Instance_Error − N_Mean)/N_Std). Deviation_Score: Feature deviation score for Instance 13, computed as Deviation_Score = Deviation × N_Importance. Importance: Feature importance for Instance 13 (see Appendix B.1, Table A7). Impact_Score: Feature impact score for Instance 13, computed as Impact_Score = Importance × Deviation_Score.
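The three formulas in the note to Table A8 chain together as shown in the sketch below, which recomputes the price row from its reported inputs. The function name is hypothetical; the input values are those printed in Table A8, and agreement with the table holds up to rounding.

```python
def feature_impact(instance_error, n_mean, n_std, n_importance, importance):
    """Apply the Table A8 definitions in order.

    Deviation       = abs((Instance_Error - N_Mean) / N_Std)
    Deviation_Score = Deviation * N_Importance
    Impact_Score    = Importance * Deviation_Score
    """
    deviation = abs((instance_error - n_mean) / n_std)
    deviation_score = deviation * n_importance
    impact_score = importance * deviation_score
    return deviation, deviation_score, impact_score


# Inputs for ratio_to_median_purchase_price ("price") from Table A8.
price_deviation, price_dev_score, price_impact = feature_impact(
    instance_error=14.49372,
    n_mean=2.041977,
    n_std=2.347686,
    n_importance=0.31959,
    importance=0.745757,
)
```

Because the impact score multiplies a global weight (N_Importance), an instance-level weight (Importance), and a standardized deviation, a feature needs to be large on all three to dominate, which is what happens for price in Instance 13.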

Appendix B.3. Implementation Templates and Retrieved Knowledge for Case Study

Figure A5. An Example of RAG-Retrieved Domain Knowledge for Fraud Type 1 (“Counterfeit Card Fraud”).
These two resources fully demonstrate the closed-loop workflow from model attribution to auditable explanation. Figure A6 maps a validated key feature to a predefined fraud rule template, implementing the conversion from quantitative evidence to business semantics. Figure A5 displays the specific modus operandi descriptions retrieved by the RAG system from an authoritative knowledge base based on the triggered rule, providing a verifiable external knowledge anchor for the generation step. Together, they ensure the controllability and factuality of the final explanation.
Figure A6. An Example of Multi-Fraud Rule Mapping via Key Feature (ratio_to_median_purchase_price).
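The mapping from a key feature to fraud rules, followed by retrieval keyed on the triggered rule, can be sketched as below. This is a minimal illustration under stated assumptions: the rule IDs, the trigger level of 0.1, the in-memory knowledge base, and the placeholder passages are all hypothetical. Only the fraud type names, their associated features, and the relevance cutoff of 0.7 come from the article.

```python
from typing import Dict, List, Tuple

# Hypothetical rule templates: rule ID -> fraud type and the features it watches.
RULE_TEMPLATES: Dict[str, dict] = {
    "RULE_01": {
        "fraud_type": "Counterfeit Card Fraud",
        "features": ["ratio_to_median_purchase_price", "distance_from_home"],
    },
    "RULE_02": {
        "fraud_type": "Phishing-Based Fraud (Card-Not-Present)",
        "features": ["online_order", "repeat_retailer"],
    },
}

# Hypothetical knowledge base: rule ID -> list of (relevance score, passage).
KNOWLEDGE_BASE: Dict[str, List[Tuple[float, str]]] = {
    "RULE_01": [
        (0.82, "placeholder: counterfeit card modus operandi"),
        (0.55, "placeholder: weakly related passage"),
    ],
    "RULE_02": [(0.74, "placeholder: phishing modus operandi")],
}


def triggered_rules(impact_scores: Dict[str, float], trigger: float = 0.1) -> List[str]:
    """Return IDs of rules for which any watched feature reaches the trigger level."""
    return [
        rule_id
        for rule_id, rule in RULE_TEMPLATES.items()
        if any(impact_scores.get(f, 0.0) >= trigger for f in rule["features"])
    ]


def retrieve(rule_ids: List[str], min_relevance: float = 0.7) -> Dict[str, List[str]]:
    """Look up passages by rule ID and keep those with relevance above the cutoff."""
    return {
        rule_id: [
            text
            for score, text in KNOWLEDGE_BASE.get(rule_id, [])
            if score > min_relevance
        ]
        for rule_id in rule_ids
    }


# Hypothetical impact scores for one instance.
scores = {"ratio_to_median_purchase_price": 0.9, "online_order": 0.01}
rules = triggered_rules(scores)
evidence = retrieve(rules)
print(rules, evidence)
```

Keying retrieval on rule IDs rather than on free text means the generation step only ever sees passages tied to a rule that the quantitative evidence actually triggered.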

Appendix B.4. Complete Generated Report for the Instance_13 Case Study

Trustworthy AI Deep-Dive Fraud Detection Analysis Report—Instance_13
A Comprehensive Evaluation Based on the Bridging Methodology and NeurIPS Trustworthy AI Standards
Report ID: AI-FD-20260103-001
Analysis Target: Instance_13 (Extreme-Risk Transaction)
Generation Time: 3 January 2026, 01:02:39
Analysis Model: qwen2.5:7b
Data Sources: Five Feature Analysis Files
Report Type: In-depth Insight Analysis
Table of Contents
  • Executive Summary
  • Research Background and Methodology
  • Feature Analysis of Instance_13
  • Feature Importance Evaluation
  • Crime Pattern Matching Analysis
  • Trustworthiness Assessment (Based on NeurIPS Standards)
  • Risk Management Recommendations
  • Conclusions and Outlook
  • Appendix
1. Executive Summary
This report presents a comprehensive trustworthy AI analysis of Instance_13, an extreme-risk transaction. The transaction was identified as “extreme risk” with a risk score of 7.082, primarily driven by an abnormally high ratio of transaction amount to historical median purchase price (ratio_to_median_purchase_price = 10.124), which alone contributed 32.0% to the overall risk.
Instance_13 simultaneously triggered seven distinct crime patterns, all classified at the “strong alert” level.
The analysis adopts a Bridging Methodology, constructing a “Trust Triangle” through:
  • Multi-attribution consistency (reliability),
  • Statistical significance testing (validity), and
  • RAG-enhanced real-world crime knowledge (semantic grounding).
Beyond risk classification, this report delivers deep insights across four dimensions: feature anomalies, crime pattern matching, trustworthiness standards, and actionable risk management strategies.
2. Research Background and Methodology
2.1. Research Background
With the widespread adoption of AI in financial risk management, the “black-box” problem has become a major regulatory concern. Traditional machine learning models often lack interpretability, making it difficult to meet compliance requirements. The Bridging Methodology proposed in this report addresses this challenge by transforming algorithmic predictions into verifiable and trustworthy business insights through a quantitative validation framework.
2.2. Analytical Methodology
  • Reliability Assessment: Integration of SHAP, perturbation-based methods, and Integrated Gradients to compute an attribution consistency score (0.702).
  • Validity Verification: Stability evaluation using Bootstrap confidence intervals and statistical significance testing (adjusted p-values).
  • Semantic Enhancement: Matching against real-world crime patterns using Retrieval-Augmented Generation (RAG).
  • Instance-Level Analysis: Multi-layer analysis of Instance_13 at the feature, pattern, and system levels.
3. Feature Analysis of Instance_13
Detailed Feature Analysis and Risk Insights
1. Interpretation of Feature Values
  • ratio_to_median_purchase_price = 10.124
This value is significantly higher than the historical median purchase price, indicating that the transaction amount far exceeds normal levels. From a business perspective, this suggests a heightened risk of fraud or anomalous high-value transactions. Such deviations may stem from one-time large purchases, special promotions, or malicious credit card misuse.
  • online_order = 1 and used_chip = 0
This combination indicates an online transaction without chip-based verification. While online transactions are not inherently risky, the absence of chip authentication increases the likelihood of card misuse or fraud, warranting further investigation.
  • distance_from_home = 5.282 and distance_from_last_transaction = 0.372
These values indicate that the transaction occurred approximately 5 km from the cardholder’s residence but very close to the location of the previous transaction. This geographic pattern may suggest multiple transactions within a short time frame and limited spatial movement, potentially indicating repeated use at the same merchant or abnormal purchasing behavior.
  • repeat_retailer = 1 with other anomalous features
Although the retailer is repeatedly used, other features (e.g., distance and timing) exhibit abnormal behavior. This inconsistency may imply altered purchasing habits under special circumstances or potential fraudulent activity.
2. Feature Interactions and Combined Risk
  • High purchase price ratio
When combined with offline transactions (online_order = 0), an unusually high ratio_to_median_purchase_price may signal large cash-based purchases. In the absence of chip verification, this pattern further elevates fraud risk.
  • Geographical behavior analysis
Fluctuations between distance_from_home and distance_from_last_transaction may reflect unusual behavioral or environmental changes, indicating potential risk factors.
3. Anomaly Comparison
Compared with normal transaction patterns, Instance_13 deviates across multiple dimensions. Typically, users rely on chip authentication and exhibit stable geographic behavior. The combination of a high purchase ratio, lack of chip usage, and anomalous geographic patterns collectively signals elevated risk.
4. Business Implications
In operational settings, these feature combinations can serve as key indicators of potential fraud. For example:
  • A significantly elevated ratio_to_median_purchase_price should automatically trigger enhanced verification or manual review.
  • Online transactions without chip usage should prompt strengthened security checks.
5. High-Risk Signal Patterns
Identified high-risk patterns include:
  • Abnormally large transactions
  • Non-standard payment methods (online_order = 1 and used_chip = 0)
  • Geographical pattern anomalies
  • Repeated retailer usage with other abnormal features
Recognizing these patterns can substantially improve fraud detection accuracy and efficiency.
4. Feature Importance Evaluation
Global Feature Importance Analysis
1. Business Implications of Global Feature Ranking
In model training, ratio_to_median_purchase_price was identified as the most influential feature, with an importance score of 0.3196 and strong statistical significance (p = 0.0010). This indicates high explanatory power for the target variable and suggests that spending deviations effectively capture abnormal financial behavior.
2. Relationship Between Statistical Significance and Stability
Although distance_from_last_transaction shows a relatively high importance score (0.2493), its statistical significance is weak (p = 0.2457) and stability is low, suggesting susceptibility to noise. In contrast, features such as repeat_retailer and chip usage exhibit both low importance and low stability and may be safely deprioritized.
3. Global vs. Instance-Level Importance
Global importance reflects aggregate contributions across all samples, potentially masking features that are critical in specific cases. Instance-level importance, by contrast, highlights features that are decisive for individual predictions, as demonstrated in Instance_13.
4. Interpretation of p-values and Effect Sizes
The online_order feature exhibits high statistical stability but moderate importance (0.1685), indicating consistent yet limited predictive contribution.
5. Risk Management Challenges of Low-Stability, High-Importance Features
distance_from_home exemplifies this issue, with moderate importance (0.1564) but low stability (CI width = 0.344), increasing predictive uncertainty and necessitating further validation.
6. Limitations of Feature Importance Analysis
Feature importance does not capture feature interactions and may undervalue features critical to real-world decision-making. Thus, it should be complemented with expert knowledge.
Business Recommendations
  • Further validate key low-stability features through additional data and improved preprocessing.
  • Do not fully disregard statistically weak features, as they may be crucial in specific scenarios.
  • Incorporate domain expertise to enhance interpretability and robustness.
5. Crime Pattern Matching Analysis
Crime Pattern Matching
1. Reasons for Triggering Seven Crime Patterns
Instance_13 exhibits feature combinations that activate multiple crime patterns. For example, counterfeit card fraud relies on geographic distance and purchase ratio features, while phishing-based fraud emphasizes online transactions and repeated merchant usage.
2. Feature Weights and Priority Across Crime Patterns
  • Counterfeit Card Fraud (Physical Forged Cards): Driven primarily by high purchase ratios, followed by geographic distance features.
  • Intercepted New Card Fraud: Emphasizes purchase ratio and cardholder residence information.
  • Phishing-Based Fraud (Card-Not-Present): Focuses on online transactions and repeated merchant usage.
3. Alignment Between RAG-Extracted Crime Knowledge and Observed Features
RAG-extracted real-world crime techniques align closely with Instance_13’s feature patterns, strengthening the realism and credibility of the explanation.
4. Potential Organized Crime Operation Modes
  • Credit card forgery
  • Phishing attacks and malware-based data theft
  • Cross-channel fraud across online and offline transactions
5. Trends in Crime Pattern Evolution
Future crime trends may increasingly leverage advanced technologies (e.g., AI-generated phishing sites, cryptocurrency platforms), underscoring the need for enhanced online security, real-time monitoring, and cross-agency collaboration.
Investigation Recommendations
  • Strengthen identity verification, especially for high-risk transactions.
  • Improve user education regarding phishing and counterfeit card fraud.
  • Conduct regular reviews of anomalous transaction patterns.
6. Trustworthiness Assessment (Based on NeurIPS Standards)
Trustworthiness Evaluation
1. Reliability (NeurIPS Standard)
The feature ratio_to_median_purchase_price demonstrates high importance, strong statistical significance, and moderate CI width (0.357), indicating robustness. Other features exhibit wider confidence intervals and lower robustness.
2. Validity (NeurIPS Standard)
While the primary feature shows strong predictive relevance, others (e.g., online_order, distance_from_home) have limited contribution and weak statistical significance.
3. Interpretability and Transparency
Although feature importance results are clear, detailed explanatory narratives should be expanded to enhance transparency and acceptance.
4. Fairness and Bias
Potential sample selection bias cannot be ruled out, particularly for statistically insignificant features, highlighting the need for representative training data.
5. Recommendations for Improving Trustworthiness
  • Increase data volume and diversity
  • Optimize feature engineering
  • Enhance explainability using advanced attribution techniques
  • Establish continuous model monitoring mechanisms
7. Risk Management Recommendations
Comprehensive Risk Management Strategy
1. Immediate Risk Response
  • Immediately freeze high-risk transactions.
  • Promptly contact affected customers for identity verification.
2. Mid-Term Monitoring Adjustments
  • Dynamically adjust feature thresholds.
  • Introduce advanced machine learning models (e.g., Random Forest, XGBoost).
3. Long-Term Risk Governance
  • Implement multi-layer defense systems.
  • Develop real-time monitoring and alert platforms.
4. Technical System Optimization
  • Visualize feature importance.
  • Integrate explainability tools such as SHAP.
5. Personnel Training and Process Redesign
  • Conduct regular professional training.
  • Perform simulated incident response drills.
Expected Outcomes
  • Reduced fraud incidence
  • Improved customer satisfaction
  • Optimized resource allocation
8. Conclusions and Outlook
Key Findings and Value
This study identifies ratio_to_median_purchase_price as a critical driver of fraud detection accuracy and validates its importance through crime pattern matching and real-world case alignment.
Limitations and Challenges
Data representativeness, feature dependency, and limited interpretability remain challenges, particularly for regulatory compliance.
Future Research Directions
Future work should incorporate richer data sources, develop highly interpretable AI models, and explore privacy-preserving techniques such as federated learning.
Industry and Regulatory Impact
This research provides valuable guidance for financial institutions and supports the development of standardized AI governance frameworks.
Outlook for Trustworthy AI in Financial Risk Management
Trustworthy AI will play a central role in enhancing transparency, accountability, and customer trust in financial risk management systems.
9. Appendix
9.1 Data Sources
  • Global feature importance analysis
  • Instance_13 raw feature values
  • Feature impact scores
  • Triggered fraud rules
  • RAG-based real crime knowledge base
9.2 Method Overview
  • Integrated attribution-based feature importance
  • Bonferroni-corrected significance testing
  • Impact score formulation
  • Rule-based crime pattern triggering
9.3 Model Configuration
  • Model: qwen2.5:7b
  • Temperature: 0.3
  • Analysis Depth: Feature → Instance → Pattern → System
Report generation completed.
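Two of the validity checks named in the report, Bootstrap confidence intervals and Bonferroni-corrected significance testing, can be sketched with the standard library alone. This is an illustrative sketch, not the authors' implementation: the percentile method, the number of resamples, the function names, and all sample values are assumptions.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def ci_width(values: Sequence[float], **kwargs) -> float:
    """Width of the bootstrap interval; a narrower interval means a more stable estimate."""
    lo, hi = bootstrap_ci(values, **kwargs)
    return hi - lo


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical attribution scores for one feature across repeated runs.
attributions = [0.2, 0.3, 0.4, 0.5, 0.6]
print(bootstrap_ci(attributions), ci_width(attributions))

# Hypothetical raw p-values for three features.
print(bonferroni([0.01, 0.5, 0.04]))
```

A fixed seed keeps the interval reproducible, which matters when the CI width itself is reported as a stability measure.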

References

  1. Samek, W.; Montavon, G.; Vedaldi, A.; Hansen, L.K.; Müller, K.R. (Eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Lecture Notes in Artificial Intelligence 11700; Springer: Cham, Switzerland, 2019.
  2. Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn. 2019, 12, 307–392.
  3. Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
  4. Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity checks for saliency maps. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018; Volume 31.
  5. Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 19–24 June 2016; pp. 1050–1059.
  6. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, 6–11 July 2017; pp. 3319–3328.
  7. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Nat. Mach. Intell. 2020, 2, 56–67.
  8. Wasserstein, R.L.; Schirm, A.L.; Lazar, N.A. Moving to a World Beyond “p < 0.05”. Am. Stat. 2019, 73, 1–19.
  9. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual, 6–12 December 2020; Volume 33, pp. 9459–9474.
  10. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 24824–24837.
  11. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual, 6–12 December 2020; Volume 33, pp. 1877–1901.
  12. Chen, L.; Zaharia, M.; Zou, J. Frugalgpt: How to use large language models while reducing cost and improving performance. arXiv 2023, arXiv:2305.05176.
  13. Chen, Z.; Bei, Y.; Rudin, C. Concept Whitening for Interpretable Image Recognition. Nat. Mach. Intell. 2020, 2, 772–782.
  14. Molnar, C. Interpretable Machine Learning. 2020. Lulu.com. Available online: https://www.academia.edu/103808014/Interpretable_Machine_Learning (accessed on 4 March 2026).
  15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
  16. Mao, R.; Liu, Q.; He, K.; Li, W.; Cambria, E. The Biases of Pre-Trained Language Models: An Empirical Study on Prompt-Based Sentiment Analysis and Emotion Detection. IEEE Trans. Affect. Comput. 2023, 14, 1743–1753.
  17. Lakkaraju, H.; Kamar, E.; Caruana, R.; Leskovec, J. Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’19), Honolulu, HI, USA, 27–28 January 2019; pp. 131–138.
  18. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971.
  19. Kumar, I.E.; Venkatasubramanian, S.; Scheidegger, C.; Friedler, S. Problems with Shapley-value-based explanations as feature importance measures. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Virtual, 13–18 July 2020; pp. 5491–5500.
  20. Wang, S.; Deng, Q.; Feng, S.; Zhang, H.; Liang, C. A Survey on Rank Aggregation. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), Jeju, South Korea, 3–9 August 2024; pp. 8281–8289.
  21. Conover, W.J. Practical Nonparametric Statistics, 4th ed.; John Wiley & Sons: New York, NY, USA, 2024.
  22. Kaggle. Dhanush Narayanan, R. Credit Card Fraud Dataset. 2021. Available online: https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud (accessed on 1 February 2026).
  23. Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L.A. (Eds.) Feature Extraction: Foundations and Applications, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 207.
  24. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI 1995), Montreal, QC, Canada, 20–25 August 1995; Volume 14, pp. 1137–1145.
  25. Karpukhin, V.; Oguz, B.; Min, S.; Lewis, P.; Wu, L.; Edunov, S.; Chen, D.; Yih, W.-t. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Virtual, 16–20 November 2020; pp. 6769–6781.
  26. Mentch, L.; Hooker, G. Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests. J. Mach. Learn. Res. 2016, 17, 1–41.
  27. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, 15–19 December 2008; pp. 413–422.
  28. Kaufman, S.; Rosset, S.; Perlich, C.; Stitelman, O. Leakage in Data Mining: Formulation, Detection, and Avoidance. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 15.
  29. Davis, J.; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240.
  30. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Figure 1. The Trust Triangle framework: A three-module architecture for trustworthy and explainable fraud detection.
Figure 2. Two-stage BAE pipeline for reliable fraud detection: building a statistically robust predictive foundation.
Figure 3. Evidential reliability and external validity pipeline: transforming raw model outputs into verified quantitative evidence.
Figure 4. Controlled generation pipeline: Transforming verified evidence into trustworthy, audit-ready natural language explanations.
Figure 5. Visualization of the final integrated feature importance ranking (color-coded by confidence level).
Figure 6. Distribution of feature impact scores for Instance 13: identification of key features based on impact scores (Risk Level: Extreme Risk).
Table 1. Comparison of existing approaches with the proposed Trust Triangle framework.

| Aspect | Existing Approaches | Limitations | Trust Triangle Advantage |
|---|---|---|---|
| Feature Attribution | Single-method approaches: LIME [1], Integrated Gradients [6], SHAP [7] | Each method has distinct biases; results can be inconsistent and method-dependent [4,19] | Multi-method consensus (Section 3.2.1) aggregates three theoretically distinct methods, quantifying agreement via micro-consistency and macro-consensus ($C_{global}$) [20] |
| Attribution Reliability | Sanity checks reveal that many saliency maps are insensitive to model randomization [4]; Shapley-value-based explanations may not reflect true feature importance [19] | No quantitative standard for assessing attribution reliability | The evidential reliability pillar provides quantitative consistency metrics and adaptive fusion weights ($w_{rel}$) based on cross-method agreement |
| Statistical Grounding | Post hoc explanations rarely validate attributions against ground-truth outcomes | Attributions may highlight features that are statistically insignificant or lack real-world relevance | The external validity pillar applies the Mann–Whitney U test with FDR correction [8,21] and effect size analysis, ensuring only statistically grounded features receive high importance |
| Uncertainty Quantification | Standard autoencoders [2,23] provide point estimates without confidence intervals; DAGMM [3] improves density estimation but lacks inference-time uncertainty | Predictions lack calibrated uncertainty, undermining trust in high-stakes decisions | Bayesian Autoencoder with Monte Carlo Dropout [5] provides distributional estimates of reconstruction errors; bootstrap resampling [24] validates stability of thresholds and metrics |
| Explanation Generation | LLMs alone [11,18] risk hallucination [16] when generating explanations from unvalidated inputs | Fluency without faithfulness; explanations may be plausible but ungrounded | Controlled generation with RAG [9,25] and CoT prompting [10] constrains LLM reasoning to verified quantitative evidence and authoritative domain knowledge |
| End-to-End Trustworthiness | Existing frameworks lack integrated verification of both reliability and validity before explanation | Trust is assumed post hoc rather than built systematically | Trust Triangle establishes a closed loop: reliability-validity verification → multi-source evidence integration → controlled generation → auditable report |
Table 2. The Eight-Step Workflow of Controlled Generation: From Verification to Trustworthy Explanation.

| Step | Core Work (Essence) | Alignment with Trust Triangle |
|---|---|---|
| ➊ Feature Importance Fusion | Aggregates scores from three theoretically distinct attribution methods (IG, SHAP, Perturbation) to generate a consensus-verified composite importance ranking. | Achieves Evidential Reliability. By establishing multi-method consensus, it mitigates the bias and instability inherent in any single post hoc explanation method [4,19]. This transforms the ML model’s internal reasoning into robust, reproducible quantitative evidence, forming a credible foundation for all subsequent steps [20]. |
| ➋ New-Instance Risk Scoring | Computes a normalized risk score $R_i = e_i / T$ by comparing the instance’s reconstruction error to a statistically derived threshold from normal behavior. | Initiates the Instantiation of External Validity. It converts the model’s raw, absolute anomaly score into a statistically grounded, interpretable relative risk measure [8,24], enabling the transition from global model assessment to individualized risk evaluation. |
| ➌ New-Instance Feature Impact Analysis | Calculates the personalized contribution $Impact_{j}^{i}$ of each feature by fusing global importance, instance-specific attribution, and feature-value anomaly. | Deeply Integrates Evidential Reliability and External Validity. This step dynamically combines consensus-verified importance (reliability) with instance-level statistical anomalies (validity), creating a traceable, quantitative anchor for individualized explanations [1,7]. |
| ➍ Fraud Rule Template | Predefined, structured crime patterns, associated features, and semantic descriptions based on domain knowledge. | Establishes the Semantic Anchor for External Validity and Controlled Generation. Encoding expert knowledge into computable rules provides the necessary structure for aligning statistical evidence with actionable business logic, ensuring explanations possess inherent relevance [30]. |
| ➎ Rule Triggering and Alerting | Dynamically matches predefined fraud rules based on the aggregated feature impact scores of the new instance and outputs tiered alerts (based on: LLM Interpretation Guideline, Impact trigger). | Realizes the Business Closure of External Validity. It systematically maps quantitative evidence to comprehensible business semantics (fraud rules), generating actionable, prioritized insights and ensuring the practical utility of the explanation [21]. |
| ➏ RAG Knowledge Retrieval | Retrieves authoritative crime modus operandi details from an external knowledge base, strictly keyed by triggered Rule IDs. | Implements the Foundational Constraint for Controlled Generation. By tethering retrieval to quantifiably verified triggers, it restricts the LLM’s context to high-quality, relevant evidence, directly mitigating hallucinations [9,16]. |
| ➐ Multi-Source Evidence Integration | Aggregates all quantitative evidence (from ➊➋➌) and qualitative knowledge (from ➍➎➏) into a unified, structured input schema. | Completes the Bridge for Controlled Generation. It constructs a structured interface that forces the subsequent LLM to reason upon an integrated, auditable evidence set, enabling end-to-end traceability [10,11]. |
| ➑ Report Generation | Generates the final natural language report via an LLM driven by structured prompts and the integrated evidence. | Executes Controlled, Evidence-Based Semantic Articulation. Guided by chain-of-thought prompting [10] within the evidence-rich context, the LLM produces a coherent, audit-ready narrative that is a direct synthesis of the validated inputs, fulfilling the promise of a trustworthy explanatory system. |
Summary. The bridging framework progresses through three stages to fulfill its core principles: 1. Foundation in Verified Evidence: Establishes reliable quantitative attributions (➊) via multi-method consensus, converting the model’s internal state into robust Feature Importance Scores. 2. Anchoring to Real-World Validity: Grounds this evidence in statistical and domain reality by calibrating instance risk (➋), personalizing feature impact (➌), mapping to Fraud Rule Template (➍) and business rules (➎), ensuring external validity and actionability. 3. Controlled Synthesis into Insights: Converts the anchored evidence into a semantic explanation through governed generation—retrieving authoritative context (➏), integrating all sources (➐), and producing an auditable report (➑). This completes the loop from a black-box prediction to a white-box explanation.
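The first two steps of the workflow, importance fusion across attribution methods and risk scoring against a threshold, can be sketched as follows. This is a simplified illustration under stated assumptions: the fusion here is a weighted average of sum-normalized absolute scores with equal weights by default, whereas the framework describes adaptive weights based on cross-method agreement. The function names and all numeric inputs are hypothetical.

```python
from typing import Dict, Optional

Scores = Dict[str, float]


def normalize(scores: Scores) -> Scores:
    """Scale absolute attribution scores so that they sum to 1."""
    total = sum(abs(v) for v in scores.values())
    if total == 0:
        raise ValueError("attribution scores are all zero")
    return {name: abs(v) / total for name, v in scores.items()}


def fuse_importance(
    method_scores: Dict[str, Scores],
    weights: Optional[Dict[str, float]] = None,
) -> Scores:
    """Weighted average of normalized scores across attribution methods.

    Assumes every method scores the same set of features. Equal weights are
    used unless `weights` is given (a simplification of adaptive weighting).
    """
    if weights is None:
        weights = {method: 1.0 for method in method_scores}
    total_weight = sum(weights[method] for method in method_scores)
    normalized = {m: normalize(s) for m, s in method_scores.items()}
    features = next(iter(method_scores.values())).keys()
    return {
        f: sum(weights[m] * normalized[m][f] for m in method_scores) / total_weight
        for f in features
    }


def risk_score(reconstruction_error: float, threshold: float) -> float:
    """Normalized risk score: reconstruction error divided by the threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return reconstruction_error / threshold


# Hypothetical attribution scores from three methods for two features.
methods = {
    "ig": {"feature_a": 1.0, "feature_b": 1.0},
    "shap": {"feature_a": 3.0, "feature_b": 1.0},
    "perturbation": {"feature_a": 1.0, "feature_b": 3.0},
}
fused = fuse_importance(methods)
print(fused, risk_score(3.0, 1.5))
```

A risk score above 1 means the instance's reconstruction error exceeds the threshold derived from normal behavior, which is what makes the score readable as a relative measure.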
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shen, J.-C.; Su, N.-C.; Lin, Y.-B. Trust Triangle: A Reliability-Validity-Generation Framework for Explainable Credit Card Fraud Detection with RAG-Enhanced LLMs Reasoning. AI 2026, 7, 114. https://doi.org/10.3390/ai7030114
