Trust Triangle: A Reliability-Validity-Generation Framework for Explainable Credit Card Fraud Detection with RAG-Enhanced LLMs Reasoning
Abstract
1. Introduction
2. Related Works
2.1. Challenges in Trustworthy Anomaly Explanation
2.2. The Trust Triangle Framework
2.3. Dataset Characteristics
2.4. Comparison with Existing Approaches: The Need for a Bridging Framework
- Beyond Single-Method Attribution: While LIME [1], Integrated Gradients [6], and SHAP [7] each provide valuable perspectives, reliance on any single method risks method-specific bias [4,19]. Our framework transforms attribution from a point estimate into a consensus-verified distribution across methods, with explicit consistency metrics.
- From Statistical Significance to Practical Validity: Traditional hypothesis testing [21,23] identifies statistically significant differences, but without effect size [8] or multiple testing correction, features may be declared “important” despite negligible practical impact. Our external validity pillar integrates p-values, FDR correction [8], and effect sizes into a composite validity weight.
- Uncertainty-Aware Detection: Variational autoencoders [2] and their extensions [3,26] excel at learning normal patterns, but their point estimates ignore epistemic uncertainty. By incorporating Monte Carlo Dropout [5] and bootstrap validation [24], our BAE backbone provides calibrated uncertainty estimates essential for reliable threshold selection.
- Grounded Generation with RAG: While LLMs demonstrate remarkable fluency [11,18], their tendency to hallucinate [16] makes them unsuitable for direct explanation of high-stakes predictions. Our RAG pipeline [9,25] retrieves authoritative context, and CoT prompting [10] ensures that generated narratives remain faithful to the verified evidence.
- The Missing Bridge: Existing work either stops at explanation (post hoc methods) or generation (LLMs), but none systematically bridges quantitative verification with semantic articulation. The Trust Triangle fills this gap by introducing a dedicated Bridging Module that transforms raw model outputs into validated evidence before generation—a distinction that is both novel and essential for trustworthy AI in high-risk domains.
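As a minimal sketch of the practical-validity idea in the second bullet above, the following adjusts a set of p-values for multiple testing and then also requires a minimum absolute effect size. The FDR procedure is assumed here to be Benjamini-Hochberg, which this section does not confirm, and the p-values, effect sizes, and the 0.05 and 0.3 cut-offs are hypothetical illustrations rather than values from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def practically_valid(p_values, effect_sizes, alpha=0.05, min_effect=0.3):
    """Flag features that pass both the FDR test and an effect-size floor.

    alpha and min_effect are hypothetical cut-offs chosen for illustration.
    """
    q_values = benjamini_hochberg(p_values)
    return [q <= alpha and abs(d) >= min_effect
            for q, d in zip(q_values, effect_sizes)]


# Hypothetical inputs for four features (illustrative only).
p_values = [0.01, 0.04, 0.03, 0.20]
effect_sizes = [-0.70, 0.05, 0.10, 0.40]
q_values = benjamini_hochberg(p_values)
flags = practically_valid(p_values, effect_sizes)
```

In this toy input only the first feature passes both screens: the fourth has a large effect but a non-significant adjusted p-value, and the others fail on both counts.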
3. Method
3.1. A Robust Predictive Backbone for Imbalanced Fraud Detection
3.1.1. Two-Stage BAE Training
3.1.2. BAE Performance Evaluation
3.1.3. Workflow Description of Evidential Reliability
3.2. Bridging Evidence and Validity for Trustworthy Attributions
3.2.1. Evidential Reliability via Multi-Method Consensus
- Micro-Consistency (Feature-Level): For each feature $j$, we compute $\mathrm{consistency}_j$, which measures the agreement across methods for that specific feature.
- Macro-Consensus (System-Level): We calculate global consistency = (Spearman’s $\rho$ + Kendall’s $\tau$)/2, where each term is the mean of the pairwise rank correlations between the three methods’ rankings [20,21]. This global consistency dynamically determines the fusion weight for each method, favoring more stable methods (e.g., SHAP) when consensus is low.
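The macro-consensus score can be sketched as follows, using the IG, SHAP, and perturbation ranks listed in Appendix A.1. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, the rankings are treated as untied, and the mapping from this score to per-method fusion weights is not shown because the section does not specify it.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two untied rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two untied rankings of the same items."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


def global_consistency(rankings):
    """Average of the mean pairwise Spearman and mean pairwise Kendall scores."""
    pairs = list(combinations(rankings, 2))
    mean_rho = sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)
    mean_tau = sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)
    return (mean_rho + mean_tau) / 2.0


# Ranks from Appendix A.1, in the table's feature order.
ig_rank = [1, 2, 3, 4, 5, 6, 7]
shap_rank = [1, 5, 3, 2, 4, 6, 7]
pert_rank = [7, 2, 6, 1, 5, 4, 3]

score = global_consistency([ig_rank, shap_rank, pert_rank])
```

With these ranks, IG and SHAP agree closely while the perturbation ranking diverges from both, so the pairwise averages pull the global score well below the IG-SHAP correlation alone.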
3.2.2. External Validity via Statistical Association
3.2.3. Fusion and Output
3.2.4. Workflow Description of External Validity
3.3. Controlled Generation for Actionable Explanations
3.3.1. Risk Quantification and Rule Mapping
3.3.2. Controlled Generation with RAG
3.3.3. Workflow Description of Controlled, Grounded Generation
4. Implementation
4.1. Evidential Reliability: Building a Statistically Stable Detection Foundation
4.2. External Validity: Establishing Causal Plausibility for Detected Features
4.3. Controlled Generation: Orchestrating Auditable and Context-Aware Reporting
5. Results
5.1. Overall Predictive Performance and Robustness
5.2. Evidential Reliability (Multi-Method Consensus)
5.3. External Validity (Statistical Grounding)
5.4. Controlled Generation (Synthesis of Evidence)
5.5. Stability of Key Evidence
6. Deployment
6.1. Case Analysis
6.2. Grounding the Evidence
6.3. Synthesis and Audit Trail
7. Conclusions, Limitations, and Future Work
7.1. Conclusions
7.2. Limitations
- Single Validated Feature: As reported in Section 5.3, only one feature (ratio_to_median_purchase_price) achieved statistical significance after FDR correction. While this finding underscores the rigor of our validity assessment—demonstrating that our framework successfully distinguishes statistically grounded signals from noise—it also raises the question of whether other features carry meaningful predictive signals that are masked by their small individual effect sizes or by correlations with other features.
- Univariate Statistical Testing: Our external validity pillar relies on the Mann–Whitney U test, a univariate non-parametric method. This approach does not account for feature interactions or nonlinear relationships, which may be crucial for understanding complex fraud patterns.
- Static Knowledge Base: The RAG pipeline currently depends on a fixed, pre-curated domain knowledge base. While we ensured high relevance scores (>0.7) for retrieved content, the knowledge base is not automatically updated as new fraud patterns emerge, limiting the system’s adaptability to concept drift.
- LLM Dependence and Computational Cost: The controlled generation module employs a multi-stage LLM pipeline (embedding, intermediate description, final generation). Although we selected lightweight models (e.g., qwen2.5:1.5b-instruct) to mitigate cost [12], the approach still requires significant computational resources and relies on the inherent capabilities of the chosen LLMs, which may introduce biases or inconsistencies [16]. This trade-off between explainability and computational efficiency is a common challenge in deploying LLM-based systems for real-time applications.
- Human Feedback Not Yet Implemented: While we propose a human-in-the-loop feedback mechanism, this component remains conceptual and has not been implemented or validated empirically. The effectiveness of expert feedback for model improvement and knowledge base updating requires future investigation.
- Single Dataset and Domain: The framework has been demonstrated on a single credit card fraud dataset [22]. Its generalizability to other high-stakes domains (e.g., healthcare diagnostics, financial auditing, or cybersecurity) remains to be tested.
7.3. Future Work
- Alternative Statistical Methods: While our framework successfully identifies one statistically validated feature, future work will explore alternative statistical methods—such as LASSO-regularized logistic regression, permutation importance with significance testing [14,26], Bayesian feature selection, and the Boruta algorithm—to uncover potential predictive signals in features that did not survive FDR correction. These methods offer complementary strengths: LASSO handles multicollinearity and feature selection jointly; permutation importance provides model-specific significance testing; Bayesian approaches incorporate prior knowledge; and the Boruta algorithm explicitly compares features to random probes. Such multivariate approaches may reveal interactions or nonlinear effects masked by univariate tests.
- Feature Interaction Modeling: We plan to investigate methods that capture feature interactions, such as tree-based models with built-in interaction detection or neural attention mechanisms [15], to provide a more holistic understanding of fraud patterns.
- Dynamic Knowledge Base Updating: Future iterations of the framework will incorporate mechanisms for semi-automated knowledge base updating, potentially using online learning or periodic retraining of the retrieval model [25] to adapt to emerging fraud modus operandi.
- Human-in-the-Loop Feedback Mechanism: We envision a human-in-the-loop feedback mechanism to continuously improve system trustworthiness. Domain experts could review generated reports, flag errors, and update the knowledge base; their feedback would be logged and used to recalibrate attribution weights, refine rule templates, and periodically retrain the predictive model. This closed-loop learning from expert input would enable the Trust Triangle to adapt to evolving fraud patterns and reduce mistakes over time, moving toward truly adaptive and auditable AI in high-stakes domains.
- Cross-Domain Validation: We aim to apply the Trust Triangle framework to other high-stakes domains, such as healthcare diagnostics (e.g., detecting anomalous patient records) and financial auditing (e.g., identifying irregular transactions), to assess its generalizability and adaptability. We plan to collaborate with domain experts in these fields to adapt the framework’s components—particularly the knowledge base and rule templates—to their specific contexts.
- Efficiency Optimization: To address computational costs, we will explore more efficient consensus mechanisms, model distillation techniques, and lighter-weight LLM architectures [12] suitable for real-time deployment in production environments.
- User Studies and Explainability Evaluation: Beyond quantitative validation, future work should include user studies with domain experts (e.g., fraud analysts) to evaluate the usefulness, interpretability, and actionability of the generated reports, providing qualitative evidence of the framework’s practical value.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Quantifying Reliability—Raw Multi-Method Attribution Consensus

| Feature | IG Score | IG Rank | SHAP Score | SHAP Rank | Pert Score | Pert Rank | Consistency Among Methods |
|---|---|---|---|---|---|---|---|
| distance_from_last_transaction | 0.673093 | 1 | 0.747015 | 1 | −0.678799 | 7 | 0.2605 |
| online_order | 0.653953 | 2 | 0.039471 | 5 | 0.107362 | 2 | 0.5005 |
| ratio_to_median_purchase_price | 0.525062 | 3 | 0.452624 | 3 | −0.413608 | 6 | 0.7005 |
| distance_from_home | 0.307944 | 4 | 0.531014 | 2 | 1.177105 | 1 | 0.3805 |
| repeat_retailer | 0.121161 | 5 | 0.201827 | 4 | −0.058288 | 5 | 0.5005 |
| used_chip | 0.000641 | 6 | 0.014732 | 6 | −0.008623 | 4 | 0.5005 |
| pin_number | 0.000000 | 7 | 0.014301 | 7 | 0.022428 | 3 | 0.5005 |

| Feature | U_Statistic | p_Value | Cliff_Delta | AUC | Power |
|---|---|---|---|---|---|
| distance_from_home | 3.213396∗ | 0.0000 | −0.1943 | 0.4029 | 1.0000 |
| distance_from_last_transaction | 3.705598∗ | 0.0000 | −0.0709 | 0.4646 | 1.0000 |
| ratio_to_median_purchase_price | 1.193061∗ | 0.0000 | −0.7009 | 0.1496 | 1.0000 |
| repeat_retailer | 3.994380∗ | 0.1746 | 0.0016 | 0.5008 | 0.1183 |
| used_chip | 4.398982∗ | 0.0000 | 0.1030 | 0.5515 | 1.0000 |
| used_pin_number | 4.414208∗ | 0.0000 | 0.1068 | 0.5534 | 1.0000 |
| online_order | 2.695646∗ | 0.0000 | −0.3241 | 0.3380 | 1.0000 |
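The columns of the table above are linked by standard identities: the AUC equals the U statistic divided by the number of cross-group pairs, and Cliff's delta equals 2·AUC − 1. The sketch below computes all three for a toy pair of samples and then checks the second identity against a few (AUC, Cliff_Delta) pairs copied from the table. The function names and toy samples are hypothetical, and which group the paper treats as the first sample is an assumption.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x_i > y_j, ties counted as half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def auc_and_cliffs_delta(x, y):
    """Derive AUC and Cliff's delta from the U statistic."""
    n_pairs = len(x) * len(y)
    auc = mann_whitney_u(x, y) / n_pairs
    delta = 2.0 * auc - 1.0
    return auc, delta


# Toy samples (hypothetical values, not drawn from the dataset).
group_a = [1, 2, 3]
group_b = [2, 3, 4]
auc, delta = auc_and_cliffs_delta(group_a, group_b)

# (AUC, Cliff_Delta) pairs copied from the table above.
table_rows = {
    "distance_from_home": (0.4029, -0.1943),
    "ratio_to_median_purchase_price": (0.1496, -0.7009),
    "online_order": (0.3380, -0.3241),
    "used_chip": (0.5515, 0.1030),
}
# Largest gap between 2*AUC - 1 and the reported Cliff's delta.
max_gap = max(abs((2.0 * a - 1.0) - d) for a, d in table_rows.values())
```

The reported values agree with the identity up to rounding in the fourth decimal place.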
Appendix A.2. Integrated Feature Importance Ranking with Reliability-Validity Verification
| Rank | Feature | Importance | Confidence | Stat. Sig. | Effect Size | Effect Size Category |
|---|---|---|---|---|---|---|
| 1 | ratio_to_median_purchase_price | 0.3196 | 0.726 | ✓ | 0.5473 | Medium effect |
| 2 | distance_from_last_transaction | 0.2493 | 0.395 | ✗ | 0.4992 | Small effect |
| 3 | online_order | 0.1685 | 0.465 | ✗ | 0.9261 | Large effect |
| 4 | distance_from_home | 0.1564 | 0.524 | ✗ | 0.2764 | Small effect |
| 5 | repeat_retailer | 0.0463 | 0.622 | ✗ | 0.1863 | Very small effect |
| 6 | used_chip | 0.0302 | 0.637 | ✗ | 0.7714 | Medium effect |
| 7 | used_pin_number | 0.0297 | 0.617 | ✗ | 0.4602 | Small effect |
Appendix A.3. Substantiating Stability—Full Bootstrap Analysis for Key Features
| Rank | Feature | Mean Importance (±Standard Deviation) | 95% Confidence Interval | Ranking Stability | Top-3 Frequency | Explanation |
|---|---|---|---|---|---|---|
| 1 | price | 0.2770 ± 0.0942 | [0.1179, 0.4753] | 0.992 | 98.5% | Key Feature |
| 2 | transaction | 0.2589 ± 0.1090 | [0.0716, 0.4859] | 0.988 | 88.5% | Key Feature |
| 3 | home | 0.1746 ± 0.0861 | [0.0640, 0.4082] | 0.987 | 71.0% | Important Feature |
| 4 | online | 0.1025 ± 0.0406 | [0.0330, 0.1887] | 0.990 | 30.5% | Important Feature |
| 5 | repeat | 0.0751 ± 0.0371 | [0.0219, 0.1582] | 0.988 | 8.5% | Auxiliary Feature |
| 6 | chip | 0.0610 ± 0.0332 | [0.0151, 0.1183] | 0.989 | 2.5% | Auxiliary Feature |
| 7 | pin | 0.0509 ± 0.0302 | [0.0081, 0.1085] | 0.991 | 0.5% | Auxiliary Feature |
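The summary columns in the table above (mean ± standard deviation, 95% interval, Top-3 frequency) can be produced from a set of bootstrap replicates roughly as sketched below. Everything here is an assumption made for illustration: the replicates are synthetic Gaussian draws centred on the table's means and standard deviations rather than real bootstrap output, the interval is a simple percentile interval, and the function name and the choice of 200 replicates are hypothetical.

```python
import random
import statistics


def summarize_bootstrap(replicates, names, top_k=3):
    """Summarize bootstrap replicates of a feature-importance vector."""
    n_rep = len(replicates)
    summary = {}
    for j, name in enumerate(names):
        values = sorted(rep[j] for rep in replicates)
        # Simple percentile interval (one of several possible conventions).
        ci_low = values[int(0.025 * (n_rep - 1))]
        ci_high = values[int(0.975 * (n_rep - 1))]
        # A feature is in the top k if fewer than k features score strictly higher.
        hits = sum(1 for rep in replicates
                   if sum(v > rep[j] for v in rep) < top_k)
        summary[name] = {
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),
            "ci_low": ci_low,
            "ci_high": ci_high,
            "top_k_freq": hits / n_rep,
        }
    return summary


names = ["price", "transaction", "home", "online", "repeat", "chip", "pin"]
means = [0.2770, 0.2589, 0.1746, 0.1025, 0.0751, 0.0610, 0.0509]
stds = [0.0942, 0.1090, 0.0861, 0.0406, 0.0371, 0.0332, 0.0302]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [[rng.gauss(m, s) for m, s in zip(means, stds)]
              for _ in range(200)]
summary = summarize_bootstrap(replicates, names)
total_top3 = sum(row["top_k_freq"] for row in summary.values())
```

Because exactly three features occupy the top three positions in every replicate, the Top-3 frequencies sum to 3 (that is, 300%), which is also true of the column reported in the table.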
Appendix B
Appendix B.1. Twenty New Instances and Their Corresponding Analyses
| Instance_ID | Home | Transaction | Price | Repeat | Chip | Pin | Online_Order | Fraud |
|---|---|---|---|---|---|---|---|---|
| Instance_1 | 57.87785658 | 0.311140008 | 1.945939978 | 1 | 1 | 0 | 0 | |
| Instance_2 | 10.8299427 | 0.175591502 | 1.294218811 | 1 | 0 | 0 | 0 | |
| Instance_3 | 5.091079491 | 0.805152595 | 0.427714563 | 1 | 0 | 0 | 1 | |
| Instance_4 | 2.247564328 | 5.600043547 | 0.362662578 | 1 | 1 | 0 | 1 | |
| Instance_5 | 44.190936 | 0.566486268 | 2.222767293 | 1 | 1 | 0 | 1 | |
| Instance_6 | 5.586407674 | 13.26107327 | 0.064768465 | 1 | 0 | 0 | 0 | |
| Instance_7 | 3.724019125 | 0.956837928 | 0.278464933 | 1 | 0 | 0 | 1 | |
| Instance_8 | 4.848246572 | 0.320735427 | 1.273049534 | 1 | 0 | 1 | 0 | |
| Instance_9 | 0.876632256 | 2.503608927 | 1.516999333 | 0 | 0 | 0 | 0 | |
| Instance_10 | 8.839046704 | 2.970512276 | 2.361682533 | 1 | 0 | 0 | 1 | |
| Instance_11 | 14.26352874 | 0.158758086 | 1.136101943 | 1 | 1 | 0 | 1 | |
| Instance_12 | 13.59238757 | 0.240539813 | 1.370329863 | 1 | 1 | 0 | 1 | |
| Instance_13 | 5.282558261 | 0.371561962 | 10.12447336 | 1 | 0 | 0 | 1 | |
| Instance_14 | 13.95587237 | 0.271523528 | 2.798901123 | 1 | 0 | 0 | 1 | |
| Instance_15 | 179.6651877 | 0.120919634 | 0.535640483 | 1 | 1 | 1 | 1 | |
| Instance_16 | 114.5187894 | 0.707003353 | 0.516989925 | 1 | 0 | 0 | 0 | |
| Instance_17 | 3.589688598 | 6.247457543 | 1.846450527 | 1 | 0 | 0 | 0 | |
| Instance_18 | 11.08585248 | 34.66135143 | 2.530758449 | 1 | 0 | 0 | 1 | |
| Instance_19 | 2.131955666 | 56.37240053 | 6.358667334 | 1 | 0 | 0 | 1 | |
| Instance_20 | 3.803057351 | 67.24108053 | 1.872949614 | 1 | 0 | 0 | 1 | |

| Instance_ID | Reconstruction Error | Risk Score | Risk_Level | IG_Value | SHAP_Value | Perturbation_Value |
|---|---|---|---|---|---|---|
| Instance_1 | 0.096563 | 0.324644 | Normal | 0.332549 | 0.590415 | 0.382106 |
| Instance_2 | 0.017343 | 0.058308 | Normal | 0.347724 | 4.04E−05 | 0.349023 |
| Instance_3 | 0.23296 | 0.783207 | Normal | 0.277649 | 0.450088 | 0.34465 |
| Instance_4 | 0.312947 | 1.052123 | Low_Risk | 1.538347 | 0.989904 | 0.240123 |
| Instance_5 | 0.064874 | 0.218105 | Normal | 1.084932 | 1.214681 | 0.378436 |
| Instance_6 | 0.120097 | 0.403763 | Normal | 0.45078 | 0.000568 | 0.45247 |
| Instance_7 | 0.262711 | 0.88323 | Normal | 0.305198 | 0.43524 | 0.359591 |
| Instance_8 | 0.049949 | 0.167927 | Normal | 0.462337 | 1.435201 | 0.171437 |
| Instance_9 | 1.243194 | 4.179594 | Medium_Risk | 1.572173 | 0.000104 | 1.536995 |
| Instance_10 | 0.028579 | 0.09608 | Normal | 0.197178 | 0.636449 | 0.442014 |
| Instance_11 | 0.160518 | 0.539658 | Normal | 0.449272 | 1.084443 | 0.184187 |
| Instance_12 | 0.127472 | 0.428557 | Normal | 0.420437 | 1.114343 | 0.18204 |
| Instance_13 | 2.106589 | 7.08231 | Extreme_Risk | 4.142073 | 1.368475 | 2.2313 |
| Instance_14 | 0.019202 | 0.064558 | Normal | 0.190981 | 0.679516 | 0.48421 |
| Instance_15 | 1.463372 | 4.919827 | Medium_Risk | 7.358275 | 2.510896 | 1.714236 |
| Instance_16 | 0.463765 | 1.559168 | Low_Risk | 0.794006 | 0.000273 | 0.793759 |
| Instance_17 | 0.032802 | 0.11028 | Normal | 0.363011 | 0.000211 | 0.363865 |
| Instance_18 | 0.277533 | 0.933059 | Normal | 0.452468 | 0.668768 | 0.728295 |
| Instance_19 | 1.301491 | 4.375586 | Medium_Risk | 1.759496 | 1.036506 | 1.426202 |
| Instance_20 | 1.175546 | 3.952162 | Medium_Risk | 1.309625 | 0.617724 | 1.308714 |
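In the table above, the ratio of Risk Score to Reconstruction Error is constant across rows, which is consistent with the risk score being the reconstruction error divided by a fixed threshold. The sketch below infers that threshold from one row and checks it against others. The threshold value is an inference from the tabulated numbers, not a figure stated in this section, and the function name is hypothetical.

```python
def risk_score(reconstruction_error, threshold):
    """Normalized risk: reconstruction error relative to a fixed threshold."""
    return reconstruction_error / threshold


# (reconstruction_error, risk_score) pairs copied from the table above.
rows = {
    "Instance_1": (0.096563, 0.324644),
    "Instance_9": (1.243194, 4.179594),
    "Instance_13": (2.106589, 7.08231),
    "Instance_16": (0.463765, 1.559168),
}

# Infer the threshold from a single row (assumption: score = error / threshold).
error_13, score_13 = rows["Instance_13"]
threshold = error_13 / score_13

# Largest disagreement between the recomputed and tabulated risk scores.
max_gap = max(abs(risk_score(err, threshold) - score)
              for err, score in rows.values())
```

Under this reading, a score of 1 corresponds to an error exactly at the threshold, which matches the table: every row labeled Normal has a score below 1. The cut-offs separating the Low, Medium, and Extreme levels are not recoverable from the table alone and are left out.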


| Instance_ID | W-Home | W-Transaction | W-Price | W-Retailer | W-Chip | W-Pin | W-Online |
|---|---|---|---|---|---|---|---|
| Instance_1 | 0.237267139 | 0.010103283 | 0.007877149 | 0.000146618 | 0.007112523 | 0.01005182 | 0.727441469 |
| Instance_2 | 0.028305366 | 0.018029417 | 0.006613606 | 0.0004834 | 0.050477914 | 0.011348816 | 0.884741482 |
| Instance_3 | 0.080796448 | 0.017816731 | 0.758909309 | 0.005076726 | 0.047685947 | 0.010416248 | 0.07929859 |
| Instance_4 | 0.00148074 | 0.003380702 | 0.196502697 | 0.004808384 | 0.06757851 | 0.002727031 | 0.723521936 |
| Instance_5 | 0.054408818 | 0.012614053 | 0.341553393 | 0.00275872 | 0.460102583 | 0.014345476 | 0.114216956 |
| Instance_6 | 0.034136039 | 0.085371033 | 0.307591742 | 0.000293307 | 0.030516904 | 0.006860236 | 0.535230739 |
| Instance_7 | 0.078358473 | 0.014046829 | 0.785137777 | 0.004364813 | 0.040997123 | 0.008955218 | 0.068139767 |
| Instance_8 | 0.034920621 | 0.023374237 | 0.040445422 | 0.001720534 | 0.046433054 | 0.083857952 | 0.769248179 |
| Instance_9 | 0.03978803 | 0.00062748 | 0.000305463 | 0.540732547 | 0.022326325 | 0.005019335 | 0.39120082 |
| Instance_10 | 0.110279153 | 0.005628721 | 0.563695902 | 0.0098242 | 0.094632542 | 0.020636935 | 0.195302547 |
| Instance_11 | 0.02252188 | 0.034650133 | 0.391721133 | 0.002748312 | 0.393938029 | 0.023913809 | 0.130506704 |
| Instance_12 | 0.038514127 | 0.048962087 | 0.09936341 | 0.004088747 | 0.580565782 | 0.035329299 | 0.193176548 |
| Instance_13 | 0.000820054 | 0.00151252 | 0.745757354 | 0.001736528 | 0.0003937 | 0.000310623 | 0.249469221 |
| Instance_14 | 0.057653496 | 0.048524927 | 0.562164565 | 0.010151496 | 0.097811478 | 0.021329667 | 0.202364371 |
| Instance_15 | 0.555578632 | 0.02351932 | 0.002014832 | 0.002673775 | 0.003209305 | 0.346585902 | 0.066418235 |
| Instance_16 | 0.547957038 | 0.004843872 | 0.087897626 | 0.000183716 | 0.019149422 | 0.004305005 | 0.335663321 |
| Instance_17 | 0.065404246 | 0.009072595 | 0.037325634 | 0.000452796 | 0.047356019 | 0.010647526 | 0.829741184 |
| Instance_18 | 0.024102204 | 0.719180111 | 0.165032308 | 0.002807489 | 0.027043632 | 0.005897326 | 0.055936929 |
| Instance_19 | 0.009547065 | 0.46215441 | 0.401369918 | 0.001405106 | 0.004662301 | 0.001136057 | 0.119725143 |
| Instance_20 | 0.015337583 | 0.925522783 | 0.031828858 | 0.000838015 | 0.008068402 | 0.001759479 | 0.016644881 |
Appendix B.2. Detailed Feature-Level Impact Analysis for Case Study
| Feature | Impact_Score | Importance | Deviation_Score | Deviation | Instance_Error | N_Mean | N_Std | N_Importance |
|---|---|---|---|---|---|---|---|---|
| price | 1.264097 | 0.745757 | 1.695051 | 5.303835 | 14.49372 | 2.041977 | 2.347686 | 0.31959 |
| online | 0.055845 | 0.249469 | 0.223857 | 1.328263 | 7.33E−07 | 0.65 | 0.48936 | 0.168534 |
| repeat | 0.000336 | 0.001737 | 0.193629 | 4.184126 | 0.014401 | 0.95 | 0.223607 | 0.046277 |
| transaction | 0.000186 | 0.001513 | 0.122953 | 0.493284 | 0.033636 | 9.693189 | 19.58212 | 0.249255 |
| home | 7.11E−05 | 0.00082 | 0.086688 | 0.554225 | 0.204363 | 25.30003 | 45.28064 | 0.156414 |
| chip | 7.59E−06 | 0.000394 | 0.01929 | 0.638075 | 1.38∗ | 0.3 | 0.470162 | 0.030231 |
| pin | 3E−06 | 0.000311 | 0.009649 | 0.324882 | 3.33∗ | 0.1 | 0.307794 | 0.0297 |
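The rows with fully legible values in the table above are consistent with Impact_Score being the product of Importance and Deviation_Score. The sketch below checks this numerically. The product rule is an inference from the tabulated numbers rather than a formula stated in this section, and the function name is hypothetical.

```python
def impact_score(importance, deviation_score):
    """Assumed relation: instance-level importance times deviation score."""
    return importance * deviation_score


# (Impact_Score, Importance, Deviation_Score) copied from the table above.
rows = {
    "price": (1.264097, 0.745757, 1.695051),
    "online": (0.055845, 0.249469, 0.223857),
    "repeat": (0.000336, 0.001737, 0.193629),
    "transaction": (0.000186, 0.001513, 0.122953),
}

# Absolute difference between the recomputed and tabulated impact scores.
gaps = {name: abs(impact_score(imp, dev) - reported)
        for name, (reported, imp, dev) in rows.items()}
max_gap = max(gaps.values())
```

All four rows agree to within rounding of the tabulated figures, which suggests that for Instance_13 the price feature dominates because it has both the largest importance weight and the largest deviation score.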
Appendix B.3. Implementation Templates and Retrieved Knowledge for Case Study


Appendix B.4. Complete Generated Report for the Instance_13 Case Study
- Executive Summary
- Research Background and Methodology
- Feature Analysis of Instance_13
- Feature Importance Evaluation
- Crime Pattern Matching Analysis
- Trustworthiness Assessment (Based on NeurIPS Standards)
- Risk Management Recommendations
- Conclusions and Outlook
- Appendix
- 1. Executive Summary
- Multi-attribution consistency (reliability),
- Statistical significance testing (validity), and
- RAG-enhanced real-world crime knowledge (semantic grounding).
- 2. Research Background and Methodology
- Reliability Assessment: Integration of SHAP, perturbation-based methods, and Integrated Gradients to compute an attribution consistency score (0.702).
- Validity Verification: Stability evaluation using Bootstrap confidence intervals and statistical significance testing (adjusted p-values).
- Semantic Enhancement: Matching against real-world crime patterns using Retrieval-Augmented Generation (RAG).
- Instance-Level Analysis: Multi-layer analysis of Instance_13 at the feature, pattern, and system levels.
- 3. Feature Analysis of Instance_13
- ratio_to_median_purchase_price = 10.124
- online_order = 1 and used_chip = 0
- distance_from_home = 5.282 and distance_from_last_transaction = 0.372
- repeat_retailer = 1 with other anomalous features
- High purchase price ratio
- Geographical behavior analysis
- A significantly elevated ratio_to_median_purchase_price should automatically trigger enhanced verification or manual review.
- Online transactions without chip usage should prompt strengthened security checks.
- Abnormally large transactions
- Non-standard payment methods (online_order = 1 and used_chip = 0)
- Geographical pattern anomalies
- Repeated retailer usage with other abnormal features
- 4. Feature Importance Evaluation
- Further validate key low-stability features through additional data and improved preprocessing.
- Do not fully disregard statistically weak features, as they may be crucial in specific scenarios.
- Incorporate domain expertise to enhance interpretability and robustness.
- 5. Crime Pattern Matching Analysis
- Counterfeit Card Fraud (Physical Forged Cards): Driven primarily by high purchase ratios, followed by geographic distance features.
- Intercepted New Card Fraud: Emphasizes purchase ratio and cardholder residence information.
- Phishing-Based Fraud (Card-Not-Present): Focuses on online transactions and repeated merchant usage.
- Credit card forgery
- Phishing attacks and malware-based data theft
- Cross-channel fraud across online and offline transactions
- Strengthen identity verification, especially for high-risk transactions.
- Improve user education regarding phishing and counterfeit card fraud.
- Conduct regular reviews of anomalous transaction patterns.
- 6. Trustworthiness Assessment (Based on NeurIPS Standards)
- Increase data volume and diversity
- Optimize feature engineering
- Enhance explainability using advanced attribution techniques
- Establish continuous model monitoring mechanisms
- 7. Risk Management Recommendations
- Immediately freeze high-risk transactions.
- Promptly contact affected customers for identity verification.
- Dynamically adjust feature thresholds.
- Introduce advanced machine learning models (e.g., Random Forest, XGBoost).
- Implement multi-layer defense systems.
- Develop real-time monitoring and alert platforms.
- Visualize feature importance.
- Integrate explainability tools such as SHAP.
- Conduct regular professional training.
- Perform simulated incident response drills.
- Reduced fraud incidence
- Improved customer satisfaction
- Optimized resource allocation
- 8. Conclusions and Outlook
- 9. Appendix
- Global feature importance analysis
- Instance_13 raw feature values
- Feature impact scores
- Triggered fraud rules
- RAG-based real crime knowledge base
- Integrated attribution-based feature importance
- Bonferroni-corrected significance testing
- Impact score formulation
- Rule-based crime pattern triggering
- Model: qwen2.5:7b
- Temperature: 0.3
- Analysis Depth: Feature → Instance → Pattern → System
References
- Samek, W.; Montavon, G.; Vedaldi, A.; Hansen, L.K.; Müller, K.R. (Eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Lecture Notes in Artificial Intelligence 11700; Springer: Cham, Switzerland, 2019.
- Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn. 2019, 12, 307–392.
- Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
- Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity checks for saliency maps. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018; Volume 31.
- Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 19–24 June 2016; pp. 1050–1059.
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, 6–11 July 2017; pp. 3319–3328.
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Nat. Mach. Intell. 2020, 2, 56–67.
- Wasserstein, R.L.; Schirm, A.L.; Lazar, N.A. Moving to a World Beyond “p < 0.05”. Am. Stat. 2019, 73, 1–19.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual, 6–12 December 2020; Volume 33, pp. 9459–9474.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 24824–24837.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual, 6–12 December 2020; Volume 33, pp. 1877–1901.
- Chen, L.; Zaharia, M.; Zou, J. Frugalgpt: How to use large language models while reducing cost and improving performance. arXiv 2023, arXiv:2305.05176.
- Chen, Z.; Bei, Y.; Rudin, C. Concept Whitening for Interpretable Image Recognition. Nat. Mach. Intell. 2020, 2, 772–782.
- Molnar, C. Interpretable Machine Learning. 2020. Lulu.com. Available online: https://www.academia.edu/103808014/Interpretable_Machine_Learning (accessed on 4 March 2026).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Mao, R.; Liu, Q.; He, K.; Li, W.; Cambria, E. The Biases of Pre-Trained Language Models: An Empirical Study on Prompt-Based Sentiment Analysis and Emotion Detection. IEEE Trans. Affect. Comput. 2023, 14, 1743–1753.
- Lakkaraju, H.; Kamar, E.; Caruana, R.; Leskovec, J. Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’19), Honolulu, HI, USA, 27–28 January 2019; pp. 131–138.
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971.
- Kumar, I.E.; Venkatasubramanian, S.; Scheidegger, C.; Friedler, S. Problems with Shapley-value-based explanations as feature importance measures. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Virtual, 13–18 July 2020; pp. 5491–5500.
- Wang, S.; Deng, Q.; Feng, S.; Zhang, H.; Liang, C. A Survey on Rank Aggregation. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), Jeju, South Korea, 3–9 August 2024; pp. 8281–8289.
- Conover, W.J. Practical Nonparametric Statistics, 4th ed.; John Wiley & Sons: New York, NY, USA, 2024.
- Kaggle. Dhanush Narayanan, R. Credit Card Fraud Dataset. 2021. Available online: https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud (accessed on 1 February 2026).
- Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L.A. (Eds.) Feature Extraction: Foundations and Applications, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 207.
- Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI 1995), Montreal, QC, Canada, 20–25 August 1995; Volume 14, pp. 1137–1145.
- Karpukhin, V.; Oguz, B.; Min, S.; Lewis, P.; Wu, L.; Edunov, S.; Chen, D.; Yih, W.-t. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Virtual, 16–20 November 2020; pp. 6769–6781.
- Mentch, L.; Hooker, G. Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests. J. Mach. Learn. Res. 2016, 17, 1–41.
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, 15–19 December 2008; pp. 413–422.
- Kaufman, S.; Rosset, S.; Perlich, C.; Stitelman, O. Leakage in Data Mining: Formulation, Detection, and Avoidance. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 15.
- Davis, J.; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240.
- Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30.






| Aspect | Existing Approaches | Limitations | Trust Triangle Advantage |
|---|---|---|---|
| Feature Attribution | Single-method approaches: LIME [1], Integrated Gradients [6], SHAP [7] | Each method has distinct biases; results can be inconsistent and method-dependent [4,19] | Multi-method consensus (Section 3.2.1) aggregates three theoretically distinct methods, quantifying agreement via micro-consistency and macro-consensus [20] |
| Attribution Reliability | Sanity checks reveal that many saliency maps are insensitive to model randomization [4]; Shapley-value-based explanations may not reflect true feature importance [19] | No quantitative standard for assessing attribution reliability | The evidential reliability pillar provides quantitative consistency metrics and adaptive fusion weights based on cross-method agreement |
| Statistical Grounding | Post hoc explanations rarely validate attributions against ground-truth outcomes | Attributions may highlight features that are statistically insignificant or lack real-world relevance | The external validity pillar applies the Mann–Whitney U test with FDR correction [8,21] and effect size analysis, ensuring only statistically grounded features receive high importance |
| Uncertainty Quantification | Standard autoencoders [2,23] provide point estimates without confidence intervals; DAGMM [3] improves density estimation but lacks inference-time uncertainty | Predictions lack calibrated uncertainty, undermining trust in high-stakes decisions | Bayesian Autoencoder with Monte Carlo Dropout [5] provides distributional estimates of reconstruction errors; bootstrap resampling [24] validates stability of thresholds and metrics |
| Explanation Generation | LLMs alone [11,18] risk hallucination [16] when generating explanations from unvalidated inputs | Fluency without faithfulness; explanations may be plausible but ungrounded | Controlled generation with RAG [9,25] and CoT prompting [10] constrains LLM reasoning to verified quantitative evidence and authoritative domain knowledge |
| End-to-End Trustworthiness | Existing frameworks lack integrated verification of both reliability and validity before explanation | Trust is assumed post hoc rather than built systematically | Trust Triangle establishes a closed loop: reliability-validity verification → multi-source evidence integration → controlled generation → auditable report |
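
The consensus and FDR rows of the comparison above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the agreement score used here (mean pairwise Spearman rank correlation) is an assumed stand-in for the paper's micro-consistency and macro-consensus metrics, the function names are hypothetical, and the attribution scores and p-values are made-up illustration data. The Benjamini-Hochberg step is the standard FDR adjustment.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks, largest value first; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant ranking as "no agreement"
    return cov / (var_a * var_b) ** 0.5


def fuse_attributions(attributions):
    """Return (fused importance per feature, mean pairwise rank agreement).

    Assumes at least two methods and a nonzero score sum for each method.
    """
    normalized = []
    for scores in attributions.values():
        total = sum(abs(s) for s in scores)
        normalized.append([abs(s) / total for s in scores])
    n_features = len(normalized[0])
    fused = [sum(m[i] for m in normalized) / len(normalized) for i in range(n_features)]
    pairs = list(combinations(normalized, 2))
    agreement = sum(spearman(a, b) for a, b in pairs) / len(pairs)
    return fused, agreement


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical scores for three features from three attribution methods.
attributions = {
    "ig": [0.5, 0.3, 0.2],
    "shap": [0.6, 0.1, 0.3],
    "perturbation": [0.4, 0.35, 0.25],
}
fused, agreement = fuse_attributions(attributions)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this toy input, IG and the perturbation method rank the features identically while SHAP swaps the last two, so the agreement score is about 0.67. A low score would be the signal to down-weight the fused ranking.
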

| Step | Core Work (Essence) | Alignment with Trust Triangle |
|---|---|---|
| ➊ | Feature Importance Fusion: Aggregates scores from three theoretically distinct attribution methods (IG, SHAP, Perturbation) to generate a consensus-verified composite importance ranking. | Achieves Evidential Reliability. By establishing multi-method consensus, it mitigates the bias and instability inherent in any single post hoc explanation method [4,19]. This transforms the ML model’s internal reasoning into robust, reproducible quantitative evidence, forming a credible foundation for all subsequent steps [20]. |
| ➋ | New-Instance Risk Scoring: Computes a normalized risk score by comparing the instance’s reconstruction error to a statistically derived threshold from normal behavior. | Initiates the Instantiation of External Validity. It converts the model’s raw, absolute anomaly score into a statistically grounded, interpretable relative risk measure [8,24], enabling the transition from global model assessment to individualized risk evaluation. |
| ➌ | New-Instance Feature Impact Analysis: Calculates the personalized contribution of each feature by fusing global importance, instance-specific attribution, and feature-value anomaly. | Deeply Integrates Evidential Reliability and External Validity. This step dynamically combines consensus-verified importance (reliability) with instance-level statistical anomalies (validity), creating a traceable, quantitative anchor for individualized explanations [1,7]. |
| ➍ | Fraud Rule Template: Defines structured crime patterns, their associated features, and semantic descriptions based on domain knowledge. | Establishes the Semantic Anchor for External Validity and Controlled Generation. Encoding expert knowledge into computable rules provides the necessary structure for aligning statistical evidence with actionable business logic, ensuring explanations possess inherent relevance [30]. |
| ➎ | Rule Triggering and Alerting: Dynamically matches predefined fraud rules against the new instance’s aggregated feature impact scores and outputs tiered alerts (based on the LLM Interpretation Guideline and the Impact trigger). | Realizes the Business Closure of External Validity. It systematically maps quantitative evidence to comprehensible business semantics (fraud rules), generating actionable, prioritized insights and ensuring the practical utility of the explanation [21]. |
| ➏ | RAG Knowledge Retrieval: Retrieves authoritative crime modus operandi details from an external knowledge base, strictly keyed by triggered Rule IDs. | Implements the Foundational Constraint for Controlled Generation. By tethering retrieval to quantifiably verified triggers, it restricts the LLM’s context to high-quality, relevant evidence, directly mitigating hallucinations [9,16]. |
| ➐ | Multi-Source Evidence Integration: Aggregates all quantitative evidence (from ➊➋➌) and qualitative knowledge (from ➍➎➏) into a unified, structured input schema. | Completes the Bridge for Controlled Generation. It constructs a structured interface that forces the subsequent LLM to reason upon an integrated, auditable evidence set, enabling end-to-end traceability [10,11]. |
| ➑ | Report Generation: Generates the final natural language report via an LLM driven by structured prompts and the integrated evidence. | Executes Controlled, Evidence-Based Semantic Articulation. Guided by chain-of-thought prompting [10] within the evidence-rich context, the LLM produces a coherent, audit-ready narrative that is a direct synthesis of the validated inputs, fulfilling the promise of a trustworthy explanatory system. |
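
Steps ➋, ➎, ➏, and ➐ of the table above can be sketched as a small data flow. This is a hedged illustration under stated assumptions, not the paper's implementation: the risk score is assumed to be the ratio of reconstruction error to a nearest-rank percentile threshold, a rule is assumed to fire when the summed impact of its features reaches its trigger, and every name, feature, rule ID, and number below (`FraudRule`, `R001`, `amount`, and so on) is hypothetical.

```python
import json
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FraudRule:
    """Hypothetical rule template (step 4): pattern, features, and trigger level."""
    rule_id: str
    features: tuple
    impact_trigger: float
    description: str


def percentile_threshold(normal_errors, q=0.95):
    """Nearest-rank percentile of reconstruction errors on normal transactions."""
    ordered = sorted(normal_errors)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def risk_score(error, threshold):
    """Step 2 (assumed form): error relative to the threshold; above 1.0 is anomalous."""
    return error / threshold


def triggered_rules(rules, feature_impacts):
    """Step 5 (assumed form): a rule fires when its features' summed impact meets its trigger."""
    fired = []
    for rule in rules:
        impact = sum(feature_impacts.get(f, 0.0) for f in rule.features)
        if impact >= rule.impact_trigger:
            fired.append(rule.rule_id)
    return fired


def retrieve_context(knowledge_base, rule_ids):
    """Step 6: retrieval keyed strictly by triggered rule IDs, nothing else."""
    return {rid: knowledge_base[rid] for rid in rule_ids if rid in knowledge_base}


def build_evidence(score, feature_impacts, rule_ids, context):
    """Step 7: one structured, serializable bundle handed to the LLM prompt."""
    return {
        "risk_score": score,
        "feature_impacts": feature_impacts,
        "triggered_rules": rule_ids,
        "retrieved_context": context,
    }


# Made-up illustration data.
normal_errors = [0.8, 1.0, 1.1, 0.9, 1.2, 1.0, 0.7, 1.3, 0.9, 1.1]
rules = [
    FraudRule("R001", ("amount", "hour"), 0.6, "Large late-night purchase"),
    FraudRule("R002", ("merchant_category",), 0.2, "Unusual merchant type"),
]
knowledge_base = {
    "R001": "Placeholder text describing the pattern behind rule R001.",
    "R002": "Placeholder text describing the pattern behind rule R002.",
}
impacts = {"amount": 0.45, "hour": 0.30, "merchant_category": 0.05}

threshold = percentile_threshold(normal_errors)
score = risk_score(2.6, threshold)
fired = triggered_rules(rules, impacts)
evidence = build_evidence(score, impacts, fired, retrieve_context(knowledge_base, fired))
prompt_payload = json.dumps(evidence, indent=2)
```

Because retrieval is keyed only by the IDs in `fired`, the context for `R002` never reaches the prompt payload here, which is the constraint the table attributes to step ➏.
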
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Shen, J.-C.; Su, N.-C.; Lin, Y.-B. Trust Triangle: A Reliability-Validity-Generation Framework for Explainable Credit Card Fraud Detection with RAG-Enhanced LLMs Reasoning. AI 2026, 7, 114. https://doi.org/10.3390/ai7030114
