Fuzzy Rule-Based Explanations for Tabular Black-Box Classifiers: A Comprehensive Empirical Framework with Prediction-Boundary-Aware Partitioning and Rule-Level Uncertainty Indication
Abstract
1. Introduction
Contributions
2. Related Work
2.1. Feature Attribution Methods
2.2. Rule-Based Explanation Methods
2.3. Fuzzy Systems for Explainability
2.4. Uncertainty Proxies in Explanations
2.5. Research Gap and Positioning
3. Preliminaries
3.1. Fuzzy Sets and Membership Functions
3.2. Fuzzy IF–THEN Rules
3.3. T-Norms and Their Properties
3.4. Information-Theoretic Measures
4. Proposed Framework
4.1. Fuzzy Partitioning
4.2. Prediction-Boundary-Aware Refinement
1. Obtain black-box model predictions: ŷ = f(X).
2. For each feature j, fit a one-dimensional decision tree T_j on (X[:, j], ŷ) with at most K leaf nodes (equivalently, at most K − 1 splits).
3. Extract the split thresholds {t_{j,1}, …, t_{j,K−1}} from T_j, sorted in ascending order.
4. Reposition the membership function centers to the midpoints between consecutive boundaries. The boundaries are denoted b_{j,k} ≡ t_{j,k} for typographic compactness, with sentinels b_{j,0} = min(X[:, j]) and b_{j,K} = max(X[:, j]), so that the k-th center is (b_{j,k−1} + b_{j,k}) / 2 for k = 1, …, K (Equation (8)).
5. Rebuild the triangular membership functions using the refined centers.
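
The last two steps of the refinement can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names `refined_centers` and `triangular_memberships` are hypothetical, the split thresholds are assumed to come from an already fitted one-dimensional tree, and the triangular sets are assumed to overlap so that neighbouring memberships sum to 1, with the outermost sets saturating at 1 beyond the first and last centers.

```python
def refined_centers(thresholds, x_min, x_max):
    """Midpoints between consecutive boundaries b_0 < b_1 < ... < b_K.

    `thresholds` are the K - 1 split points of a 1-D tree (assumed given);
    x_min and x_max act as the sentinels b_0 and b_K.
    """
    bounds = [x_min] + sorted(thresholds) + [x_max]
    return [(lo + hi) / 2.0 for lo, hi in zip(bounds, bounds[1:])]


def triangular_memberships(x, centers):
    """Membership of x in each triangular fuzzy set peaked at a center.

    Assumption (not stated in the source): adjacent triangles cross at 0.5,
    so the memberships of any x sum to 1.
    """
    k = len(centers)
    if k == 1:
        return [1.0]
    if x <= centers[0]:
        return [1.0] + [0.0] * (k - 1)
    if x >= centers[-1]:
        return [0.0] * (k - 1) + [1.0]
    mu = [0.0] * k
    for i in range(k - 1):
        left, right = centers[i], centers[i + 1]
        if left <= x <= right:
            w = (x - left) / (right - left)  # linear interpolation weight
            mu[i], mu[i + 1] = 1.0 - w, w
            break
    return mu


# Example: K = 3 fuzzy sets from two tree thresholds on a feature in [0, 10].
centers = refined_centers([2.0, 6.0], 0.0, 10.0)   # [1.0, 4.0, 8.0]
memberships = triangular_memberships(2.5, centers)  # [0.5, 0.5, 0.0]
```

Because the centers follow the tree's split points rather than percentiles of the feature distribution, the crossover between two neighbouring fuzzy sets sits close to where the black-box prediction changes.
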
4.3. Feature Selection
4.4. Rule Extraction via Modified Wang–Mendel Procedure
1. Cell enumeration: Enumerate all K^k cells in the fuzzy grid defined by the k selected features. Each cell corresponds to a unique combination of fuzzy sets across features.
2. Soft assignment: For each cell s = (s_1, …, s_k) and each training sample x_i, compute the membership degree using the product t-norm, i.e., the product over the k selected features of the sample’s membership in the fuzzy set that s assigns to that feature.
3. Support computation: The support of cell s is the sum of membership degrees across all training samples (Equation (11)).
4. Consequent determination: For classification, the consequent is the class with the highest weighted vote, i.e., the class whose samples contribute the largest total membership to the cell.
5. Confidence computation: The confidence is the ratio of the winning class’s total membership to the cell’s total membership.
6. Filtering: Discard rules with confidence below θc (0.5 for classification, 0.3 for regression) or support below θs = max(2/N, 0.0005). Equation (11) reports the unnormalized sum Σ_i μ(x_i) for notational compactness; this sum is divided by N to obtain the normalized fraction that is compared against θs, so support and threshold are on the same scale. The absolute floor of 0.0005 takes effect once 2/N falls below it (N > 4000) and prevents degenerate near-zero-support rules.
7. Ranking and selection: Sort the remaining rules by confidence × support and retain the top M rules.
8. Antecedent pruning: Each rule with more than d_min = 3 antecedents is reduced greedily, removing antecedents one at a time as long as the consequent is preserved and confidence drops by at most ε = 0.005. Rules that become identical after pruning are deduplicated. Both ε and d_min were chosen by an exploratory sweep on the five-dataset ablation set: the grid ε ∈ {0.001, 0.005, 0.01, 0.02} × d_min ∈ {2, 3, 4} yielded mean fidelity between 0.8920 and 0.8972 (max |Δ| = 0.42 pp vs. the default 0.8930). The d_min parameter dominates: d_min = 4 consistently yields +0.2 to +0.4 pp, whereas d_min = 2 loses ≈0.1 pp; ε is empirically inert across the tested range. The default d_min = 3 is retained for interpretability (one fewer antecedent per rule on average than d_min = 4); deployments prioritizing raw fidelity over rule compactness could justify d_min = 4 (full sweep in ‘eps_dmin_sweep.csv’). On the 13-dataset benchmark, this step reduces the mean antecedent count from 5.49 to 3.61 and the total rule count from 329.8 to 175.8 with negligible fidelity impact (<0.003). Supplementary Table S18 summarizes the four rule-count values cited in the main text by pipeline snapshot and aggregation scope.
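
The enumeration, support, confidence, filtering, and ranking steps can be illustrated with a short pure-Python sketch. It is a simplified reading of the procedure under stated assumptions, not the author's code: the function name `extract_rules`, the nested-list input layout, and the dictionary output are hypothetical, the class labels are assumed to be the black-box predictions, and antecedent pruning is left out.

```python
from itertools import product
from math import prod


def extract_rules(memberships, labels, theta_c=0.5, top_m=10):
    """Sketch of grid-based rule extraction for classification.

    memberships[i][j][s] is the membership of sample i, selected feature j,
    in fuzzy set s (hypothetical layout); labels[i] is the class assigned to
    sample i (assumed here to be the black-box prediction).
    """
    n = len(memberships)
    k = len(memberships[0])
    n_sets = len(memberships[0][0])
    theta_s = max(2.0 / n, 0.0005)  # support threshold on the normalized scale

    rules = []
    for cell in product(range(n_sets), repeat=k):  # all K^k cells
        votes = {}
        for mu_i, y in zip(memberships, labels):
            # Product t-norm over the selected features.
            m = prod(mu_i[j][s] for j, s in enumerate(cell))
            votes[y] = votes.get(y, 0.0) + m
        total = sum(votes.values())  # unnormalized support of the cell
        if total == 0.0:
            continue
        winner = max(votes, key=votes.get)  # class with the highest weighted vote
        confidence = votes[winner] / total
        support = total / n  # normalized before comparing against theta_s
        if confidence >= theta_c and support >= theta_s:
            rules.append({"cell": cell, "class": winner,
                          "confidence": confidence, "support": support})

    # Rank by confidence x support and keep the top M rules.
    rules.sort(key=lambda r: r["confidence"] * r["support"], reverse=True)
    return rules[:top_m]


# Toy example: 4 samples, 1 selected feature, K = 2 fuzzy sets (Low, High).
toy_memberships = [[[1.0, 0.0]], [[0.75, 0.25]], [[0.25, 0.75]], [[0.0, 1.0]]]
toy_labels = [0, 0, 1, 1]
toy_rules = extract_rules(toy_memberships, toy_labels)
```

In the toy example both cells survive filtering with confidence 0.875 and normalized support 0.5. On the threshold itself: for a dataset the size of Wine (N = 178) the term 2/N ≈ 0.011 sets θs, whereas for Adult Income (N = 48,842) 2/N ≈ 0.00004 and the floor of 0.0005 applies.
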
4.5. Prediction via Weighted Voting
4.6. Rule Entropy as Uncertainty Indicator
4.7. Computational and Asymptotic Properties
5. Experiments
5.1. Experimental Setup
- Fidelity: Agreement between the explainer’s predictions and the black-box model’s predictions on the held-out test fold of the 3-fold CV protocol (1/3 of the dataset per fold evaluation, averaged across the three folds; not an internal training-set agreement). For classification this is argmax-class accuracy; for regression it is the coefficient of determination R² between surrogate output and model output (consistent with Supplementary Table S14). Reported for rule-based methods only.
- Coverage: Fraction of test samples for which the explainer produces a prediction.
- Stability: Prediction agreement under small Gaussian input perturbations in standardized feature space. Each feature is first z-score normalized, so σ = 0.01 corresponds to 1% of the feature’s standard deviation, following the small-perturbation convention of Alvarez-Melis & Jaakkola [19] for explanation stability. Perturbations are applied independently per feature, and results are averaged over 100 repetitions. This is a single-magnitude protocol: multi-magnitude sensitivity (σ ∈ {0.01, 0.05, 0.10}) is left to future work, and the single-scale limitation is flagged in Section 6.3.
- Comprehensibility: Number of rules and average number of antecedents per rule.
- Computational cost: Wall-clock time for extraction/training.
- Cross-model consistency: Whether the explainer’s behavior remains consistent across different black-box models trained on the same data.
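
The first three metrics can be made concrete with a small sketch for the classification case. The helper names are hypothetical, `None` is used here to mark a sample for which the explainer gives no prediction, and fidelity is assumed to be computed on covered samples only (an assumption, though one consistent with a separate Fidelity × Coverage column in the results table). The stability helper assumes the inputs are already z-score standardized.

```python
import random


def coverage(surrogate_preds):
    """Fraction of samples for which the explainer produced a prediction."""
    return sum(p is not None for p in surrogate_preds) / len(surrogate_preds)


def fidelity(surrogate_preds, blackbox_preds):
    """Agreement with the black box on covered samples (classification)."""
    pairs = [(s, b) for s, b in zip(surrogate_preds, blackbox_preds)
             if s is not None]
    if not pairs:
        return float("nan")
    return sum(s == b for s, b in pairs) / len(pairs)


def stability(predict, rows, sigma=0.01, repetitions=100, seed=0):
    """Share of predictions unchanged under independent Gaussian noise.

    `predict` maps one standardized feature vector to a class label;
    noise with standard deviation `sigma` is added to every feature.
    """
    rng = random.Random(seed)
    baseline = [predict(row) for row in rows]
    unchanged = 0
    for _ in range(repetitions):
        for row, base in zip(rows, baseline):
            noisy = [v + rng.gauss(0.0, sigma) for v in row]
            unchanged += predict(noisy) == base
    return unchanged / (repetitions * len(rows))


# Toy example: one uncovered sample, one disagreement among covered samples.
surrogate = [1, 0, None, 1]
blackbox = [1, 1, 0, 1]
cov = coverage(surrogate)            # 0.75
fid = fidelity(surrogate, blackbox)  # 2 of 3 covered samples agree
stab = stability(lambda row: int(row[0] > 0.0), [[5.0], [-5.0]])
```

Separating coverage from fidelity matters when comparing a local method such as Anchors, which can score high fidelity on a small covered subset, with global surrogates that predict for nearly every sample.
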
5.2. Overall Results
5.3. Fidelity Analysis
5.4. Coverage, Stability, and Statistical Significance
5.5. Comprehensibility and Cognitive Load
5.6. Cross-Model Consistency
5.7. Computational Cost
5.8. Ablation Study
5.9. Rule Readability and Cognitive Load
5.10. Boundary Sensitivity and Rule Entropy as an Uncertainty Proxy
5.11. Local Interpretability and Rule Compactness
5.12. Comparison Against Modern Fuzzy Explanation Methods
5.13. External Validation: Multi-Class and Regression
6. Discussion
6.1. Practical Guidelines: When to Use Which Explainer?
6.2. FuzzyRules vs. TreeSurrogate: A Direct Comparison
6.3. Limitations and Threats to Validity
When the Framework Fails and Why
7. Conclusions and Future Work
7.1. Summary of Contributions
7.2. Future Work
- Rule compression via multi-objective optimization: Investigate whether NSGA-II–style multi-objective genetic selection (jointly minimizing rule count and maximizing fidelity) can compress the global rule base by ≥50% without measurable fidelity loss on the 12-dataset classification benchmark.
- Empirical user studies: Larger-scale controlled experiments with domain experts (physicians, financial analysts) across diverse expertise levels and culturally varied populations would strengthen the generalizability of these findings.
- TSK-type fuzzy systems for regression: Adopting Takagi–Sugeno–Kang (TSK) [60] rules with linear consequent functions could significantly improve regression fidelity while maintaining linguistic antecedents.
- Multi-class extension: Our main benchmark includes one multi-class dataset (Wine, 3 classes; FuzzyRules fidelity 0.942), and external validation adds Iris (3 classes; FuzzyRules fidelity 0.973, Section 5.13). Analyzing class-specific rule entropy patterns in multi-class settings could reveal which classes the model finds most confusable, providing additional diagnostic insights.
- Online and incremental explanation: Developing mechanisms for incrementally updating the fuzzy rule base as new data arrives, enabling real-time explanation in streaming scenarios.
- Integration with fairness analysis: Examining how fuzzy rules can expose and explain potential biases in black-box models, particularly in sensitive domains such as criminal justice (COMPAS) and credit scoring.
7.3. Concluding Remarks
Supplementary Materials
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
- Dressel, J.; Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 2018, 4, eaao5580.
- Bussmann, N.; Giudici, P.; Marinelli, D.; Papenbrock, J. Explainable machine learning in credit risk management. Comput. Econ. 2021, 57, 203–216.
- Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
- European Parliament and Council. Regulation (EU) 2016/679 (General Data Protection Regulation). Off. J. Eur. Union 2016, 59, 294.
- Goodman, B.; Flaxman, S. European Union regulations on algorithmic decision-making and a “right to explanation”. AI Mag. 2017, 38, 50–57.
- Wachter, S.; Mittelstadt, B.; Floridi, L. Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation. Int. Data Priv. Law 2017, 7, 76–99.
- Lundberg, S.K.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the NeurIPS, Long Beach, CA, USA, 4–9 December 2017; pp. 4766–4777.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-precision model-agnostic explanations. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Craven, M.W.; Shavlik, J.W. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1995; pp. 24–30.
- Bastani, O.; Kim, C.; Bastani, H. Interpreting blackbox models via model extraction. arXiv 2017, arXiv:1705.08504.
- Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
- Mamdani, E.H.; Assilian, S. An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man-Mach. Stud. 1975, 7, 1–13.
- Wang, L.-X.; Mendel, J.M. Generating fuzzy rules by learning from examples. IEEE Trans. Syst. Man Cybern. 1992, 22, 1414–1427.
- Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth: Belmont, CA, USA, 1984.
- Fayyad, U.M.; Irani, K.B. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the IJCAI, Chambéry, France, 28 August–3 September 1993; pp. 1022–1027.
- Lundberg, S.K.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
- Alvarez-Melis, D.; Jaakkola, T.S. On the robustness of interpretability methods. In Proceedings of the ICML Workshop on Human Interpretability in Machine Learning, Stockholm, Sweden, 10–15 July 2018.
- Zafar, M.R.; Khan, N.M. DLIME: A deterministic local interpretable model-agnostic explanations approach for computer-aided diagnosis systems. arXiv 2019, arXiv:1906.10263.
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the ICML, Sydney, Australia, 6–11 August 2017; pp. 3319–3328.
- Shrikumar, A.; Greenside, P.; Kundaje, A. Learning important features through propagating activation differences. In Proceedings of the ICML, Sydney, Australia, 6–11 August 2017; pp. 3145–3153.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates: Red Hook, NY, USA, 2017; pp. 5998–6008.
- Jain, S.; Wallace, B.C. Attention is not explanation. In Proceedings of the NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019; pp. 3543–3556.
- Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 3rd ed.; Independently published: Munich, Germany, 2025; Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 29 April 2026).
- Guidotti, R.; Monreale, A.; Ruggieri, S.; Pedreschi, D.; Turini, F.; Giannotti, F. Local rule-based explanations of black box decision systems. arXiv 2018, arXiv:1805.10820.
- Friedman, J.H.; Popescu, B.E. Predictive learning via rule ensembles. Ann. Appl. Stat. 2008, 2, 916–954.
- Letham, B.; Rudin, C.; McCormick, T.H.; Madigan, D. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann. Appl. Stat. 2015, 9, 1350–1371.
- Ishibuchi, H.; Nakashima, T.; Murata, T. Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems. IEEE Trans. Syst. Man Cybern. Part B 1999, 29, 601–618.
- Casillas, J.; Cordón, O.; Herrera, F.; Magdalena, L. Interpretability Issues in Fuzzy Modeling; Studies in Fuzziness and Soft Computing; Springer: Berlin, Germany, 2003; Volume 128.
- Guillaume, S. Designing fuzzy inference systems from data: An interpretability-oriented review. IEEE Trans. Fuzzy Syst. 2001, 9, 426–443.
- Ishibuchi, H.; Yamamoto, T. Rule weight specification in fuzzy rule-based classification systems. IEEE Trans. Fuzzy Syst. 2005, 13, 428–435.
- Herrera, F. Genetic fuzzy systems: Taxonomy, current research trends and prospects. Evol. Intell. 2008, 1, 27–46.
- Alcalá-Fdez, J.; Alcalá, R.; Herrera, F. A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning. IEEE Trans. Fuzzy Syst. 2011, 19, 857–872.
- Setiono, R.; Leow, W.K. FERNN: An algorithm for fast extraction of rules from neural networks. Appl. Intell. 2000, 12, 15–25.
- Huysmans, J.; Dejaeger, K.; Mues, C.; Vanthienen, J.; Baesens, B. An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decis. Support Syst. 2011, 51, 141–154.
- Pancho, D.P.; Alonso, J.M.; Cordón, O.; Quirin, A.; Magdalena, L. Fingrams: Visual representations of fuzzy rule-based inference for expert analysis of comprehensibility. IEEE Trans. Fuzzy Syst. 2013, 21, 1133–1149.
- Loyola-González, O.; Medina-Pérez, M.A.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; Monroy, R.; García-Borroto, M. PBC4cip: A new contrast pattern-based classifier for class imbalance problems. Knowl.-Based Syst. 2017, 115, 100–109.
- Mendel, J.M. Explainable Uncertain Rule-Based Fuzzy Systems, 3rd ed.; Springer: Cham, Switzerland, 2024.
- Pickering, L.; Cohen, K.; De Baets, B. A Narrative Review on the Interpretability of Fuzzy Rule-Based Models from a Modern Interpretable Machine Learning Perspective. Int. J. Fuzzy Syst. 2025, in press.
- Pedrycz, W. (Ed.) Machine Learning and Granular Computing: A Synergistic Design Environment; Springer: Cham, Switzerland, 2024.
- Alateeq, M.; Pedrycz, W. Logic-oriented fuzzy neural networks: A survey. Expert Syst. Appl. 2024, 257, 125120.
- Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the ICML, New York, NY, USA, 19–24 June 2016; pp. 1050–1059.
- Gal, Y.; Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems; Curran Associates: Red Hook, NY, USA, 2016; pp. 1019–1027.
- Vovk, V.; Gammerman, A.; Shafer, G. Algorithmic Learning in a Random World; Springer: New York, NY, USA, 2005.
- Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural networks. In Proceedings of the ICML, Lille, France, 6–11 July 2015; pp. 1613–1622.
- Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–8 February 2020; pp. 180–186.
- Zhang, Y.; Song, K.; Sun, Y.; Tan, S.; Udell, M. “Why should you trust my explanation?” Understanding uncertainty in LIME explanations. arXiv 2019, arXiv:1904.12991.
- Vilone, G.; Longo, L. Notions of explainability and evaluation approaches for explainable artificial intelligence. Inf. Fusion 2021, 76, 89–106.
- Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Giannotti, F.; Pedreschi, D. A survey of methods for explaining black box models. ACM Comput. Surv. 2018, 51, 1–42.
- Zhu, X.; Wang, D.; Pedrycz, W.; Li, Z. Fuzzy Rule-Based Local Surrogate Models for Black-Box Model Explanation. IEEE Trans. Fuzzy Syst. 2023, 31, 2056–2064.
- Ouifak, H.; Idri, A. A comprehensive review of fuzzy logic based interpretability and explainability of machine learning techniques across domains. Neurocomputing 2025, 647, 130602.
- Klement, E.P.; Mesiar, R.; Pap, E. Triangular Norms; Trends in Logic; Springer: Dordrecht, The Netherlands, 2000; Volume 8.
- Miller, G.A. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychol. Rev. 1956, 63, 81–97.
- Angwin, J.; Larson, J.; Mattu, S.; Kirchner, L. Machine bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks. ProPublica, 23 May 2016. Available online: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (accessed on 29 April 2026).
- Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
- Schuirmann, D.J. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinet. Biopharm. 1987, 15, 657–680.
- Cowan, N. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behav. Brain Sci. 2001, 24, 87–114.
- Lipton, Z.C. The mythos of model interpretability. Queue 2018, 16, 31–57.
- Takagi, T.; Sugeno, M. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybern. 1985, SMC-15, 116–132.
- Harrison, D.; Rubinfeld, D.L. Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 1978, 5, 81–102.
- Dua, D.; Graff, C. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. 2017. Available online: https://archive.ics.uci.edu/ml (accessed on 29 April 2026).
- Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, W.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Annual Symposium on Computer Application in Medical Care; IEEE Computer Society Press: Washington, DC, USA, 1988; pp. 261–265.
- Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
- van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 2000.
- Devroye, L.; Györfi, L.; Lugosi, G. A Probabilistic Theory of Pattern Recognition; Springer: New York, NY, USA, 1996.






| Method | Global | Symbolic | Linguistic | Uncertainty | Coverage | Model-Agnostic |
|---|---|---|---|---|---|---|
| SHAP | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ |
| LIME | ✗ | ~ | ✗ | ✗ | ✓ | ✓ |
| Anchors | ✗ | ✓ | ✗ | ✗ | ~ | ✓ |
| TreeSurr | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
| FuzzyRules (Ours) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

| Study | vs. SHAP | vs. LIME | vs. Anchors | vs. Tree | #Datasets | Fidelity | Stat. Test |
|---|---|---|---|---|---|---|---|
| Ishibuchi et al. [28] | ✗ | ✗ | ✗ | ✗ | 4 | ✗ | ✗ |
| Setiono & Leow [32] | ✗ | ✗ | ✗ | ✗ | 3 | ✓ | ✗ |
| Pancho et al. [34] | ✗ | ✗ | ✗ | ✗ | 2 | ✗ | ✗ |
| Zhu et al. [51] | ✗ | ✓ | ✗ | ✗ | 3 | ✓ | ✗ |
| Vilone & Longo [52] * | ✗ | ✗ | ✗ | ✗ | — | ✗ | ✗ |
| Ouifak & Idri [53] * | ✗ | ✗ | ✗ | ✗ | — | ✗ | ✗ |
| This paper | ✓ | ✓ | ✓ | ✓ | 13 | ✓ | ✓ |

| Dataset | Domain | N | D | Task | Source |
|---|---|---|---|---|---|
| Adult Income | Finance | 48,842 | 14 | Classification | UCI [46] |
| Bank Marketing | Finance | 45,211 | 16 | Classification | UCI [46] |
| Breast Cancer | Healthcare | 569 | 30 | Classification | UCI [46] |
| COMPAS | Justice | 6172 | 11 | Classification | ProPublica [47] |
| Default Credit | Finance | 30,000 | 23 | Classification | UCI [46] |
| German Credit | Finance | 1000 | 20 | Classification | UCI [46] |
| Heart Disease | Healthcare | 303 | 13 | Classification | UCI [46] |
| Ionosphere | Physics | 351 | 34 | Classification | UCI [46] |
| Magic Gamma | Physics | 19,020 | 10 | Classification | UCI [46] |
| Pima Diabetes | Healthcare | 768 | 8 | Classification | [50] |
| Spambase | Cybersecurity | 4601 | 57 | Classification | UCI [46] |
| Wine | Food Science | 178 | 13 | Classification | UCI [46] |
| Diabetes | Healthcare | 442 | 10 | Regression | [49] |

| Method | Fidelity [95% CI] | Coverage [95% CI] | Stability [95% CI] | Rules | Avg Antecedents | Fidelity × Coverage |
|---|---|---|---|---|---|---|
| FuzzyRules (Ours) | 0.878 [0.844, 0.910] | 0.9996 [0.9990, 1.0000] | 0.986 [0.971, 0.996] | 175.8 | 3.61 | 0.878 |
| TreeSurrogate | 0.890 [0.858, 0.919] | 1.0000 [1.0000, 1.0000] | 0.994 [0.986, 0.999] | 17.3 | 3.96 | 0.890 |
| Anchors | 0.963 † [0.946, 0.980] | 0.228 [0.120, 0.374] | 0.974 [0.959, 0.987] | 5.0 | 2.87 | 0.220 |
| SHAP | — | 1.000 | — ‡ | — | — | — |
| LIME | — | 1.000 | — ‡ | — | — | — |

| Method | Type | Mean (s) | Std (s) | Median (s) |
|---|---|---|---|---|
| FuzzyRules (Ours) | Global | 4.728 | 8.448 | 0.285 |
| TreeSurrogate | Global | 0.366 | 0.483 | 0.198 |
| SHAP | Local | 0.079 | 0.252 | 0.050 |
| LIME | Local | 0.035 | 0.030 | 0.020 |
| Anchors | Local | 8.539 | 15.655 | 3.345 |

| Dataset | Vanilla WM | K = 2 | Full (Ours) | Min T-Norm | No Feat. Sel. |
|---|---|---|---|---|---|
| Adult Income | 0.855 | 0.870 | 0.936 | 0.910 | 0.904 |
| Breast Cancer | 0.415 | 0.954 | 0.958 | 0.847 | — |
| German Credit | 0.836 | 0.871 | 0.851 | 0.842 | — |
| Heart Disease | 0.859 | 0.886 | 0.892 | 0.879 | — |
| Magic Gamma | 0.717 | 0.769 | 0.829 | 0.809 | 0.833 |
| Mean | 0.736 | 0.870 | 0.893 | 0.857 | 0.869 |

| Dataset | Percentile | Equal-Width | K-Means | Boundary-Aware (Ours) |
|---|---|---|---|---|
| Adult Income | 0.883 | 0.882 | 0.896 | 0.936 |
| Breast Cancer | 0.956 | 0.938 | 0.954 | 0.958 |
| German Credit | 0.853 | 0.847 | 0.843 | 0.851 |
| Heart Disease | 0.879 | 0.889 | 0.882 | 0.892 |
| Magic Gamma | 0.768 | 0.787 | 0.856 | 0.829 |
| Mean | 0.868 | 0.869 | 0.886 | 0.893 |

| Hyperparameter | Values Tested | Mean Fidelity | Conclusion |
|---|---|---|---|
| Number of fuzzy sets K | 2/3/4/5 | 0.870/0.893/0.898/0.910 | Monotone gain; K = 3 is the interpretability sweet spot (Miller chunks) |
| Min confidence threshold | 0.3/0.4/0.5/0.6/0.7 | 0.893/0.893/0.893/0.893/0.896 | Insensitive across the [0.3, 0.7] range |
| Top-k selected features | 3/5/7 | 0.872/0.894/0.895 | Diminishing returns past k = 5; k = 7 is a safe default |
| Vote weighting | conf¹/conf²/conf¹·support | 0.893/0.893/0.885 | Linear and squared weighting equivalent; support-weighted slightly worse |

| FuzzyRules | TreeSurrogate |
|---|---|
| IF cp IS High AND oldpeak IS Medium AND thal IS High THEN Class 1 (conf = 0.943) | IF thal ≤ −0.16 AND ca ≤ −0.20 AND trestbps ≤ 1.38 AND age ≤ 0.64 THEN Class 0 (covers 61 samples) |
| IF cp IS Medium AND oldpeak IS Low AND thal IS Low THEN Class 0 (conf = 0.942) | IF thal > −0.16 AND cp > 0.33 AND oldpeak > −0.52 AND thal > 0.87 THEN Class 1 (covers 49 samples) |

| Dataset | Precision | Recall | Error Rate | Enrichment | F1 |
|---|---|---|---|---|---|
| Breast Cancer | 0.153 | 0.956 | 0.035 | 4.36× | 0.264 |
| Heart Disease | 0.424 | 0.429 | 0.185 | 2.29× | 0.427 |
| Wine | 0.125 | 0.889 | 0.017 | 7.41× | 0.219 |
| Spambase | 0.171 | 0.629 | 0.058 | 2.93× | 0.269 |
| Adult Income | 0.424 | 0.540 | 0.158 | 2.69× | 0.475 |
| Ionosphere | 0.111 | 0.718 | 0.077 | 1.44× | 0.192 |
| Pima Diabetes | 0.513 | 0.887 | 0.238 | 2.15× | 0.650 |
| German Credit | 0.579 | 0.834 | 0.232 | 2.50× | 0.684 |
| Magic Gamma | 0.216 | 0.840 | 0.132 | 1.64× | 0.343 |
| Bank Marketing | 0.397 | 0.418 | 0.106 | 3.76× | 0.407 |
| COMPAS | 0.620 | 0.689 | 0.324 | 1.91× | 0.653 |
| Default Credit | 0.348 | 0.895 | 0.179 | 1.94× | 0.501 |
| Average | 0.340 | 0.727 | | 2.92× | 0.424 (macro-F1) |

| Dimension | FuzzyRules | TreeSurrogate | Winner |
|---|---|---|---|
| Fidelity (clf, untuned) | 0.889 | 0.900 | ≈ (p_Holm = 0.733; TOST p = 0.002; |d| = 0.29) |
| Fidelity (all, untuned) | 0.878 | 0.890 | ≈ (TOST p < 0.001; |d| = 0.33) |
| Fidelity (tuned, ablation) | 0.902 | 0.893 | FuzzyRules (+0.9 pp; Section 5.13) |
| Coverage | 0.9996 | 1.0000 | ≈ |
| Stability | 0.986 | 0.994 | ≈ (p_W = 0.131; TOST p < 0.001, δ = 0.05) |
| Cognitive load (chunks) | 3.6 | 8.6 | FuzzyRules (2.4×) |
| Within Miller’s limit | 12/12 | 7/12 | FuzzyRules |
| Linguistic labels | Yes | No | FuzzyRules |
| Standardization-free | Yes | No | FuzzyRules |
| Uncertainty signal | Rule entropy | None | FuzzyRules |
| Simulatability (K = 1) | 99% retention | N/A | FuzzyRules |
| Training time | 4.728 s | 0.366 s | TreeSurrogate (13×) |
| Global rule count | 175.8 | 17.3 | TreeSurrogate |

© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tekin, A.T. Fuzzy Rule-Based Explanations for Tabular Black-Box Classifiers: A Comprehensive Empirical Framework with Prediction-Boundary-Aware Partitioning and Rule-Level Uncertainty Indication. Appl. Sci. 2026, 16, 5896. https://doi.org/10.3390/app16125896

