Towards Faithful Local Explanations: Leveraging SVM to Interpret Black-Box Machine Learning Models
Abstract
1. Introduction
1.1. Explainable Machine Learning
1.2. Existing Efforts on Local Explanations
1.3. Challenges and Motivation
- This paper presents SVM-X, a local explanation method that fits the decision boundary of a black-box ML model in the neighborhood of a given record and thereby yields a more faithful explanation of the prediction for that record.
- This paper proposes a measure of the distance between two data records based on the model's prediction output; by weighting local samples with this measure, SVM-X effectively improves the fidelity of the local explanation model (see the sketch after this list).
- This paper evaluates the performance of SVM-X on five different types of ML models. Extensive experiments against four baselines demonstrate that the local interpretation model constructed by SVM-X achieves higher fidelity and more stable explanations.
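The prediction-based distance is defined formally in Section 3.1; its exact formula is not reproduced in this outline. The sketch below is only one plausible instantiation, assuming the distance is taken between the target model's predicted probability vectors and converted to a sample weight with an exponential kernel; the function names (`prediction_distance`, `prediction_weight`) and the kernel width `sigma` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def prediction_distance(model, x, z):
    """Illustrative distance between records x and z, measured in the
    target model's prediction space as the Euclidean distance between
    predicted class-probability vectors (assumed form, not the paper's)."""
    p_x = model.predict_proba(x.reshape(1, -1))[0]
    p_z = model.predict_proba(z.reshape(1, -1))[0]
    return np.linalg.norm(p_x - p_z)

def prediction_weight(model, x, z, sigma=0.25):
    """Convert the prediction-space distance into a sample weight with an
    exponential kernel: perturbed samples whose predictions stay close to
    the prediction on x receive larger weights in the local surrogate fit."""
    d = prediction_distance(model, x, z)
    return np.exp(-(d ** 2) / (sigma ** 2))
```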
2. Preliminary
2.1. Support Vector Machine
2.2. Linear Model-Based Explanation
3. Design of SVM-X
- (1) Local sample generation: Given a target record x selected from the training data, this step generates a local interpretation training set by perturbing the feature values of x to create a set of neighborhood samples. Each perturbed sample is passed through the target model f to obtain its prediction. The resulting set of perturbed samples and their associated outputs from f form the local training data.
- (2) Local model construction: Using the locally generated samples and their corresponding predictions, SVM-X trains a linear Support Vector Machine (SVM) to approximate the target model's behavior around x. The linear SVM serves as a surrogate model that mimics the decision boundary of f in the neighborhood of x. Due to its linearity, the SVM model is highly interpretable; its weight vector directly reflects the influence of each feature.
- (3) Explainable weight extraction: Once the linear SVM is trained, SVM-X extracts the weight vector w from the model. Since the decision function of a linear SVM is g(x) = w · x + b (with the predicted class given by its sign), the magnitude of each weight w_i indicates the importance of feature i. These values are normalized to form the explanation vector, which serves as a feature attribution for the prediction made by f on x. The three steps are sketched in code below.
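As a concrete illustration of steps (1)–(3), the following sketch generates neighborhood samples by Gaussian perturbation, weights them with a prediction-space kernel, fits a weighted linear SVM surrogate with scikit-learn, and normalizes the absolute weights into a feature attribution. The perturbation scale, the weighting kernel, and the function name `svm_x_explain` are assumptions made for illustration; the paper's precise sampling and weighting schemes are those of Sections 3.1 and 3.2.

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_x_explain(model, x, num_samples=1000, sigma=0.25, top_k=5,
                  perturb_scale=0.1, random_state=0):
    """Illustrative SVM-X-style local explanation of model's prediction on x.

    model : fitted black-box binary classifier exposing predict / predict_proba
    x     : 1-D numpy array, the target record to explain
    """
    rng = np.random.default_rng(random_state)

    # (1) Local sample generation: perturb the feature values of x.
    Z = x + perturb_scale * rng.standard_normal((num_samples, x.size))
    y = model.predict(Z)  # black-box predictions for the perturbed samples

    # Distance-based sample weights computed in the model's prediction space
    # (assumed exponential kernel, as in the earlier sketch).
    p_x = model.predict_proba(x.reshape(1, -1))[0]
    d = np.linalg.norm(model.predict_proba(Z) - p_x, axis=1)
    weights = np.exp(-(d ** 2) / (sigma ** 2))

    # (2) Local model construction: weighted linear SVM surrogate.
    # Assumes the perturbed neighborhood contains at least two classes.
    surrogate = LinearSVC(C=1.0, max_iter=10000)
    surrogate.fit(Z, y, sample_weight=weights)

    # (3) Explainable weight extraction: normalize |w| and keep the top K.
    w = surrogate.coef_.ravel()
    attribution = np.abs(w) / np.abs(w).sum()
    top = np.argsort(-attribution)[:top_k]
    return [(int(i), float(w[i]), float(attribution[i])) for i in top]
```

LinearSVC is used here only as a stand-in for the paper's linear SVM; any linear SVM implementation that exposes its weight vector would play the same role.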
3.1. Local Sample Generation
3.2. Local Model Construction
3.3. Explainable Weight Extraction
Algorithm 1 Explanations using SVM-X
Input: Target model f, target record x, number of samples N, weight function π, length of explanation K
Output: Explainable features w (top K features with their weights)
1: Generate N local samples in the neighborhood of x
2: for each generated sample do
3:   Calculate the sample weight using distance-based weighting
4:   Get the prediction from the target model f for the sample
5:   Add the sample, its prediction, and its weight to the local dataset
6: end for
7: Fit the local linear SVM model on the weighted dataset
8: Extract the K explainable features, sorting features by the absolute values of their weights
9: return w, the top K features with their weights
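Assuming the `svm_x_explain` sketch from Section 3 is in scope, a hypothetical end-to-end call against a black-box model might look as follows; the synthetic dataset and all hyperparameters are placeholders rather than the paper's experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a black-box target model on synthetic binary data (placeholder setup).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Explain a single record with the svm_x_explain sketch defined earlier.
x = X[0]
for feature_idx, weight, share in svm_x_explain(black_box, x, top_k=5):
    print(f"feature {feature_idx}: weight={weight:+.4f}, share={share:.2%}")
```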
4. Results
4.1. Experiment Setup
4.1.1. Dataset
4.1.2. Target Model
- Logistic Regression (LR) [19]: A linear model used for binary classification that estimates probabilities using the logistic function.
- Random Forest (RF) [20]: An ensemble learning method that builds multiple decision trees and aggregates their predictions to improve accuracy and control overfitting.
- XGBoost (XGB) [21]: A gradient boosting framework that uses decision trees as base learners and optimizes performance with regularization techniques to prevent overfitting.
- Decision Tree (DT) [22]: A tree-based model that recursively splits the data into subsets using feature thresholds, producing a structure that can be interpreted visually.
- Deep Neural Network (DNN) [23]: A type of artificial neural network with multiple layers, capable of learning complex patterns in large datasets through backpropagation.
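To make the setup concrete, the snippet below shows one way these five target models could be instantiated in Python; scikit-learn's MLPClassifier stands in for the DNN, and every hyperparameter is an illustrative default rather than the paper's configuration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Illustrative instantiations of the five target model families;
# hyperparameters are placeholders, not the paper's settings.
target_models = {
    "LR":  LogisticRegression(max_iter=1000),
    "RF":  RandomForestClassifier(n_estimators=200, random_state=0),
    "XGB": XGBClassifier(n_estimators=200, max_depth=6, eval_metric="logloss"),
    "DT":  DecisionTreeClassifier(max_depth=8, random_state=0),
    "DNN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
}
```

Each of these exposes the scikit-learn predict/predict_proba interface, which is all a model-agnostic explainer such as SVM-X needs in order to query the target model.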
4.1.3. Evaluation Metrics
4.1.4. Comparison Methods
4.2. Performance of SVM-X
4.3. Impact of the Number of Training Data
4.4. Weight Stability of Local Interpretation Models
4.5. Fidelity of Local Interpretation Models
5. Related Work
5.1. Model-Specific Explanation
5.2. Model-Agnostic Explanation
5.3. Explanation for SVM
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhai, X.; Kolesnikov, A.; Houlsby, N.; Beyer, L. Scaling vision transformers. In Proceedings of the IEEE/CVF CVPR, New Orleans, LA, USA, 18–22 June 2022; pp. 12104–12113. [Google Scholar]
- Min, B.; Ross, H.; Sulem, E.; Veyseh, A.P.B.; Nguyen, T.H.; Sainz, O.; Agirre, E.; Heintz, I.; Roth, D. Recent advances in natural language processing via large pre-trained language models: A survey. ACM Comput. Surv. 2023, 56, 1–40. [Google Scholar] [CrossRef]
- Rawal, A.; McCoy, J.; Rawat, D.B.; Sadler, B.M.; Amant, R.S. Recent advances in trustworthy explainable artificial intelligence: Status, challenges, and perspectives. IEEE Trans. Artif. Intell. 2022, 3, 852–866. [Google Scholar] [CrossRef]
- Zou, L.; Goh, H.L.; Liew, C.J.Y.; Quah, J.L.; Gu, G.T.; Chew, J.J.; Kumar, M.P.; Ang, C.G.L.; Ta, A.W.A. Ensemble image explainable AI (XAI) algorithm for severe community-acquired pneumonia and COVID-19 respiratory infections. IEEE Trans. Artif. Intell. 2023, 4, 242–254. [Google Scholar] [CrossRef]
- Saleem, R.; Yuan, B.; Kurugollu, F.; Anjum, A.; Liu, L. Explaining deep neural networks: A survey on the global interpretation methods. Neurocomputing 2022, 513, 165–180. [Google Scholar] [CrossRef]
- Yeh, C.K.; Kim, B.; Arik, S.; Li, C.L.; Pfister, T.; Ravikumar, P. On Completeness-aware Concept-Based Explanations in Deep Neural Networks. In Proceedings of the NeurIPS, Virtual, 6–12 December 2020; Volume 33, pp. 20554–20565. [Google Scholar]
- Ghorbani, A.; Wexler, J.; Zou, J.Y.; Kim, B. Towards Automatic Concept-based Explanations. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Goyal, Y.; Feder, A.; Shalit, U.; Kim, B. Explaining classifiers with causal concept effect (CaCE). arXiv 2019, arXiv:1907.07165. [Google Scholar]
- Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Why should I trust you? Explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
- Baehrens, D.; Schroeter, T.; Harmeling, S.; Kawanabe, M.; Hansen, K.; Müller, K.R. How to explain individual classification decisions. J. Mach. Learn. Res. 2010, 11, 1803–1831. [Google Scholar]
- Strumbelj, E.; Kononenko, I. An efficient explanation of individual classifications using game theory. J. Mach. Learn. Res. 2010, 11, 1–18. [Google Scholar]
- Guo, W.; Mu, D.; Xu, J.; Su, P.; Wang, G.; Xing, X. Lemna: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 364–379. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-Precision Model-Agnostic Explanations. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; Volume 18, pp. 1527–1535. [Google Scholar]
- He, Y.; Lou, J.; Qin, Z.; Ren, K. FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security Analysis. In Proceedings of the 2023 ACM SIGSAC CCS, Copenhagen, Denmark, 26–30 November 2023; pp. 416–430. [Google Scholar]
- Blitzer, J.; Dredze, M.; Pereira, F. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the ACL, Prague, Czech Republic, 23–27 June 2007; pp. 440–447. [Google Scholar]
- Kohavi, R. Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. In Proceedings of the ACM SIGKDD, Portland, OR, USA, 2–4 August 1996; Volume 96, pp. 202–207. [Google Scholar]
- Moro, S.; Laureano, R.; Cortez, P. Using data mining for bank direct marketing: An application of the crisp-dm methodology. EUROSIS-ETI 2011. Available online: https://hdl.handle.net/1822/14838 (accessed on 11 June 2025).
- Wright, R.E. Logistic Regression; American Psychological Association: Washington, DC, USA, 1995. [Google Scholar]
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674. [Google Scholar] [CrossRef]
- Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017, 234, 11–26. [Google Scholar] [CrossRef]
- Fung, G.; Sandilya, S.; Rao, R.B. Rule Extraction from Linear Support Vector Machines. In Proceedings of the ACM SIGKDD, Chicago, IL, USA, 21–24 August 2005; pp. 32–40. [Google Scholar]
- Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL Tech. 2017, 31, 841. [Google Scholar] [CrossRef]
- Dwivedi, R.; Dave, D.; Naik, H.; Singhal, S.; Omer, R.; Patel, P.; Qian, B.; Wen, Z.; Shah, T.; Morgan, G.; et al. Explainable AI (XAI): Core ideas, techniques, and solutions. ACM Comput. Surv. 2023, 55, 1–33. [Google Scholar] [CrossRef]
- Andrews, R.; Diederich, J.; Tickle, A.B. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl.-Based Syst. 1995, 8, 373–389. [Google Scholar] [CrossRef]
- Averkin, A.; Yarushev, S. Fuzzy rules extraction from deep neural networks. In Proceedings of the EEKM, Moscow, Russia, 8–9 December 2021. [Google Scholar]
- Hailesilassie, T. Rule extraction algorithm for deep neural networks: A review. arXiv 2016, arXiv:1610.05267. [Google Scholar]
- Zilke, J.R.; Mencía, E.L.; Janssen, F. DeepRED: Rule extraction from deep neural networks. In Proceedings of the ICDS, Venice, Italy, 24–28 April 2016; pp. 457–473. [Google Scholar]
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014; pp. 818–833. [Google Scholar]
- Shrikumar, A.; Greenside, P.; Kundaje, A. Learning important features through propagating activation differences. arXiv 2017, arXiv:1704.02685. [Google Scholar]
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE CVPR, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE ICCV, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Smilkov, D.; Thorat, N.; Kim, B.; Viégas, F.; Wattenberg, M. Smoothgrad: Removing noise by adding noise. arXiv 2017, arXiv:1706.03825. [Google Scholar]
- Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity checks for saliency maps. Adv. Neural Inf. Process. Syst. 2018, 31, 1–11. [Google Scholar]
- Kindermans, P.J.; Hooker, S.; Adebayo, J.; Alber, M.; Schütt, K.T.; Dähne, S.; Erhan, D.; Kim, B. The (un) reliability of saliency methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Springer: Berlin/Heidelberg, Germany, 2019; pp. 267–280. [Google Scholar]
- Jin, W.; Li, X.; Hamarneh, G. One map does not Fit all: Evaluating saliency map explanation on multi-modal medical images. arXiv 2021, arXiv:2107.05047. [Google Scholar]
- Mekonnen, E.T.; Longo, L.; Dondio, P. A global model-agnostic rule-based XAI method based on Parameterized Event Primitives for time series classifiers. Front. Artif. Intell. 2024, 7, 1381921. [Google Scholar] [CrossRef] [PubMed]
- Mamalakis, M.; Mamalakis, A.; Agartz, I.; Mørch-Johnsen, L.E.; Murray, G.; Suckling, J.; Lio, P. Solving the enigma: Deriving optimal explanations of deep networks. arXiv 2024, arXiv:2405.10008. [Google Scholar]
- Botari, T.; Izbicki, R.; de Carvalho, A.C. Local Interpretation Methods to Machine Learning Using the Domain of the Feature Space. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Würzburg, Germany, 16–20 September 2019; pp. 241–252. [Google Scholar]
- Negri, F.R.; Nicolosi, N.; Camilli, M.; Mirandola, R. Explanation-driven Self-adaptation using Model-agnostic Interpretable Machine Learning. In Proceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Lisbon, Portugal, 15–16 April 2024; pp. 189–199. [Google Scholar]
- Pira, L.; Ferrie, C. On the interpretability of quantum neural networks. Quantum Mach. Intell. 2024, 6, 52. [Google Scholar] [CrossRef]
- Delaunay, J. Explainability for Machine Learning Models: From Data Adaptability to User Perception. Ph.D. Thesis, Université de Rennes, Rennes, France, 2023. [Google Scholar]
- Whitten, P.; Wolff, F.; Papachristou, C. An AI Architecture with the Capability to Explain Recognition Results. arXiv 2024, arXiv:2406.08740. [Google Scholar]
- Namrata, S.; Pradeep, S.; Deepika, B. A rule extraction approach from support vector machines for diagnosing hypertension among diabetics. Expert Syst. Appl. 2019, 130, 188–205. [Google Scholar]
- Zhu, P.; Hu, Q. Rule extraction from support vector machines based on consistent region covering reduction. Knowl.-Based Syst. 2013, 42, 1–8. [Google Scholar] [CrossRef]
- Farquad, M.A.H.; Ravi, V.; Raju, S.B. Churn prediction using comprehensible support vector machine: An analytical CRM application. Appl. Soft Comput. J. 2014, 19, 31–40. [Google Scholar] [CrossRef]
- Shakerin, F.; Gupta, G. White-box induction from SVM models: Explainable ai with logic programming. Theory Pract. Log. Program. 2020, 20, 656–670. [Google Scholar] [CrossRef]
| Dataset | Methods | LR | DT | RF | XGB | DNN |
|---|---|---|---|---|---|---|
| Adult | LIME | 0.1273 | 0.1090 | 0.1377 | 0.0674 | 0.1143 |
| Adult | LEMNA | 0.0581 | 0.0256 | 0.0823 | 0.0206 | 0.0357 |
| Adult | SVM-X | 0.0446 | 0.0718 | 0.0614 | 0.0172 | 0.0303 |
| Book | LIME | 0.1507 | 0.1504 | 0.1520 | 0.1428 | |
| Book | LEMNA | 0.1382 | 0.1264 | 0.1522 | 0.1478 | 0.1479 |
| Book | SVM-X | 0.1218 | 0.0905 | 0.1034 | 0.1205 | 0.1309 |
| DVD | LIME | 0.1315 | 0.1203 | 0.1598 | 0.1359 | 0.1386 |
| DVD | LEMNA | 0.1563 | 0.1319 | 0.1389 | 0.1407 | 0.1319 |
| DVD | SVM-X | 0.1383 | 0.0929 | 0.0489 | 0.0822 | 0.0562 |
| Method | Model-Agnostic | Local Fidelity | Stability | Handling Nonlinearity |
|---|---|---|---|---|
| LIME [10] | Yes | Low | Moderate | Poor |
| LEMNA [13] | Yes | Moderate | Low | Moderate |
| Anchor [14] | Yes | Moderate | High | Poor |
| Rule-based SVM [24] | No | High | High | Limited |
| Example-based SVM [25] | No | High | Moderate | Limited |
| SVM-X (Ours) | Yes | High | High | Good |