Shapley Feature Selection
Abstract
1. Introduction
2. Methods
2.1. Data
2.2. Models
2.2.1. LightGBM
2.2.2. SHAP
2.3. Feature Selection
2.3.1. Stepwise Feature Selection
2.3.2. LASSO
2.3.3. BORUTA
3. Results
4. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Method | No. of features | AUC | F1 score |
---|---|---|---|
LASSO Regular | 7 | 0.8047 | 0.5156 |
LASSO SHAP | 15 | 0.8625 | 0.5571 |
Bi-directional feature selection Regular | 27 | 0.8674 | 0.5496 |
Bi-directional feature selection SHAP | 33 | 0.8689 | 0.5569 |
Boruta Regular | 26 | 0.8699 | 0.5581 |
Boruta SHAP | 45 | 0.8721 | 0.5589 |

Method | No. of features | AUC | F1 score |
---|---|---|---|
Full model | 49 | 0.8137 | 0.5167 |
LASSO Regular | 7 | 0.8012 | 0.5088 |
LASSO SHAP | 15 | 0.8466 | 0.5364 |
Bi-directional feature selection Regular | 27 | 0.8294 | 0.5188 |
Bi-directional feature selection SHAP | 33 | 0.8519 | 0.5407 |
Boruta Regular | 26 | 0.8480 | 0.5413 |
Boruta SHAP | 45 | 0.8447 | 0.5430 |
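The tables report AUC and F1 score for each selected feature subset. As a reminder of what these two metrics measure, the following is a minimal sketch in plain Python, not the authors' evaluation code. The function names, the toy labels and scores, and the 0.5 classification threshold are all assumptions made for illustration; the surviving text does not state how predictions were thresholded.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical, not taken from the paper).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Assumed threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(auc, f1)
```

AUC depends only on how the scores rank the observations, while F1 depends on the chosen threshold. This is one reason two feature subsets can sit close together on AUC and still differ on F1.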
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gramegna, A.; Giudici, P. Shapley Feature Selection. FinTech 2022, 1, 72-80. https://doi.org/10.3390/fintech1010006