# Shapley Feature Selection

## Abstract

## 1. Introduction

## 2. Methods

#### 2.1. Data

#### 2.2. Models

#### 2.2.1. LightGBM

#### 2.2.2. SHAP

#### 2.3. Feature Selection

#### 2.3.1. Stepwise Feature Selection

#### 2.3.2. LASSO

#### 2.3.3. BORUTA

## 3. Results

## 4. Conclusions and Future Works

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

| Method | No. of Features | AUC | F1 Score |
|---|---|---|---|
| LASSO Regular | 7 | 0.8047 | 0.5156 |
| LASSO SHAP | 15 | 0.8625 | 0.5571 |
| Bi-directional feature selection Regular | 27 | 0.8674 | 0.5496 |
| Bi-directional feature selection SHAP | 33 | 0.8689 | 0.5569 |
| Boruta Regular | 26 | 0.8699 | 0.5581 |
| Boruta SHAP | 45 | 0.8721 | 0.5589 |

| Method | No. of Features | AUC | F1 Score |
|---|---|---|---|
| Full model | 49 | 0.8137 | 0.5167 |
| LASSO Regular | 7 | 0.8012 | 0.5088 |
| LASSO SHAP | 15 | 0.8466 | 0.5364 |
| Bi-directional feature selection Regular | 27 | 0.8294 | 0.5188 |
| Bi-directional feature selection SHAP | 33 | 0.8519 | 0.5407 |
| Boruta Regular | 26 | 0.8480 | 0.5413 |
| Boruta SHAP | 45 | 0.8447 | 0.5430 |
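The AUC and F1 Score columns in the tables above are standard binary-classification metrics. The following is a minimal sketch of how such values can be computed. It is not the authors' evaluation code: the function names `roc_auc` and `f1_score`, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 as the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed, not from the paper): true labels and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Hypothetical 0.5 threshold to turn scores into hard class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC = {roc_auc(y_true, scores):.4f}")
print(f"F1  = {f1_score(y_true, y_pred):.4f}")
```

AUC is computed from the raw scores and does not depend on a threshold, whereas F1 depends on the threshold used to turn scores into class predictions. The two columns can therefore rank the methods differently.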

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Gramegna, A.; Giudici, P. Shapley Feature Selection. *FinTech* **2022**, *1*, 72-80. https://doi.org/10.3390/fintech1010006