# The Credit Risk Problem—A Developing Country Case Study

## Abstract

## 1. Introduction

## 2. State of the Art

#### 2.1. Review of Comparative Studies

#### 2.2. Support Vector Machines (SVM)

#### 2.3. Random Forest (RF)

- Feature selection. This step selects the features most strongly associated with the classification outcome.
- Decision tree generation. The tree is grown by recursively splitting nodes on the selected features.
- Decision tree pruning. Pruning reduces the risk of overfitting by removing branches that add little predictive value.
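The three steps above can be illustrated with a minimal, standard-library Python sketch of a single classification tree. It is not the authors' implementation. It assumes numeric features, reads the feature-selection step as choosing the split with the largest Gini impurity decrease, and uses reduced-error pruning on a held-out validation set as one possible pruning rule. All function names are hypothetical. A random forest grows many such trees on bootstrap samples, drawing a random subset of candidate features at each split, and aggregates their votes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def partition(X, y, j, t):
    """Split rows on feature j at threshold t into (left, right) lists of (row, label)."""
    left = [(r, v) for r, v in zip(X, y) if r[j] <= t]
    right = [(r, v) for r, v in zip(X, y) if r[j] > t]
    return left, right


def unzip(pairs):
    """Turn a list of (row, label) pairs back into (rows, labels)."""
    return [r for r, _ in pairs], [v for _, v in pairs]


def best_split(X, y):
    """Step 1 (feature selection): feature/threshold with the largest Gini decrease."""
    best = None  # (gain, feature index, threshold)
    parent = gini(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left, right = partition(X, y, j, t)
            if not left or not right:
                continue
            w = len(left) / len(y)
            gain = parent - (w * gini(unzip(left)[1]) + (1 - w) * gini(unzip(right)[1]))
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(X, y, depth=0, max_depth=5):
    """Step 2 (generation): split until a node is pure, no split helps, or depth runs out."""
    split = best_split(X, y) if depth < max_depth and len(set(y)) > 1 else None
    if split is None or split[0] < 1e-12:
        return {"leaf": majority(y)}
    _, j, t = split
    left, right = partition(X, y, j, t)
    return {"feature": j, "threshold": t, "majority": majority(y),
            "left": grow(*unzip(left), depth + 1, max_depth),
            "right": grow(*unzip(right), depth + 1, max_depth)}


def predict(node, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def errors(node, X, y):
    """Number of misclassified rows."""
    return sum(predict(node, r) != v for r, v in zip(X, y))


def prune(node, X_val, y_val):
    """Step 3 (pruning): bottom-up reduced-error pruning on a validation set."""
    if "leaf" in node:
        return node
    left, right = partition(X_val, y_val, node["feature"], node["threshold"])
    node["left"] = prune(node["left"], *unzip(left))
    node["right"] = prune(node["right"], *unzip(right))
    leaf = {"leaf": node["majority"]}
    # Collapse the subtree if a single leaf does no worse on the validation data.
    return leaf if errors(leaf, X_val, y_val) <= errors(node, X_val, y_val) else node
```

For example, `prune(grow(X_train, y_train), X_val, y_val)` returns the pruned tree, and `predict(tree, row)` classifies a new row.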

## 3. Handling Data

#### 3.1. Working Methodology Outline

#### 3.2. The Dataset

#### 3.2.1. Imbalance

#### 3.2.2. Feature Engineering

## 4. Data Training and Testing

#### 4.1. Testing Main ML Methods

#### 4.2. Testing SVM Methods

#### 4.3. Testing Random Forest Based Models

#### 4.3.1. Weighted Random Forest

#### 4.3.2. Balanced Random Forest

#### 4.3.3. SMOTE and Random Forest

## 5. Going Further—Using Graphic Distribution Analysis

#### 5.1. Graphic Distribution

#### 5.2. Improved Balanced Random Forest

## 6. Concluding Remarks

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Note

1. ROC: Receiver Operating Characteristic.
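To make the ROC abbreviation concrete, the following sketch shows one standard way to trace an ROC curve from a classifier's scores (for instance, predicted default probabilities) and to take the area under it with the trapezoidal rule. It assumes labels coded as 1 for the positive class and 0 otherwise. The function names are illustrative, and the text does not state how the AUC values in the tables were computed.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from sweeping the decision threshold from high to low.

    labels are 1 for the positive class and 0 otherwise; tied scores are
    processed together so that ties produce a single diagonal segment.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A perfect ranking (all positives scored above all negatives) gives an area of 1, while a random ranking gives about 0.5.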

Method | Specificity | Sensitivity | Total Accuracy | AUC |
---|---|---|---|---|
Linear Regression | 0.634 | 0.744 | 0.739 | 0.689 |
Linear Discriminant Analysis | 0.634 | 0.745 | 0.74 | 0.69 |
Quadratic Discriminant Analysis | 0.961 | 0.16 | 0.195 | 0.561 |
K Nearest Neighbor | 0.488 | 0.85 | 0.834 | 0.669 |
Multilayer Perceptron | 0.453 | 0.828 | 0.811 | 0.64 |
Decision Tree | 0.611 | 0.786 | 0.778 | 0.698 |
Random Forest | 0.632 | 0.898 | 0.887 | 0.765 |
AdaBoost | 0.456 | 0.918 | 0.898 | 0.687 |
Gaussian Naive Bayes | 0.952 | 0.191 | 0.224 | 0.572 |
SVM | 0.283 | 0.995 | 0.964 | 0.639 |
Linear SVM | 0.661 | 0.766 | 0.762 | 0.714 |
Gradient Boost | 0.384 | 0.959 | 0.934 | 0.672 |
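The columns of the table above follow the usual confusion-matrix definitions, sketched below with hypothetical function names. Two consequences can be checked against the table. First, for a hard classifier the ROC curve has a single interior point, (1 - specificity, sensitivity), so its area is (sensitivity + specificity)/2; every row of this table matches that value to within rounding (Random Forest: (0.632 + 0.898)/2 = 0.765). Second, accuracy is the class-weighted average of sensitivity and specificity, so each row implies the share of positive cases in the evaluation set; for the Random Forest row this comes to about 0.96, consistent with a strongly imbalanced dataset. Which class the authors treat as positive is not stated in the available text.

```python
def classification_metrics(tp, fn, tn, fp):
    """Metrics reported in the result tables, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cases classified correctly
    # With hard predictions the ROC curve has one interior point,
    # (1 - specificity, sensitivity), and its trapezoidal area reduces to:
    single_point_auc = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "single_point_auc": single_point_auc}


def implied_positive_share(sensitivity, specificity, accuracy):
    """Solve accuracy = w * sensitivity + (1 - w) * specificity for w,
    the fraction of evaluation cases that belong to the positive class."""
    return (accuracy - specificity) / (sensitivity - specificity)
```

For instance, `implied_positive_share(0.898, 0.632, 0.887)` evaluates to roughly 0.959 for the Random Forest row; the other rows give values between about 0.95 and 0.96.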

Method | Specificity | Sensitivity | Total Accuracy | AUC |
---|---|---|---|---|
Linear SVM | 0.51 | 0.818 | 0.801 | 0.664 |
Fuzzy SVM | 0.625 | 0.716 | 0.711 | 0.67 |
Bilateral Fuzzy SVM | 0.549 | 0.781 | 0.769 | 0.665 |
LSSVM | 0.392 | 0.859 | 0.833 | 0.625 |
LS-Fuzzy SVM | 0.37 | 0.891 | 0.866 | 0.63 |
Weighted LSSVM | 0.412 | 0.852 | 0.828 | 0.632 |
LS Bilateral Fuzzy SVM | 0.444 | 0.845 | 0.826 | 0.645 |
SVM and bagging | 0.333 | 0.888 | 0.858 | 0.611 |
Fuzzy SVM and bagging | 0.927 | 0.315 | 0.342 | 0.621 |
LS-Fuzzy SVM and bagging | 0.213 | 0.879 | 0.846 | 0.546 |
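The fuzzy SVM rows in the table above rest on per-sample membership values that scale each training point's slack penalty, so that likely outliers influence the margin less. The sketch below uses the class-centre membership of Lin and Wang's original fuzzy SVM. The bilateral, least-squares, and bagged variants in the table are formulated differently, and the membership function the authors actually used is not stated in the available text. Function names and the default `delta` are assumptions.

```python
import math


def fuzzy_memberships(X, y, delta=1e-6):
    """Class-centre membership s_i = 1 - d_i / (r + delta), computed per class.

    d_i is the Euclidean distance from x_i to its class mean and r is the largest
    such distance in that class, so central points get s_i near 1 and the most
    outlying point gets s_i near 0. delta > 0 keeps every s_i strictly positive.
    """
    s = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        centre = [sum(X[i][k] for i in idx) / len(idx) for k in range(len(X[0]))]
        dist = {i: math.dist(X[i], centre) for i in idx}
        radius = max(dist.values())
        for i in idx:
            s[i] = 1.0 - dist[i] / (radius + delta)
    return s


def fuzzy_svm_objective(w, b, X, y, s, C=1.0):
    """Primal soft-margin objective with membership-scaled slack (labels in {-1, +1}):
    0.5 * ||w||^2 + C * sum_i s_i * max(0, 1 - y_i * (w . x_i + b))."""
    regulariser = 0.5 * sum(wk * wk for wk in w)
    slack = sum(si * max(0.0, 1.0 - yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b))
                for xi, yi, si in zip(X, y, s))
    return regulariser + C * slack
```

Setting every `s_i` to 1 recovers the ordinary soft-margin SVM objective; the memberships only change how much each margin violation costs.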

Method | Specificity | Sensitivity | Total Accuracy | AUC |
---|---|---|---|---|
Random Forest | 0.827 | 0.769 | 0.772 | 0.805 |
Balanced Random Forest | 0.829 | 0.796 | 0.808 | 0.807 |
Weighted Random Forest | 0.79 | 0.82 | 0.819 | 0.805 |
Borderline SMOTE RF | 0.526 | 0.894 | 0.878 | 0.71 |
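The resampling and weighting ideas behind the rows of the table above can be sketched in a few standard-library Python functions with hypothetical names. `balanced_bootstrap` draws, for each tree, a bootstrap sample of minority-class size from every class (the balanced random forest idea); `inverse_frequency_weights` gives the n / (k * n_c) class weights that scikit-learn uses for `class_weight="balanced"`, one common choice for a weighted forest; `smote` generates synthetic minority points by interpolation. This is plain SMOTE rather than the borderline variant in the table, and how the class weights enter tree induction is not shown.

```python
import math
import random
from collections import Counter


def balanced_bootstrap(y, rng):
    """Row indices for one tree of a balanced random forest: a bootstrap sample
    of the minority-class size drawn with replacement from every class."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample


def inverse_frequency_weights(y):
    """Class weights for a weighted random forest: n / (k * n_c) for class c."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def smote(minority, n_new, k, rng):
    """Plain SMOTE: interpolate between a minority point and one of its k nearest
    minority neighbours. (Borderline SMOTE seeds only from minority points whose
    nearest neighbours are mostly, but not entirely, majority-class points.)"""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

With 95 majority and 5 minority cases, for example, each balanced bootstrap holds 5 cases of each class, and the minority class receives a weight 19 times that of the majority class.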

Method | Specificity | Sensitivity | Total Accuracy | AUC |
---|---|---|---|---|
BRF | 0.827 | 0.769 | 0.772 | 0.805 |
Classifier 1 | 0.775 | 0.845 | 0.841 | 0.802 |
Classifier 2 | 0.908 | 0.701 | 0.708 | 0.799 |
Classifier 3 | 0.887 | 0.721 | 0.728 | 0.804 |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fejza, D.; Nace, D.; Kulla, O.
The Credit Risk Problem—A Developing Country Case Study. *Risks* **2022**, *10*, 146.
https://doi.org/10.3390/risks10080146

**AMA Style**

Fejza D, Nace D, Kulla O.
The Credit Risk Problem—A Developing Country Case Study. *Risks*. 2022; 10(8):146.
https://doi.org/10.3390/risks10080146

**Chicago/Turabian Style**

Fejza, Doris, Dritan Nace, and Orjada Kulla.
2022. "The Credit Risk Problem—A Developing Country Case Study" *Risks* 10, no. 8: 146.
https://doi.org/10.3390/risks10080146