A Credit Risk Identification Model Based on the Minimax Probability Machine with Generative Adversarial Networks
Abstract
1. Introduction
1.1. Background
1.2. Theoretical Foundations of Credit Scoring
1.3. Behavioral Economics and Traditional Scorecard Systems
1.4. Machine Learning Approaches Under Imbalanced Data
1.5. Model Interpretability and Explainable AI in Credit Scoring
1.6. Motivation and Overview of the Proposed GAN-MPM Framework
2. Related Work
2.1. Generative Adversarial Network
- Current loss point: the loss value corresponding to the current parameter settings of the generator and discriminator in the given training iteration.
- Max: the maximum discriminative capability achievable by D when the generator G is fixed.
- Min: the minimum loss achievable by G when the discriminator D is fixed, representing the generator’s ability to “fool” the discriminator.
- Global optimum: the state in which the distribution of generated samples perfectly matches the real data distribution, so that the discriminator cannot distinguish between the two and always outputs 1/2.
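The bullets above describe the minimax game $\min_G \max_D V(D, G)$. In the standard formulation of Goodfellow et al., $V(D, G) = \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1 - D(G(z)))]$. For a fixed generator, the optimal discriminator is $D^*(x) = p_{\text{data}}(x)/(p_{\text{data}}(x) + p_g(x))$. At the global optimum ($p_g = p_{\text{data}}$) it therefore outputs 1/2 everywhere, and $V = -\log 4$. The minimal sketch below assumes this standard value function, so the paper's own loss may differ in detail. It checks both facts numerically, and the function names are illustrative only.

```python
import math


def gan_value(d_real, d_fake):
    """Minibatch estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def optimal_discriminator(p_data, p_gen):
    """Pointwise optimal D for a fixed generator: p_data(x) / (p_data(x) + p_g(x))."""
    return p_data / (p_data + p_gen)
```

When the two densities agree at a point, `optimal_discriminator` returns 0.5. Feeding 0.5 for every real and generated sample into `gan_value` gives $2\log(1/2) = -\log 4$, the value of the game at the global optimum.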
2.2. Minimax Probability Machine
2.2.1. Linear Minimax Probability Machine
2.2.2. Nonlinear Minimax Probability Machine
3. A Proposed Model for Credit Risk Identification
3.1. The GAN-MPM Framework
3.2. Algorithm
Algorithm 1 GAN-MPM algorithm
Input: Positive samples, negative samples, kernel parameters, generator $G$, discriminator $D$, learning rates, batch size $m$, number of epochs $T$, tolerance tol.
Output: The label of a new sample.
Step 1: Use the GAN to generate adversarial samples:
Step 1.1: Sample a minibatch of $m$ real positive samples $\{x^{(1)}, \dots, x^{(m)}\}$ and a minibatch of $m$ noise vectors $\{z^{(1)}, \dots, z^{(m)}\}$.
Step 1.2: Generate synthetic positive samples $G(z^{(i)})$, $i = 1, \dots, m$.
Step 1.3: Update the discriminator parameters by maximizing $\frac{1}{m}\sum_{i=1}^{m}\left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right]$.
Step 1.4: Update the generator parameters by minimizing $\frac{1}{m}\sum_{i=1}^{m}\log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$.
Step 1.5: Repeat Steps 1.1–1.4 for a fixed number of epochs $T$, or until the discriminator and generator losses meet the stopping criteria set by the tolerance tol.
Step 2: Construct the augmented dataset by combining the real samples with the GAN-generated synthetic positive samples.
Step 3: Calculate the MPM quantities defined in (15) and (16).
Step 4: Compute the model parameters by solving problem (18).
Step 5: Determine the label of a new sample using Formula (19).
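Steps 3 to 5 rely on equations (15), (16), (18), and (19), which this excerpt does not reproduce. For orientation, the sketch below implements the standard linear MPM of Lanckriet et al. Let $\bar{x}, \bar{y}$ be the class means and $\Sigma_x, \Sigma_y$ the class covariances. The sketch solves $\min_a \sqrt{a^\top\Sigma_x a} + \sqrt{a^\top\Sigma_y a}$ subject to $a^\top(\bar{x} - \bar{y}) = 1$. It then sets $\kappa = 1/(\sqrt{a^\top\Sigma_x a} + \sqrt{a^\top\Sigma_y a})$ and $b = a^\top\bar{x} - \kappa\sqrt{a^\top\Sigma_x a}$, and labels $z$ positive when $a^\top z \ge b$. The quantity $\alpha = \kappa^2/(1+\kappa^2)$ is the worst-case accuracy bound.

Several parts are assumptions of this sketch, not the paper's implementation: the alternating least-squares solver (in the spirit of Lanckriet et al.), the small ridge added to the covariances, and the function names. The kernel (RBF) variant is not covered.

```python
import numpy as np


def fit_linear_mpm(X, Y, ridge=1e-6, n_iter=200, tol=1e-10):
    """Fit a linear MPM separating class X (positive) from class Y (negative).

    Solves min_a sqrt(a^T Sx a) + sqrt(a^T Sy a)  s.t.  a^T (mx - my) = 1
    by alternating least squares. Requires at least two features.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    d = X.shape[1]
    if d < 2:
        raise ValueError("this sketch needs at least two features")
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    # Sample covariances plus a small (assumed) ridge for numerical stability
    Sx = np.cov(X, rowvar=False) + ridge * np.eye(d)
    Sy = np.cov(Y, rowvar=False) + ridge * np.eye(d)
    diff = mx - my
    # Every a = a0 + F u satisfies the equality constraint a^T diff = 1
    a0 = diff / diff.dot(diff)
    _, _, Vt = np.linalg.svd(diff.reshape(1, -1))
    F = Vt[1:].T  # orthonormal basis of {v : v^T diff = 0}
    a, beta, eta = a0, 1.0, 1.0
    for _ in range(n_iter):
        # For fixed beta, eta the surrogate
        # beta + a^T Sx a / beta + eta + a^T Sy a / eta is quadratic in u
        W = Sx / beta + Sy / eta
        u = np.linalg.solve(F.T @ W @ F, -F.T @ W @ a0)
        a_new = a0 + F @ u
        # Closed-form minimizers of the surrogate in beta and eta for fixed a
        beta = np.sqrt(a_new @ Sx @ a_new)
        eta = np.sqrt(a_new @ Sy @ a_new)
        converged = np.linalg.norm(a_new - a) < tol
        a = a_new
        if converged:
            break
    kappa = 1.0 / (beta + eta)
    b = a @ mx - kappa * beta
    alpha = kappa ** 2 / (1.0 + kappa ** 2)  # worst-case accuracy bound
    return a, b, kappa, alpha


def predict_mpm(Z, a, b):
    """Return +1 (positive class) where a^T z >= b, otherwise -1."""
    return np.where(np.asarray(Z, dtype=float) @ a >= b, 1, -1)
```

In a GAN-MPM-style pipeline, `X` would hold the real positive samples together with the synthetic positive samples from Step 2, and `Y` the negative samples.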
3.3. Model Performance Evaluation Metrics
4. Experiments
4.1. The South German Credit Dataset
4.2. The Results Based on the ACC, F1-Score, Sensitivity, Specificity, and AUC
4.3. Feature Importance Analysis and Interpretability Study of the GAN-MPM Model Based on SHAP
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Attribute | Field Name | Value Coding |
---|---|---|
A1 | status | Ordinal (4 levels) |
A2 | duration | Real value (months) |
A3 | credit_history | Ordinal (5 levels) |
A4 | purpose | Categorical (multi-class) |
A5 | amount | Real value (DM) |
A6 | savings | Ordinal (5 levels) |
A7 | employment_duration | Ordinal (5 levels) |
A8 | installment_rate | Ordinal (4 levels) |
A9 | personal_status_sex | Categorical (4 levels) |
A10 | other_debtors | Categorical (3 levels) |
A11 | present_residence | Real value (years) |
A12 | property | Ordinal (4 levels) |
A13 | age | Real value (years) |
A14 | other_installment_plans | Categorical (3 levels) |
A15 | housing | Categorical (3 levels) |
A16 | number_credits | Real value (count) |
A17 | job | Ordinal (4 levels) |
A18 | people_liable | Binary (1/2) |
A19 | telephone | Binary (0/1) |
A20 | foreign_worker | Binary (0/1) |
Class | credit_risk | Binary (0/1) |
Model | ACC (%) | F1-Score (%) | Sensitivity (%) | Specificity (%) | AUC (%) |
---|---|---|---|---|---|
Random Forest | 69.93 ± 2.31 | 21.03 ± 7.72 | 20.99 ± 5.67 | 90.89 ± 2.06 | 62.81 ± 3.84 |
XGBoost | 72.47 ± 2.29 | 25.40 ± 5.36 | 20.31 ± 4.13 | 94.77 ± 1.45 | 70.82 ± 1.70 |
SVM (linear) | 70.10 ± 1.99 | 29.36 ± 20.11 | 42.08 ± 25.92 | 63.63 ± 30.18 | 52.85 ± 10.94 |
SVM (RBF) | 71.17 ± 3.42 | 27.56 ± 15.44 | 22.45 ± 13.66 | 74.14 ± 4.07 | 58.30 ± 4.87 |
MPM (linear) | 71.47 ± 1.74 | 44.06 ± 2.65 | 37.56 ± 3.81 | 86.00 ± 3.30 | 61.43 ± 0.45 |
MPM (RBF) | 72.13 ± 2.10 | 57.44 ± 3.43 | 62.89 ± 6.33 | 76.10 ± 3.36 | 69.81 ± 0.46 |
GAN-MPM (linear) | 72.37 ± 2.18 | 60.93 ± 2.06 | 71.78 ± 4.13 | 72.62 ± 3.92 | 72.03 ± 0.36 |
GAN-MPM (RBF) | 76.13 ± 1.83 | 56.74 ± 3.04 | 52.22 ± 3.95 | 86.38 ± 2.54 | 69.26 ± 0.45 |
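The results table reports ACC, F1-score, sensitivity, specificity, and AUC. The minimal sketch below computes them with their standard definitions. It assumes label 1 is the positive class whose detection rate sensitivity measures, and that each model yields a continuous score (for an MPM, for example, $a^\top z - b$) for the AUC. AUC is computed through its rank interpretation: the probability that a random positive outscores a random negative. How the paper aggregates repeated runs into the reported ± values is not stated here, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """ACC, F1, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    acc = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"ACC": acc, "F1": f1, "Sensitivity": sensitivity, "Specificity": specificity}


def auc_score(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the sample size. That is fine for a dataset of this scale; a sort-based rank computation scales better for larger data.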
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Zhao, X.; Huang, H. A Credit Risk Identification Model Based on the Minimax Probability Machine with Generative Adversarial Networks. Mathematics 2025, 13, 3345. https://doi.org/10.3390/math13203345