Level-Wise Feature-Guided Cascading Ensembles for Credit Scoring
Abstract
1. Introduction
- (1) We introduce a novel hierarchical feature selection strategy that systematically refines high-dimensional, heterogeneous data into a more discriminative and parsimonious feature representation. This structured approach reduces data complexity and improves the interpretability of key credit risk indicators, laying a robust foundation for subsequent modeling.
- (2) Building on this refined feature subspace, we develop a cascaded gradient boosting tree architecture for the deep exploration of the complex nonlinear relationships inherent in credit data. The layered structure enables progressive learning and effective information fusion across levels, substantially improving the model's expressive power and predictive accuracy for credit risk.
- (3) The inherent ensemble nature of the cascaded framework strengthens model robustness. By integrating multiple weak learners within the cascaded structure, LFGCE mitigates the adverse impact of data noise and outliers and curtails the risk of overfitting, which is critical for stable and reliable credit scoring in volatile financial environments.
2. Literature Review
3. Methodology
3.1. Overview of LFGCE
3.2. XGBoost as Base Learner
3.3. Importance-Driven Feature Selection
3.4. Training and Inference Procedure of LFGCE
Algorithm 1 Pseudo-code of level-wise feature-guided cascading ensembles

Input: training data D; number of cascade layers L; number of base learners per layer T; maximum tree depth d; feature selection ratio r; number of folds K for K-fold cross-validation; initial cross-validation accuracy score A_0.
Output: level-wise feature-guided cascading ensembles.
1: Initialize X as the original enhanced feature matrix
2: for l = 1 to L do
3:   for t = 1 to T do
4:     Split the dataset into K parts {D_1, ..., D_K} for K-fold cross-validation, where D \ D_k is the subset for training and D_k denotes the validation set
5:     Train a base tree ensemble model (XGBoost) with maximum depth d
6:   Ensemble the T base learners as the l-th cascade layer
7:   Get the prediction vector p_l of the l-th layer
8:   Compute the feature importance scores of the l-th layer
9:   Compute the K-fold cross-validation accuracy score A_l for the l-th cascade layer
10:  if A_l > A_{l-1} then
11:    Get the importance scores of the l-th layer
12:    Sort the feature indices by their importance scores
13:    Perform feature selection with feature selection ratio r
14:    Update the feature matrix X
15:    A_{l-1} ← A_l
16:  else
17:    break
18: return cascade ensemble LFGCE
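A minimal sketch of this training loop is given below, with scikit-learn's `GradientBoostingClassifier` standing in for XGBoost. The function name `train_lfgce`, the use of a single representative model for the layer's CV score, and the omission of the prediction-vector augmentation between layers are all simplifications for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def train_lfgce(X, y, L=3, T=2, r=0.8, K=5, seed=0):
    """Grow cascade layers while the K-fold CV accuracy on the currently
    selected feature subset keeps improving; stop (break) otherwise."""
    layers, selections = [], []
    best_acc = 0.0                        # A_0: initial CV accuracy score
    feat_idx = np.arange(X.shape[1])      # indices of currently selected features
    for _ in range(L):
        Xl = X[:, feat_idx]
        # T base learners per layer, diversified by random seed
        learners = [
            GradientBoostingClassifier(n_estimators=30, max_depth=3,
                                       random_state=seed + t).fit(Xl, y)
            for t in range(T)
        ]
        # K-fold CV accuracy A_l of the current layer (one representative model)
        acc = cross_val_score(
            GradientBoostingClassifier(n_estimators=30, max_depth=3,
                                       random_state=seed),
            Xl, y, cv=K, scoring="accuracy").mean()
        if acc <= best_acc:               # no improvement over A_{l-1}: stop growing
            break
        best_acc = acc
        layers.append(learners)
        selections.append(feat_idx)
        # importance-driven selection: keep the top r fraction of features
        imp = np.mean([m.feature_importances_ for m in learners], axis=0)
        keep = max(1, int(r * len(feat_idx)))
        feat_idx = feat_idx[np.argsort(imp)[::-1][:keep]]
    return layers, selections
```

The early-stopping test (line 10 of Algorithm 1) is what bounds the cascade depth in practice: layers are only added while they pay for themselves in cross-validated accuracy.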
Algorithm 2 Inference pseudo-code of LFGCE

Input: a trained hierarchical feature-guided cascade forest model comprising L levels, where each level l (1 ≤ l ≤ L) consists of T base learners; a test sample x.
Output: predicted class label ŷ.
1: for layer l = 1 to L − 1 do
2:   // get the top feature set of the l-th layer
3:   x_l ← Φ_l(x), where Φ_l is the feature selection operation of layer l
4:   for base learner t = 1 to T do
5:     p_{l,t} ← h_{l,t}(x_l)  // class probabilities from the t-th base learner
6:   end for
7:   p_l ← [p_{l,1}, ..., p_{l,T}]  // concatenate the predictive probabilities
8:   x ← [x_l, p_l]  // augment the selected features with the layer's predictions
9: end for
10: x_L ← Φ_L(x)
11: for base learner t = 1 to T do
12:   p_{L,t} ← h_{L,t}(x_L)
13: p ← (1/T) Σ_{t=1}^{T} p_{L,t}  // average predictive probability of the final layer
14: ŷ ← argmax_c p(c)
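The inference pass can be illustrated with a toy two-layer cascade. This is a hypothetical sketch: `GradientBoostingClassifier` stands in for XGBoost, and the layer-2 feature subset `sel2` is hand-picked here rather than importance-driven; only the mechanics of Algorithm 2 (per-layer feature selection, probability concatenation, final-layer averaging) are shown.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# toy data: 300 applicants, 8 features, label 1 = default
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

T = 2
sel1 = np.arange(8)                      # Phi_1: layer-1 feature subset (all features here)
layer1 = [GradientBoostingClassifier(n_estimators=20, random_state=t)
          .fit(X[:, sel1], y) for t in range(T)]

# augmented input for layer 2: Phi_2-selected features + layer-1 probabilities
P1 = np.hstack([m.predict_proba(X[:, sel1]) for m in layer1])
sel2 = np.array([0, 2, 4, 6])            # Phi_2: hand-picked; importance-driven in LFGCE
X2 = np.hstack([X[:, sel2], P1])
layer2 = [GradientBoostingClassifier(n_estimators=20, random_state=10 + t)
          .fit(X2, y) for t in range(T)]

def predict_lfgce(Xnew):
    """Algorithm 2: apply each layer's feature selection, concatenate the
    layer's predictive probabilities, and average the final layer's outputs."""
    P1 = np.hstack([m.predict_proba(Xnew[:, sel1]) for m in layer1])
    X2 = np.hstack([Xnew[:, sel2], P1])
    proba = np.mean([m.predict_proba(X2) for m in layer2], axis=0)
    return proba.argmax(axis=1)
```

Note that the same feature-selection operators fitted at training time must be replayed at inference time; a sample cannot reach layer l without first being projected through Φ_1, ..., Φ_{l−1} and augmented with each intermediate layer's probabilities.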
4. Experimental Settings
4.1. Credit Scoring Datasets
4.2. Evaluation Metrics
- (1) Accuracy (Acc)
- (2) Recall (Rec)
- (3) Precision (Pre)
- (4) F1-score (F1)
- (5) Brier Score (BS)
- (6) ROC curve and AUC
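All six metrics can be computed with scikit-learn. The sketch below uses a small illustrative score vector (not the paper's experimental data), taking "bad" loans as the positive class:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, brier_score_loss, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])          # 1 = bad (default)
p_bad = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.35, 0.8, 0.6])  # predicted P(bad)
y_pred = (p_bad >= 0.5).astype(int)                  # hard labels at threshold 0.5

metrics = {
    "Acc": accuracy_score(y_true, y_pred),
    "Rec": recall_score(y_true, y_pred),             # TP / (TP + FN)
    "Pre": precision_score(y_true, y_pred),          # TP / (TP + FP)
    "F1":  f1_score(y_true, y_pred),
    "BS":  brier_score_loss(y_true, p_bad),          # mean squared probability error
    "AUC": roc_auc_score(y_true, p_bad),             # threshold-free ranking quality
}
```

Note that Acc, Rec, Pre, and F1 depend on the classification threshold, whereas BS and AUC are computed directly from the predicted probabilities; BS additionally rewards calibration, which is why it is reported alongside AUC.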
4.3. Implementation Details
5. Experimental Results
5.1. Performance Comparison and Analysis
5.2. Significance Test
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Dataset | Samples | Variables | Good/Bad |
---|---|---|---|
Australian | 690 | 14 | 307/383 |
German | 1000 | 24 | 700/300 |
Japanese | 690 | 15 | 296/357 |
Taiwan | 6000 | 23 | 3000/3000 |

Actual \ Predicted | Bad | Good |
---|---|---|
Bad | TP | FN |
Good | FP | TN |

Algorithm | AUC | Acc | Pre | Rec | BS | F1 |
---|---|---|---|---|---|---|
LDA | 0.9269 | 0.8594 | 0.7961 | 0.9196 | 0.1089 | 0.8534 |
LR | 0.9298 | 0.8649 | 0.8309 | 0.8741 | 0.0992 | 0.8520 |
DT | 0.9140 | 0.8437 | 0.8270 | 0.8202 | 0.1112 | 0.8236 |
KNN | 0.9134 | 0.8494 | 0.8640 | 0.7851 | 0.1112 | 0.8227 |
SVM | 0.9262 | 0.8626 | 0.8497 | 0.8395 | 0.1008 | 0.8446 |
NN | 0.9148 | 0.8502 | 0.8328 | 0.8298 | 0.1186 | 0.8313 |
RF | 0.9338 | 0.8645 | 0.8575 | 0.8341 | 0.1032 | 0.8457 |
AdaBoost | 0.9273 | 0.8555 | 0.7913 | 0.9173 | 0.1516 | 0.8496 |
GBDT | 0.9392 | 0.8637 | 0.8426 | 0.8530 | 0.0956 | 0.8478 |
LightGBM | 0.9371 | 0.8624 | 0.8476 | 0.8421 | 0.0964 | 0.8448 |
XGBoost | 0.9394 | 0.8633 | 0.8449 | 0.8487 | 0.0991 | 0.8468 |
Deep Forest | 0.9382 | 0.8725 | 0.8763 | 0.8306 | 0.1036 | 0.8528 |
LFGCE | 0.9411 | 0.8687 | 0.8478 | 0.8591 | 0.0908 | 0.8534 |

Algorithm | AUC | Acc | Pre | Rec | BS | F1 |
---|---|---|---|---|---|---|
LDA | 0.7795 | 0.7585 | 0.7926 | 0.8871 | 0.1653 | 0.8372 |
LR | 0.7808 | 0.7601 | 0.7942 | 0.8872 | 0.1646 | 0.8381 |
DT | 0.7096 | 0.7232 | 0.7791 | 0.8439 | 0.1928 | 0.8102 |
KNN | 0.7383 | 0.7280 | 0.7341 | 0.9586 | 0.1803 | 0.8315 |
SVM | 0.7112 | 0.7065 | 0.7966 | 0.7798 | 0.1859 | 0.7881 |
NN | 0.7799 | 0.7659 | 0.8067 | 0.8753 | 0.1650 | 0.8396 |
RF | 0.7702 | 0.7437 | 0.7545 | 0.9394 | 0.1713 | 0.8369 |
AdaBoost | 0.7035 | 0.7021 | 0.7031 | 0.9941 | 0.1979 | 0.8237 |
GBDT | 0.7792 | 0.7587 | 0.7879 | 0.8967 | 0.1654 | 0.8388 |
LightGBM | 0.7776 | 0.7615 | 0.7887 | 0.9007 | 0.1657 | 0.8410 |
XGBoost | 0.7811 | 0.7582 | 0.7757 | 0.9208 | 0.1652 | 0.8420 |
Deep Forest | 0.7755 | 0.7447 | 0.7526 | 0.9471 | 0.1709 | 0.8387 |
LFGCE | 0.7856 | 0.7643 | 0.7856 | 0.9126 | 0.1621 | 0.8444 |

Algorithm | AUC | Acc | Pre | Rec | BS | F1 |
---|---|---|---|---|---|---|
LDA | 0.9127 | 0.8606 | 0.9402 | 0.7997 | 0.1136 | 0.8643 |
LR | 0.9156 | 0.8549 | 0.9175 | 0.8116 | 0.1030 | 0.8613 |
DT | 0.9134 | 0.8490 | 0.8698 | 0.8561 | 0.1123 | 0.8629 |
KNN | 0.9111 | 0.8487 | 0.8862 | 0.8345 | 0.1108 | 0.8596 |
SVM | 0.8682 | 0.8566 | 0.9311 | 0.8008 | 0.1175 | 0.8611 |
NN | 0.9177 | 0.8487 | 0.8895 | 0.8310 | 0.1048 | 0.8593 |
RF | 0.9319 | 0.8694 | 0.8719 | 0.8964 | 0.1056 | 0.8840 |
AdaBoost | 0.9210 | 0.8548 | 0.9293 | 0.7993 | 0.1509 | 0.8594 |
GBDT | 0.9362 | 0.8642 | 0.8898 | 0.8625 | 0.0960 | 0.8759 |
LightGBM | 0.9349 | 0.8634 | 0.8829 | 0.8696 | 0.0957 | 0.8762 |
XGBoost | 0.9362 | 0.8678 | 0.8959 | 0.8625 | 0.0950 | 0.8789 |
Deep Forest | 0.9324 | 0.8679 | 0.8687 | 0.8982 | 0.1064 | 0.8832 |
LFGCE | 0.9374 | 0.8673 | 0.8954 | 0.8620 | 0.0944 | 0.8784 |

Algorithm | AUC | Acc | Pre | Rec | BS | F1 |
---|---|---|---|---|---|---|
LDA | 0.6985 | 0.6512 | 0.6676 | 0.6023 | 0.2183 | 0.6333 |
LR | 0.6999 | 0.6486 | 0.6612 | 0.6099 | 0.2179 | 0.6345 |
DT | 0.7199 | 0.6666 | 0.6851 | 0.6167 | 0.2152 | 0.6491 |
KNN | 0.7169 | 0.6675 | 0.7163 | 0.5550 | 0.2135 | 0.6254 |
SVM | 0.7058 | 0.6731 | 0.7561 | 0.5114 | 0.2140 | 0.6101 |
NN | 0.7377 | 0.6806 | 0.7131 | 0.6044 | 0.2061 | 0.6543 |
RF | 0.7502 | 0.6949 | 0.7298 | 0.6192 | 0.2011 | 0.6700 |
AdaBoost | 0.7170 | 0.6728 | 0.7599 | 0.5053 | 0.2169 | 0.6070 |
GBDT | 0.7496 | 0.6948 | 0.7297 | 0.6189 | 0.2009 | 0.6698 |
LightGBM | 0.7494 | 0.6954 | 0.7292 | 0.6218 | 0.2010 | 0.6712 |
XGBoost | 0.7504 | 0.6945 | 0.7301 | 0.6175 | 0.2006 | 0.6691 |
Deep Forest | 0.7479 | 0.6904 | 0.7189 | 0.6257 | 0.2026 | 0.6690 |
LFGCE | 0.7508 | 0.6957 | 0.7356 | 0.6114 | 0.2003 | 0.6677 |
Zou, Y.; Cheng, G. Level-Wise Feature-Guided Cascading Ensembles for Credit Scoring. Symmetry 2025, 17, 914. https://doi.org/10.3390/sym17060914