On Ensemble SSL Algorithms for Credit Scoring Problem
Abstract
1. Introduction
2. Related Work
3. A Review of Semi-Supervised Self-Labeled Classification Methods
3.1. Self-Labeled Methods
3.2. Ensemble Self-Labeled Methods
Algorithm 1: CST-Voting

- Input: L—Set of labeled training instances. U—Set of unlabeled training instances.
- Output: The labels of instances in the testing set.
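The CST-Voting idea—train several self-labeled learners on (L, U) and combine them by majority voting—can be sketched as follows. This is a hedged illustration, not the paper's exact implementation: the toy nearest-centroid base learner and the single self-labeling routine reused for all three members stand in for the actual self-training, co-training and tri-training components.

```python
# Sketch of CST-Voting: self-labeled ensemble members + majority vote.
from collections import Counter
import math

def centroid_fit(X, y):
    """Toy base learner: one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        points = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    return centroids

def centroid_predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def self_label(L_X, L_y, U_X, rounds=3):
    """Minimal self-labeling loop: label all of U with the current model and
    retrain (confidence-based selection is omitted for brevity)."""
    X, y = list(L_X), list(L_y)
    for _ in range(rounds):
        model = centroid_fit(X, y)
        X = list(L_X) + list(U_X)
        y = list(L_y) + [centroid_predict(model, x) for x in U_X]
    return centroid_fit(X, y)

def cst_voting_predict(members, x):
    """Majority vote over the ensemble members' predictions."""
    votes = [centroid_predict(m, x) for m in members]
    return Counter(votes).most_common(1)[0][0]
```

For example, `members = [self_label(L_X, L_y, U_X, rounds=r) for r in (1, 2, 3)]` mimics three differently-trained members whose votes are then aggregated per test instance.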
Algorithm 2: EnSSL

- Input: L—Set of labeled training instances. U—Set of unlabeled training instances. ThresLev—Threshold level.
- Output: The labels of instances in the testing set.
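One plausible reading of EnSSL's ThresLev input can be sketched as follows: average the member algorithms' class-probability estimates and, when the top averaged class does not reach the threshold level, fall back to a plain majority vote of the members' individual predictions. The exact combination rule in the paper may differ; this is an illustrative sketch only.

```python
# Sketch of a threshold-gated ensemble combination (EnSSL-style).
from collections import Counter

def enssl_predict(member_probs, thres_lev):
    """member_probs: one dict {class: probability} per ensemble member,
    all for the same test instance."""
    avg = {}
    for probs in member_probs:
        for cls, p in probs.items():
            avg[cls] = avg.get(cls, 0.0) + p / len(member_probs)
    best = max(avg, key=avg.get)
    if avg[best] >= thres_lev:        # confident enough: trust the averaged probabilities
        return best
    # otherwise fall back to majority voting on each member's top class
    votes = [max(p, key=p.get) for p in member_probs]
    return Counter(votes).most_common(1)[0][0]
```

The threshold thus controls how much the ensemble trusts its averaged confidence before deferring to a simple vote.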
4. Experimental Methodology
4.1. First Phase of Experiments
- CST-Voting exhibited the best performance in 10, 8 and 8 cases for the Australian, Japanese and German datasets, respectively, while EnSSL exhibited the highest accuracy in 6, 8 and 8 cases, respectively.
- Regarding the choice of base classifier, CST-Voting was the most effective method with NB or SMO as the base learner, while EnSSL reported the highest performance with MLP as the base learner.
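The Sen (sensitivity), Spe (specificity), F1 and Acc figures reported in the evaluation tables can be reproduced from confusion-matrix counts; a minimal sketch, assuming a binary labeling with `pos` denoting the positive class:

```python
def binary_metrics(y_true, y_pred, pos=1):
    """Sensitivity, specificity, F1 and accuracy from confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == pos and p == pos)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != pos and p != pos)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != pos and p == pos)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == pos and p != pos)
    sen = tp / (tp + fn)              # true-positive rate
    spe = tn / (tn + fp)              # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sen / (precision + sen)
    acc = (tp + tn) / len(y_true)
    return sen, spe, f1, acc
```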
4.2. Second Phase of Experiments
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
| SSL Algorithm | Parameters |
|---|---|
| Self-training | . |
| Co-training | . |
| Tri-training | No parameters specified. |
| Democratic-Co | Classifiers = kNN, C4.5, NB. |
| SETRED | . |
| Co-Forest | . |
| Base Learner | Alg. | Sen (10%) | Spe (10%) | F1 (10%) | Acc (10%) | Sen (20%) | Spe (20%) | F1 (20%) | Acc (20%) | Sen (30%) | Spe (30%) | F1 (30%) | Acc (30%) | Sen (40%) | Spe (40%) | F1 (40%) | Acc (40%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NB | Self | 73.9% | 88.3% | 78.4% | 81.9% | 78.2% | 91.4% | 82.8% | 85.5% | 78.2% | 90.3% | 82.2% | 84.9% | 78.5% | 90.6% | 82.5% | 85.2% |
| | Co | 78.2% | 83.6% | 78.7% | 81.2% | 77.5% | 92.7% | 83.1% | 85.9% | 78.8% | 91.4% | 83.2% | 85.8% | 79.2% | 91.4% | 83.4% | 85.9% |
| | Tri | 61.2% | 91.6% | 71.3% | 78.1% | 76.2% | 86.7% | 79.1% | 82.0% | 79.8% | 86.7% | 81.3% | 83.6% | 80.1% | 87.2% | 81.7% | 84.1% |
| | CST | 75.6% | 90.3% | 80.6% | 83.8% | 78.2% | 91.9% | 83.0% | 85.8% | 79.2% | 91.6% | 83.5% | 86.1% | 79.8% | 92.2% | 84.2% | 86.7% |
| | EnSSL | 74.6% | 90.6% | 80.1% | 83.5% | 78.5% | 90.6% | 82.5% | 85.2% | 79.8% | 90.9% | 83.5% | 85.9% | 80.5% | 91.4% | 84.2% | 86.5% |
| SMO | Self | 88.9% | 79.1% | 82.7% | 83.5% | 85.7% | 83.3% | 83.0% | 84.3% | 88.9% | 81.7% | 84.0% | 84.9% | 88.9% | 82.0% | 84.1% | 85.1% |
| | Co | 92.2% | 79.1% | 84.5% | 84.9% | 94.1% | 79.1% | 85.5% | 85.8% | 94.1% | 79.1% | 85.5% | 85.8% | 94.1% | 79.1% | 85.5% | 85.8% |
| | Tri | 77.5% | 86.7% | 79.9% | 82.6% | 89.3% | 83.0% | 84.8% | 85.8% | 89.3% | 80.9% | 83.8% | 84.6% | 89.6% | 81.2% | 84.1% | 84.9% |
| | CST | 89.9% | 84.9% | 86.1% | 87.1% | 90.6% | 80.4% | 84.2% | 84.9% | 93.8% | 82.0% | 86.7% | 87.2% | 94.1% | 82.2% | 87.0% | 87.5% |
| | EnSSL | 88.9% | 83.6% | 84.9% | 85.9% | 89.3% | 83.3% | 85.0% | 85.9% | 88.9% | 84.3% | 85.3% | 86.4% | 90.9% | 84.9% | 86.6% | 87.5% |
| MLP | Self | 82.1% | 87.7% | 83.2% | 85.2% | 80.1% | 88.0% | 82.1% | 84.5% | 82.7% | 86.9% | 83.1% | 85.1% | 82.7% | 87.2% | 83.3% | 85.2% |
| | Co | 80.8% | 87.7% | 82.4% | 84.6% | 79.8% | 91.4% | 83.8% | 86.2% | 79.5% | 91.1% | 83.4% | 85.9% | 80.5% | 91.4% | 84.2% | 86.5% |
| | Tri | 71.3% | 89.0% | 77.1% | 81.2% | 83.1% | 82.2% | 81.0% | 82.6% | 89.3% | 83.0% | 84.8% | 85.8% | 89.6% | 83.6% | 85.3% | 86.2% |
| | CST | 82.4% | 88.0% | 83.5% | 85.5% | 82.4% | 88.0% | 83.5% | 85.5% | 85.0% | 87.2% | 84.6% | 86.2% | 87.9% | 88.0% | 86.7% | 88.0% |
| | EnSSL | 82.7% | 90.3% | 84.9% | 87.0% | 85.0% | 88.0% | 85.0% | 86.7% | 87.9% | 87.5% | 86.4% | 87.7% | 89.3% | 88.8% | 87.8% | 89.0% |
| kNN | Self | 73.9% | 88.3% | 78.4% | 81.9% | 73.3% | 88.3% | 78.0% | 81.6% | 73.3% | 91.4% | 79.6% | 83.3% | 73.6% | 91.4% | 79.9% | 83.5% |
| | Co | 78.2% | 83.6% | 78.7% | 81.2% | 77.5% | 84.6% | 78.8% | 81.4% | 78.8% | 87.5% | 81.1% | 83.6% | 79.2% | 86.9% | 81.0% | 83.5% |
| | Tri | 61.2% | 91.6% | 71.3% | 78.1% | 67.8% | 91.6% | 76.1% | 81.0% | 74.9% | 89.3% | 79.6% | 82.9% | 75.9% | 90.1% | 80.6% | 83.8% |
| | CST | 75.6% | 90.3% | 80.6% | 83.8% | 74.6% | 90.9% | 80.2% | 83.6% | 78.5% | 92.2% | 83.4% | 86.1% | 79.8% | 92.4% | 84.3% | 86.8% |
| | EnSSL | 74.6% | 90.9% | 80.2% | 83.6% | 73.9% | 89.8% | 79.2% | 82.8% | 78.5% | 90.9% | 82.7% | 85.4% | 79.2% | 91.4% | 83.4% | 85.9% |
| Base Learner | Alg. | Sen (10%) | Spe (10%) | F1 (10%) | Acc (10%) | Sen (20%) | Spe (20%) | F1 (20%) | Acc (20%) | Sen (30%) | Spe (30%) | F1 (30%) | Acc (30%) | Sen (40%) | Spe (40%) | F1 (40%) | Acc (40%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NB | Self | 75.3% | 88.2% | 79.5% | 82.4% | 79.1% | 90.2% | 82.8% | 85.1% | 79.1% | 90.8% | 83.1% | 85.5% | 79.1% | 91.0% | 83.3% | 85.6% |
| | Co | 83.1% | 86.8% | 83.5% | 85.1% | 78.7% | 90.8% | 82.9% | 85.3% | 79.7% | 91.6% | 84.0% | 86.2% | 80.1% | 91.6% | 84.2% | 86.4% |
| | Tri | 74.3% | 88.2% | 78.9% | 81.9% | 73.6% | 91.6% | 80.1% | 83.5% | 73.0% | 90.5% | 79.1% | 82.5% | 73.6% | 91.6% | 80.1% | 83.5% |
| | CST | 79.4% | 90.8% | 83.3% | 85.6% | 78.0% | 91.3% | 82.8% | 85.3% | 79.1% | 92.2% | 83.9% | 86.2% | 79.7% | 92.4% | 84.4% | 86.7% |
| | EnSSL | 79.4% | 89.1% | 82.5% | 84.7% | 76.7% | 91.6% | 82.1% | 84.8% | 79.7% | 92.2% | 84.3% | 86.5% | 80.1% | 92.4% | 84.6% | 86.8% |
| SMO | Self | 92.2% | 81.0% | 85.7% | 86.1% | 91.9% | 81.0% | 85.5% | 85.9% | 92.9% | 80.7% | 85.9% | 86.2% | 92.6% | 81.0% | 85.9% | 86.2% |
| | Co | 93.9% | 79.8% | 86.1% | 86.2% | 93.9% | 79.8% | 86.1% | 86.2% | 93.9% | 80.1% | 86.2% | 86.4% | 93.9% | 80.1% | 86.2% | 86.4% |
| | Tri | 86.5% | 86.3% | 85.2% | 86.4% | 79.7% | 86.0% | 81.1% | 83.2% | 74.7% | 84.0% | 77.0% | 79.8% | 76.0% | 85.7% | 78.7% | 81.3% |
| | CST | 93.6% | 80.7% | 86.3% | 86.5% | 93.2% | 80.1% | 85.8% | 86.1% | 93.2% | 86.6% | 89.0% | 89.6% | 93.6% | 87.1% | 89.5% | 90.0% |
| | EnSSL | 93.2% | 81.2% | 86.4% | 86.7% | 93.2% | 81.2% | 86.4% | 86.7% | 93.2% | 85.4% | 88.5% | 89.0% | 93.6% | 86.0% | 88.9% | 89.4% |
| MLP | Self | 84.1% | 87.4% | 84.4% | 85.9% | 86.1% | 85.7% | 84.7% | 85.9% | 86.1% | 87.7% | 85.7% | 87.0% | 86.5% | 88.0% | 86.1% | 87.3% |
| | Co | 81.1% | 88.8% | 83.3% | 85.3% | 82.8% | 89.9% | 84.9% | 86.7% | 82.8% | 89.6% | 84.8% | 86.5% | 83.4% | 89.6% | 85.2% | 86.8% |
| | Tri | 69.3% | 88.0% | 75.4% | 79.5% | 65.2% | 91.6% | 74.4% | 79.6% | 65.2% | 93.3% | 75.2% | 80.6% | 65.5% | 93.8% | 75.8% | 81.0% |
| | CST | 80.7% | 90.2% | 83.9% | 85.9% | 84.1% | 89.9% | 85.7% | 87.3% | 84.1% | 90.2% | 85.9% | 87.4% | 84.5% | 90.5% | 86.2% | 87.7% |
| | EnSSL | 81.4% | 90.8% | 84.6% | 86.5% | 85.1% | 90.8% | 86.7% | 88.2% | 85.1% | 92.2% | 87.5% | 89.0% | 85.5% | 92.7% | 88.0% | 89.4% |
| kNN | Self | 75.7% | 88.2% | 79.7% | 82.5% | 76.4% | 88.5% | 80.3% | 83.0% | 75.3% | 89.6% | 80.2% | 83.2% | 75.7% | 88.2% | 79.7% | 82.5% |
| | Co | 79.4% | 85.4% | 80.6% | 82.7% | 79.1% | 86.6% | 81.0% | 83.2% | 83.1% | 85.4% | 82.8% | 84.4% | 83.4% | 85.7% | 83.2% | 84.7% |
| | Tri | 56.1% | 88.2% | 65.9% | 73.7% | 59.8% | 93.0% | 71.1% | 77.9% | 74.3% | 95.2% | 82.6% | 85.8% | 76.0% | 94.4% | 83.2% | 86.1% |
| | CST | 76.0% | 90.8% | 81.2% | 84.1% | 76.7% | 90.2% | 81.4% | 84.1% | 79.4% | 92.2% | 84.1% | 86.4% | 79.7% | 92.4% | 84.4% | 86.7% |
| | EnSSL | 75.0% | 89.6% | 80.0% | 83.0% | 75.7% | 90.2% | 80.7% | 83.6% | 78.4% | 92.4% | 83.6% | 86.1% | 79.1% | 92.4% | 84.0% | 86.4% |
| Base Learner | Alg. | Sen (10%) | Spe (10%) | F1 (10%) | Acc (10%) | Sen (20%) | Spe (20%) | F1 (20%) | Acc (20%) | Sen (30%) | Spe (30%) | F1 (30%) | Acc (30%) | Sen (40%) | Spe (40%) | F1 (40%) | Acc (40%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NB | Self | 80.7% | 45.3% | 79.1% | 70.1% | 84.6% | 48.7% | 81.9% | 73.8% | 84.6% | 50.3% | 82.2% | 74.3% | 84.7% | 50.0% | 82.2% | 74.3% |
| | Co | 80.0% | 45.0% | 78.6% | 69.5% | 85.7% | 46.7% | 82.2% | 74.0% | 86.4% | 51.7% | 83.4% | 76.0% | 86.6% | 51.7% | 83.5% | 76.1% |
| | Tri | 81.6% | 45.7% | 79.6% | 70.8% | 86.0% | 46.0% | 82.2% | 74.0% | 87.1% | 51.0% | 83.7% | 76.3% | 87.4% | 50.7% | 83.8% | 76.4% |
| | CST | 81.7% | 46.0% | 79.8% | 71.0% | 86.4% | 47.7% | 82.8% | 74.8% | 87.9% | 51.7% | 84.2% | 77.0% | 88.0% | 51.7% | 84.3% | 77.1% |
| | EnSSL | 81.9% | 45.3% | 79.7% | 70.9% | 86.1% | 46.7% | 82.4% | 74.3% | 87.4% | 52.0% | 84.1% | 76.8% | 87.6% | 51.7% | 84.1% | 76.8% |
| SMO | Self | 84.6% | 44.7% | 81.2% | 72.6% | 86.4% | 45.0% | 82.3% | 74.0% | 87.1% | 46.0% | 82.9% | 74.8% | 87.3% | 46.7% | 83.1% | 75.1% |
| | Co | 84.3% | 45.0% | 81.1% | 72.5% | 86.0% | 47.3% | 82.5% | 74.4% | 87.0% | 48.3% | 83.2% | 75.4% | 87.1% | 48.3% | 83.3% | 75.5% |
| | Tri | 84.4% | 45.7% | 81.3% | 72.8% | 86.7% | 46.7% | 82.8% | 74.7% | 87.4% | 47.3% | 83.3% | 75.4% | 87.6% | 47.7% | 83.4% | 75.6% |
| | CST | 86.4% | 46.0% | 82.5% | 74.3% | 87.0% | 47.0% | 83.0% | 75.0% | 87.4% | 48.0% | 83.4% | 75.6% | 88.0% | 49.0% | 83.9% | 76.3% |
| | EnSSL | 86.3% | 46.0% | 82.4% | 74.2% | 86.4% | 46.7% | 82.6% | 74.5% | 87.3% | 47.7% | 83.2% | 75.4% | 87.4% | 48.3% | 83.4% | 75.7% |
| MLP | Self | 84.6% | 47.0% | 81.6% | 73.3% | 86.4% | 47.3% | 82.7% | 74.7% | 87.1% | 48.3% | 83.3% | 75.5% | 87.3% | 48.3% | 83.4% | 75.6% |
| | Co | 85.4% | 43.3% | 81.5% | 72.8% | 86.0% | 44.0% | 81.9% | 73.4% | 87.4% | 44.3% | 82.8% | 74.5% | 87.9% | 44.3% | 83.0% | 74.8% |
| | Tri | 87.4% | 45.0% | 82.9% | 74.7% | 86.7% | 44.0% | 82.3% | 73.9% | 87.9% | 45.0% | 83.1% | 75.0% | 88.0% | 46.0% | 83.4% | 75.4% |
| | CST | 87.0% | 46.0% | 82.8% | 74.7% | 87.1% | 45.0% | 82.7% | 74.5% | 88.3% | 47.0% | 83.7% | 75.9% | 88.3% | 47.3% | 83.7% | 76.0% |
| | EnSSL | 87.4% | 47.0% | 83.1% | 75.2% | 87.6% | 47.0% | 83.3% | 75.4% | 88.1% | 47.3% | 83.7% | 75.9% | 88.6% | 48.3% | 84.1% | 76.5% |
| kNN | Self | 84.6% | 40.0% | 80.4% | 71.2% | 86.4% | 42.3% | 81.9% | 73.2% | 86.1% | 43.3% | 81.9% | 73.3% | 86.4% | 43.7% | 82.1% | 73.6% |
| | Co | 85.4% | 40.7% | 81.0% | 72.0% | 86.0% | 41.7% | 81.5% | 72.7% | 87.4% | 43.7% | 82.6% | 74.3% | 87.7% | 43.7% | 82.8% | 74.5% |
| | Tri | 87.4% | 40.7% | 82.1% | 73.4% | 86.7% | 42.7% | 82.1% | 73.5% | 87.9% | 44.0% | 82.9% | 74.7% | 88.0% | 44.3% | 83.1% | 74.9% |
| | CST | 85.0% | 47.7% | 82.0% | 73.8% | 87.0% | 46.7% | 82.9% | 74.9% | 86.4% | 46.0% | 82.5% | 74.3% | 86.7% | 46.3% | 82.7% | 74.6% |
| | EnSSL | 87.4% | 48.3% | 83.4% | 75.7% | 87.3% | 47.0% | 83.1% | 75.2% | 88.1% | 46.7% | 83.5% | 75.7% | 88.1% | 47.0% | 83.6% | 75.8% |
| Algorithm | FAR | Finner p-Value | Null Hypothesis |
|---|---|---|---|
| CST-Voting | 12.7917 | - | - |
| EnSSL | 19.8333 | 0.323326 | accepted |
| Co-training | 29.0833 | 0.029637 | rejected |
| Self-training | 42.3333 | 0.000068 | rejected |
| Tri-training | 48.4583 | 0.000002 | rejected |

(a) using NB as base learner

| Algorithm | FAR | Finner p-Value | Null Hypothesis |
|---|---|---|---|
| CST-Voting | 14.375 | - | - |
| EnSSL | 15.9583 | 0.824256 | accepted |
| Co-training | 32.75 | 0.013257 | rejected |
| Tri-training | 44.1667 | 0.000060 | rejected |
| Self-training | 45.25 | 0.000060 | rejected |

(b) using SMO as base learner

| Algorithm | FAR | Finner p-Value | Null Hypothesis |
|---|---|---|---|
| EnSSL | 9.9167 | - | - |
| CST-Voting | 22.2917 | 0.082620 | accepted |
| Self-training | 34.7083 | 0.000675 | rejected |
| Co-training | 36.6667 | 0.000351 | rejected |
| Tri-training | 48.9167 | 0.000000 | rejected |

(c) using MLP as base learner

| Algorithm | FAR | Finner p-Value | Null Hypothesis |
|---|---|---|---|
| CST-Voting | 14.5417 | - | - |
| EnSSL | 14.7083 | 0.98135 | accepted |
| Co-training | 38.7083 | 0.000933 | rejected |
| Tri-training | 41.8333 | 0.000312 | rejected |
| Self-training | 42.7083 | 0.000312 | rejected |

(d) using kNN as base learner
| Algorithm | Sen (10%) | Spe (10%) | F1 (10%) | Acc (10%) | Sen (20%) | Spe (20%) | F1 (20%) | Acc (20%) | Sen (30%) | Spe (30%) | F1 (30%) | Acc (30%) | Sen (40%) | Spe (40%) | F1 (40%) | Acc (40%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SETRED | 87.9% | 78.3% | 81.8% | 82.6% | 87.6% | 82.2% | 83.5% | 84.6% | 91.2% | 82.8% | 85.8% | 86.5% | 91.5% | 82.8% | 85.9% | 86.7% |
| Co-Forest | 81.4% | 87.5% | 82.6% | 84.8% | 80.5% | 89.0% | 82.9% | 85.2% | 81.4% | 91.4% | 84.7% | 87.0% | 81.8% | 91.4% | 84.9% | 87.1% |
| Demo-Co | 82.7% | 82.0% | 80.6% | 82.3% | 83.1% | 85.4% | 82.5% | 84.3% | 84.0% | 86.9% | 83.9% | 85.7% | 84.0% | 87.2% | 84.0% | 85.8% |
| CST | 89.9% | 84.9% | 86.1% | 87.1% | 90.6% | 80.4% | 84.2% | 84.9% | 93.8% | 82.0% | 86.7% | 87.2% | 94.1% | 82.2% | 87.0% | 87.5% |
| EnSSL | 82.7% | 90.3% | 84.9% | 87.0% | 85.0% | 88.0% | 85.0% | 86.7% | 87.9% | 87.5% | 86.4% | 87.7% | 89.3% | 88.8% | 87.8% | 89.0% |
| Algorithm | Sen (10%) | Spe (10%) | F1 (10%) | Acc (10%) | Sen (20%) | Spe (20%) | F1 (20%) | Acc (20%) | Sen (30%) | Spe (30%) | F1 (30%) | Acc (30%) | Sen (40%) | Spe (40%) | F1 (40%) | Acc (40%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SETRED | 91.2% | 81.2% | 85.3% | 85.8% | 92.2% | 81.2% | 85.8% | 86.2% | 92.9% | 81.5% | 86.3% | 86.7% | 92.9% | 81.8% | 86.5% | 86.8% |
| Co-Forest | 84.5% | 88.5% | 85.2% | 86.7% | 85.1% | 89.4% | 86.0% | 87.4% | 85.1% | 89.9% | 86.3% | 87.7% | 85.1% | 90.5% | 86.6% | 88.1% |
| Demo-Co | 85.5% | 84.6% | 83.8% | 85.0% | 84.5% | 85.7% | 83.8% | 85.1% | 84.8% | 85.7% | 83.9% | 85.3% | 86.1% | 86.0% | 84.9% | 86.1% |
| CST | 93.6% | 80.7% | 86.3% | 86.5% | 93.2% | 80.1% | 85.8% | 86.1% | 93.2% | 86.6% | 89.0% | 89.6% | 93.6% | 87.1% | 89.5% | 90.0% |
| EnSSL | 81.4% | 90.8% | 84.6% | 86.5% | 85.1% | 90.8% | 86.7% | 88.2% | 85.1% | 92.2% | 87.5% | 89.0% | 85.5% | 92.7% | 88.0% | 89.4% |
| Algorithm | Sen (10%) | Spe (10%) | F1 (10%) | Acc (10%) | Sen (20%) | Spe (20%) | F1 (20%) | Acc (20%) | Sen (30%) | Spe (30%) | F1 (30%) | Acc (30%) | Sen (40%) | Spe (40%) | F1 (40%) | Acc (40%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SETRED | 84.3% | 44.7% | 81.0% | 72.4% | 86.7% | 45.0% | 82.5% | 74.2% | 87.4% | 46.7% | 83.2% | 75.2% | 87.6% | 47.0% | 83.3% | 75.4% |
| Co-Forest | 85.7% | 45.0% | 81.9% | 73.5% | 87.1% | 45.0% | 82.7% | 74.5% | 87.3% | 46.7% | 83.1% | 75.1% | 87.4% | 47.3% | 83.3% | 75.4% |
| Demo-Co | 83.6% | 43.7% | 80.5% | 71.6% | 86.0% | 45.3% | 82.1% | 73.8% | 87.0% | 48.0% | 83.1% | 75.3% | 87.1% | 48.3% | 83.3% | 75.5% |
| CST | 86.4% | 46.0% | 82.5% | 74.3% | 87.0% | 47.0% | 83.0% | 75.0% | 87.4% | 48.0% | 83.4% | 75.6% | 88.0% | 49.0% | 83.9% | 76.3% |
| EnSSL | 87.3% | 47.0% | 83.1% | 75.2% | 87.6% | 47.0% | 83.3% | 75.4% | 88.1% | 47.3% | 83.7% | 75.9% | 88.6% | 48.3% | 84.1% | 76.5% |
| Algorithm | FAR | Finner p-Value | Null Hypothesis |
|---|---|---|---|
| EnSSL | 10.375 | - | - |
| CST-Voting | 18.7917 | 0.237802 | accepted |
| Co-Forest | 28.2083 | 0.016466 | rejected |
| SETRED | 44.2917 | 0.000004 | rejected |
| Democratic-Co | 50.8333 | 0.000000 | rejected |
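The adjusted p-values behind the "accepted"/"rejected" decisions in the post-hoc tables follow Finner's step-down procedure. A sketch, assuming `pvalues` holds the raw p-values of the m pairwise comparisons against the control (best-ranked) algorithm:

```python
def finner_adjust(pvalues):
    """Finner step-down adjusted p-values; output order matches input order.
    For the i-th smallest raw p-value p_(i), the adjustment is
    1 - (1 - p_(i)) ** (m / i), made monotone with a running maximum."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order, start=1):
        value = 1.0 - (1.0 - pvalues[idx]) ** (m / rank)
        running_max = max(running_max, value)   # enforce step-down monotonicity
        adjusted[idx] = min(1.0, running_max)
    return adjusted
```

Comparing each adjusted value against the chosen significance level then yields the per-algorithm accept/reject decisions reported above.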
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Livieris, I.E.; Kiriakidou, N.; Kanavos, A.; Tampakas, V.; Pintelas, P. On Ensemble SSL Algorithms for Credit Scoring Problem. Informatics 2018, 5, 40. https://doi.org/10.3390/informatics5040040