One-Dimensional Convolutional Neural Networks with Feature Selection for Highly Concise Rule Extraction from Credit Scoring Datasets with Heterogeneous Attributes
Abstract
1. Introduction
1.1. Background
1.2. Types of Data Attributes
1.3. Heterogeneous Credit Scoring
1.4. Accuracy–Interpretability Dilemma
1.5. Rule Extraction and the “Black Box” Problem
1.6. Recursive-Rule Extraction (Re-RX) and Related Algorithms
2. Motivation for This Work
3. Methods
3.1. Re-RX Algorithm with J48graft
3.2. Deep Convolutional Neural Networks (DCNNs) and Inception Modules
3.3. One-Dimensional Fully-Connected Layer First CNN (1D FCLF-CNN)
4. Theory
4.1. Highly Concise Rule Extraction Using a 1D FCLF-CNN with Re-RX with J48graft
4.2. Rationale Behind the Architecture for the 1D FCLF-CNN with Re-RX with J48graft
4.3. Attribute Selection by Dimension Reduction Using a 1D FCLF-CNN for Datasets with Heterogeneous Attributes
5. Experimental Procedure
6. Results
7. Discussion
7.1. Discussion
7.2. Common Issues with Re-RX with J48graft and ERENN-MHL
8. Conclusions
Author Contributions
Funding
Conflicts of Interest
Notations
1/# | Reciprocal number |
TS ACC | Test accuracy |
# rules | Number of rules |
10 × 10CV | 10 runs of 10-fold cross-validation |
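The 10 × 10CV notation above denotes 10 independent runs of 10-fold cross-validation, i.e., 100 train/test splits whose test accuracies are averaged. A minimal plain-Python sketch of that splitting scheme (illustrative only, not the authors' exact implementation):

```python
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` runs of k-fold CV.

    Each run reshuffles the sample indices, so 10 x 10CV produces
    100 train/test splits whose test accuracies are then averaged.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test

# 10 x 10CV on the German dataset's 1000 instances -> 100 splits
splits = list(repeated_kfold(1000, k=10, repeats=10))
```

Each run uses a fresh shuffle, so every instance appears in exactly one test fold per run but the fold composition differs across the 10 runs.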
Abbreviations
CNN | Convolutional neural network |
1D | One-dimensional |
FCLF | Fully-connected layer first |
Re-RX | Recursive-rule eXtraction |
SVM | Support vector machine |
DL | Deep learning |
NN | Neural network |
DNN | Deep neural network |
AI | Artificial intelligence |
DCNN | Deep convolutional neural network |
BPNN | Backpropagation neural network |
CV | Cross-validation |
AUC-ROC | Area under the receiver operating characteristic curve |
SMOTE | Synthetic minority over-sampling technique |
DM | Deutsche Mark |
SOM | Self-organizing map |
PSO | Particle swarm optimization |
ERENN-MHL | Eclectic rule extraction from a neural network with a multi-hidden layer for a DNN |
References
- Pławiak, P.; Abdar, M.; Pławiak, J.; Makarenkov, V.; Acharya, U.R. DGHNL: A new deep genetic hierarchical network of learners for prediction of credit scoring. Inf. Sci. 2020, 516, 401–418. [Google Scholar] [CrossRef]
- Liberati, C.; Camillo, F.; Saporta, G. Advances in credit scoring: Combining performance and interpretation in kernel discriminant analysis. Adv. Data Anal. Classif. 2015, 11, 121–138. [Google Scholar] [CrossRef]
- Zhang, Y.; Cheung, Y.-M.; Tan, K.C. A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 39–52. [Google Scholar] [CrossRef] [PubMed]
- Martens, D.; Baesens, B.; Van Gestel, T.; Vanthienen, J. Comprehensible credit scoring models using rule extraction from support vector machines. Eur. J. Oper. Res. 2007, 183, 1466–1476. [Google Scholar] [CrossRef]
- Abellán, J.; Mantas, C.J. Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring. Expert Syst. Appl. 2014, 41, 3825–3830. [Google Scholar] [CrossRef]
- Abellán, J.; Castellano, J.G. A comparative study on base classifiers in ensemble methods for credit scoring. Expert Syst. Appl. 2017, 73, 1–10. [Google Scholar] [CrossRef]
- Tripathi, D.; Edla, D.R.; Cheruku, R. Hybrid credit scoring model using neighborhood rough set and multi-layer ensemble classification. J. Intell. Fuzzy Syst. 2018, 34, 1543–1549. [Google Scholar] [CrossRef]
- Kuppili, V.; Tripathi, D.; Edla, D.R. Credit score classification using spiking extreme learning machine. Comput. Intell. 2020, 36, 402–426. [Google Scholar] [CrossRef]
- Pławiak, P.; Abdar, M.; Acharya, U.R. Application of new deep genetic cascade ensemble of SVM classifiers to predict the Australian credit scoring. Appl. Soft Comput. 2019, 84, 105740. [Google Scholar] [CrossRef]
- Tripathi, D.; Edla, D.R.; Cheruku, R.; Kuppili, V. A novel hybrid credit scoring model based on ensemble feature selection and multilayer ensemble classification. Comput. Intell. 2019, 35, 371–394. [Google Scholar] [CrossRef]
- Sun, J.; Li, H.; Huang, Q.-H.; He, K.-Y. Predicting financial distress and corporate failure: A review from the state-of-the-art definitions, modeling, sampling, and featuring approaches. Knowl. Based Syst. 2014, 57, 41–56. [Google Scholar] [CrossRef]
- Chen, T.-C.; Cheng, C.-H. Hybrid models based on rough set classifiers for setting credit rating decision rules in the global banking industry. Knowl. Based Syst. 2013, 39, 224–239. [Google Scholar] [CrossRef]
- Florez-Lopez, R.; Ramon-Jeronimo, J.M. Enhancing accuracy and interpretability of ensemble strategies in credit risk assessment. A correlated-adjusted decision forest proposal. Expert Syst. Appl. 2015, 42, 5737–5753. [Google Scholar] [CrossRef]
- Mues, C.; Baesens, B.; Files, C.M.; Vanthienen, J. Decision diagrams in machine learning: An empirical study on real-life credit-risk data. Expert Syst. Appl. 2004, 27, 257–264. [Google Scholar] [CrossRef]
- Hsieh, N.-C.; Hung, L.-P. A data driven ensemble classifier for credit scoring analysis. Expert Syst. Appl. 2010, 37, 534–545. [Google Scholar] [CrossRef]
- Gallant, S.I. Connectionist expert systems. Commun. ACM 1988, 31, 152–169. [Google Scholar] [CrossRef]
- Saito, K.; Nakano, R. Medical Diagnosis Expert Systems Based on PDP Model. In Proceedings of the IEEE International Conference on Neural Networks, San Diego, CA, USA, 24–27 July 1988; pp. I.255–I.262. [Google Scholar]
- Hayashi, Y.; Oishi, T. High Accuracy-priority Rule Extraction for Reconciling Accuracy and Interpretability in Credit Scoring. New Gener. Comput. 2018, 36, 393–418. [Google Scholar] [CrossRef]
- Andrews, R.; Diederich, J.; Tickle, A.B. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl. Based Syst. 1995, 8, 373–389. [Google Scholar] [CrossRef]
- Mitra, S.; Hayashi, Y. Neuro-fuzzy rule generation: Survey in soft computing framework. IEEE Trans. Neural Netw. 2000, 11, 748–768. [Google Scholar] [CrossRef]
- Bologna, G. A study on rule extraction from several combined neural networks. Int. J. Neural Syst. 2001, 11, 247–255. [Google Scholar] [CrossRef]
- Setiono, R.; Baesens, B.; Mues, C. Recursive Neural Network Rule Extraction for Data with Mixed Attributes. IEEE Trans. Neural Netw. 2008, 19, 299–307. [Google Scholar] [CrossRef] [PubMed]
- Tran, S.N.; Garcez, A.S.D. Deep Logic Networks: Inserting and Extracting Knowledge from Deep Belief Networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 246–258. [Google Scholar] [CrossRef]
- De Fortuny, E.J.; Martens, D. Active Learning-Based Pedagogical Rule Extraction. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2664–2677. [Google Scholar] [CrossRef] [PubMed]
- Hayashi, Y. The Right Direction Needed to Develop White-Box Deep Learning in Radiology, Pathology, and Ophthalmology: A Short Review. Front. Robot. AI 2019, 6, 1–8. [Google Scholar] [CrossRef]
- Hayashi, Y. New unified insights on deep learning in radiological and pathological images: Beyond quantitative performances to qualitative interpretation. Inform. Med. Unlocked 2020, 19, 100329. [Google Scholar] [CrossRef]
- Setiono, R. A Penalty-Function Approach for Pruning Feedforward Neural Networks. Neural Comput. 1997, 9, 185–204. [Google Scholar] [CrossRef]
- Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Mateo, CA, USA, 1993. [Google Scholar]
- Hayashi, Y.; Nakano, S. Use of a Recursive-Rule Extraction algorithm with J48graft to achieve highly accurate and concise rule extraction from a large breast cancer dataset. Inform. Med. Unlocked 2015, 1, 9–16. [Google Scholar] [CrossRef]
- Hayashi, Y.; Nakano, S.; Fujisawa, S. Use of the recursive-rule extraction algorithm with continuous attributes to improve diagnostic accuracy in thyroid disease. Inform. Med. Unlocked 2015, 1, 1–8. [Google Scholar] [CrossRef]
- Hayashi, Y. Synergy effects between grafting and subdivision in Re-RX with J48graft for the diagnosis of thyroid disease. Knowl. Based Syst. 2017, 131, 170–182. [Google Scholar] [CrossRef]
- Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques; Elsevier BV: Amsterdam, The Netherlands, 2011. [Google Scholar]
- Webb, G.I. Decision Tree Grafting from the All-Tests-But-One Partition. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, San Mateo, CA, USA, 10–16 July 1999; pp. 702–707. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Kim, J.-Y.; Cho, S.-B. Exploiting deep convolutional neural networks for a neural-based learning classifier system. Neurocomputing 2019, 354, 61–70. [Google Scholar] [CrossRef]
- Liu, K.; Kang, G.; Zhang, N.; Hou, B. Breast Cancer Classification Based on Fully-Connected Layer First Convolutional Neural Networks. IEEE Access 2018, 6, 23722–23732. [Google Scholar] [CrossRef]
- Chen, C.; Jiang, F.; Yang, C.; Rho, S.; Shen, W.; Liu, S.; Liu, Z. Hyperspectral classification based on spectral–spatial convolutional neural networks. Eng. Appl. Artif. Intell. 2018, 68, 165–171. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Keras. Available online: https://github.com/keras-team/keras (accessed on 30 June 2020).
- Craven, M.W.; Shavlik, J.W. Extracting tree-structured representations of trained networks. Adv. Neural Inf. Process. Syst. 1996, 8, 24–30. [Google Scholar]
- Chakraborty, M.; Biswas, S.K.; Purkayastha, B. Rule extraction from neural network trained using deep belief network and back propagation. Knowl. Inf. Syst. 2020, 62, 3753–3781. [Google Scholar] [CrossRef]
- Marqués, A.I.; García, V.; Sánchez, J.S. On the suitability of resampling techniques for the class imbalance problem in credit scoring. J. Oper. Res. Soc. 2013, 64, 1060–1070. [Google Scholar] [CrossRef]
- Salzberg, S.L. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Min. Knowl. Discov. 1997, 1, 317–328. [Google Scholar] [CrossRef]
- Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for Hyper-Parameter Optimization. Adv. Neural Inf. Process. Syst. 2011, 24, 2546–2554. [Google Scholar]
- Hsu, F.-J.; Chen, M.-Y.; Chen, Y.-C. The human-like intelligence with bio-inspired computing approach for credit ratings prediction. Neurocomputing 2018, 279, 11–18. [Google Scholar] [CrossRef]
- Jadhav, S.; He, H.; Jenkins, K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Appl. Soft Comput. 2018, 69, 541–553. [Google Scholar] [CrossRef]
- Shen, F.; Zhao, X.; Li, Z.; Li, K.; Meng, Z. A novel ensemble classification model based on neural networks and a classifier optimisation technique for imbalanced credit risk evaluation. Phys. A Stat. Mech. Appl. 2019, 526, 121073. [Google Scholar] [CrossRef]
- Bequé, A.; Lessmann, S. Extreme learning machines for credit scoring: An empirical evaluation. Expert Syst. Appl. 2017, 86, 42–53. [Google Scholar] [CrossRef]
- Bologna, G.; Hayashi, Y. A Comparison Study on Rule Extraction from Neural Network Ensembles, Boosted Shallow Trees, and SVMs. Appl. Comput. Intell. Soft Comput. 2018, 2018, 1–20. [Google Scholar] [CrossRef]
- Tai, L.Q.; Huyen, G.T.T. Deep Learning Techniques for Credit Scoring. J. Econ. Bus. Manag. 2019, 7, 93–96. [Google Scholar] [CrossRef]
- Huysmans, J.; Setiono, R.; Baesens, B.; Vanthienen, J. Minerva: Sequential Covering for Rule Extraction. IEEE Trans. Syst. Man Cybern. Part. B (Cybernetics) 2008, 38, 299–309. [Google Scholar] [CrossRef] [PubMed]
- Santana, P.J.; Monte, A.V.; Rucci, E.; Lanzarini, L.; Bariviera, A.F. Analysis of Methods for Generating Classification Rules Applicable to Credit Risk. J. Comput. Sci. Technol. 2017, 17, 20–28. [Google Scholar]
- Kohonen, T.; Somervuo, P. Self-organizing maps of symbol strings. Neurocomputing 1998, 21, 19–30. [Google Scholar] [CrossRef]
- Poli, R.; Kennedy, J.; Blackwell, T. Particle swarm optimization. Swarm Intell. 2007, 1, 33–57. [Google Scholar] [CrossRef]
Methods | Data Attributes (Types) | Ref. |
---|---|---|
Rule extraction for black box | Numerical/categorical | [4,5,6,7,8] |
Deep learning (DL)-inspired rule extraction for new black box | Images (pixels) | [24,25,26] |
 | Numerical/categorical | [1,9,10] |
 | Numerical/ordinal/nominal | The proposed method |
Dataset | German | Australian |
---|---|---|
Pruning stop rate for 1D FCLF-CNN | 0.13 | 0.20 |
# First layer in hidden units for 1D FCLF-CNN | 4 | 3 |
Learning rate for 1D FCLF-CNN | 0.0182 | 0.0106 |
Momentum factor for 1D FCLF-CNN | 0.1154 | 0.7549 |
# Filters in each branch in the inception module | 19 | 8 |
# Channels after concatenation | 57 | 24 |
δ1 in Re-RX with J48graft | 0.08 | 0.39 |
δ2 in Re-RX with J48graft | 0.12 | 0.10 |
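The inception-module rows above imply three parallel convolution branches: 3 × 19 filters = 57 channels after concatenation for the German dataset, and 3 × 8 = 24 for the Australian one. The NumPy sketch below illustrates only that channel arithmetic; the kernel sizes and random weights are hypothetical, not taken from the paper:

```python
import numpy as np

def conv1d_branch(x, n_filters, kernel_size, rng):
    """One 'same'-padded 1D convolution branch: (length, in_ch) -> (length, n_filters)."""
    length, in_ch = x.shape
    w = rng.standard_normal((kernel_size, in_ch, n_filters)) * 0.1
    pad = kernel_size // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((length, n_filters))
    for t in range(length):
        window = xp[t:t + kernel_size]              # (kernel_size, in_ch)
        out[t] = np.einsum('ki,kif->f', window, w)  # one output step per filter
    return np.maximum(out, 0.0)                     # ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 1))   # 20 attributes as a 1-channel sequence

# Three parallel branches with 19 filters each, as in the German-dataset setting
branches = [conv1d_branch(x, 19, k, rng) for k in (1, 3, 5)]
y = np.concatenate(branches, axis=-1)
print(y.shape)   # (20, 57): 3 branches x 19 filters = 57 channels
```

Because the branches are concatenated along the channel axis (not summed), the channel count after the module is always the branch count times the per-branch filter count.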
Dataset | German | Australian |
---|---|---|
# Instances | 1000 | 690 |
# Total features | 20 | 14 |
Categorical nominal | 10 | 0 |
Categorical ordinal | 3 | 8 |
Continuous | 7 | 6 |
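Heterogeneous attributes like those tallied above are usually encoded differently before reaching a 1D network: nominal categories one-hot (no order), ordinal categories as rank-preserving integers, and continuous values rescaled. A toy sketch with hypothetical attribute names and levels (not the actual German-dataset coding):

```python
# Toy encoding of one German-style applicant record with mixed attribute types:
# nominal -> one-hot, ordinal -> rank-preserving integers, continuous -> scaled.
NOMINAL_PURPOSE = ["car", "furniture", "education"]          # unordered categories
ORDINAL_SAVINGS = ["<100", "100-500", "500-1000", ">=1000"]  # ordered levels

def encode(purpose, savings, age, age_min=19, age_max=75):
    one_hot = [1.0 if purpose == c else 0.0 for c in NOMINAL_PURPOSE]
    rank = [float(ORDINAL_SAVINGS.index(savings))]           # order is meaningful
    scaled = [(age - age_min) / (age_max - age_min)]         # min-max to [0, 1]
    return one_hot + rank + scaled

vec = encode("furniture", "100-500", 47)
print(vec)   # [0.0, 1.0, 0.0, 1.0, 0.5]
```

Treating nominal attributes as ordered integers would invent a spurious ranking, which is why the nominal/ordinal/continuous split in the table matters for the encoding.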
Methods | TS ACC (%) | AUC-ROC |
---|---|---|
Neighborhood rough set + multilayer ensemble classification (10CV) (Tripathi et al., 2018) [7] | 86.57 | ---- |
SVM + metaheuristics (10CV) (Hsu et al., 2018) [47] | 84.00 | ---- |
Information gain directed feature selection algorithm (10CV) (Jadhav et al., 2018) [48] | 82.80 | 0.753 |
Ensemble method based on the SMOTE (10CV) (Shen et al., 2019) [49] | 78.70 | 0.810 |
Extreme Learning Machine (10CV) (Bequé and Lessmann, 2017) [50] | 76.40 | 0.801 |
Method | Average TS ACC (%) | Average # Rules | Average AUC-ROC |
---|---|---|---|
Continuous Re-RX with J48graft (10 × 10CV) (Hayashi and Oishi, 2018) [18] | 79.0 | 44.9 | 0.72
Correlated-adjusted decision forest (mixed) (10CV) (Florez-Lopez and Ramon-Jeronimo, 2015) [13] | 77.4 | 49.0 | 0.79 |
Boosted shallow tree-G1 (10 × 10CV) (Bologna and Hayashi, 2018) [51] | 77.1 | 102.7 | ----
Continuous Re-RX (10 × 10CV) (Hayashi et al., 2015) [30] | 75.22 | 39.6 | 0.692 |
Eclectic rule extraction from a neural network with a multi-hidden layer for a DNN trained by a DBN (5CV) [43] | 74.50 | 8.0 | ----
1D FCLF-CNN with Re-RX with J48graft (10CV) (present paper) | 73.10 | 6.2 | 0.622 |
Re-RX with J48graft (10 × 10CV) (Hayashi and Oishi, 2018) [18] | 72.80 | 16.65 | 0.650
Method | TS ACC (%) | AUC-ROC |
---|---|---|
Deep genetic cascade ensemble of SVM classifier (10CV) (Pławiak et al., 2019) [1] | 97.39 | ---- |
Spiking extreme learning machine (10CV) (Kuppili et al., 2020) [8] | 95.98 | 0.970
SVM + metaheuristics (10CV) (Hsu et al., 2018) [47] | 92.75 | ---- |
DL techniques: SNN and CNN (10CV) (Tai and Huyen, 2019) [52] | 87.54 | ---- |
Method | TS ACC (%) | # Rules | AUC-ROC |
---|---|---|---|
Eclectic rule extraction from a neural network with a multi-hidden layer for a DNN trained by a DBN (5CV) [43] | 89.13 | 7.0 | ----
Boosted shallow trees (gentle boosting) BST-G2 (10 × 10CV) (Bologna and Hayashi, 2018) [51] | 87.90 | 73.4 | ----
Continuous Re-RX with J48graft (10 × 10CV) (Hayashi and Oishi, 2018) [18] | 87.82 | 14.1 | 0.870
Continuous Re-RX (10 × 10CV) (Hayashi et al., 2015) [29] | 86.93 | 14.0 | 0.689
1D FCLF-CNN with Re-RX with J48graft (10 × 10CV) (present paper) | 86.53 | 2.6 | 0.871 |
Discretized interpretable MLP (DIMLP)-B (10 × 10CV) (Bologna and Hayashi, 2018) [51] | 86.5 | 21.4 | ----
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Hayashi, Y.; Takano, N. One-Dimensional Convolutional Neural Networks with Feature Selection for Highly Concise Rule Extraction from Credit Scoring Datasets with Heterogeneous Attributes. Electronics 2020, 9, 1318. https://doi.org/10.3390/electronics9081318