Comparing the Performance of Corporate Bankruptcy Prediction Models Based on Imbalanced Financial Data
Abstract
1. Introduction
- Research question 1: How can we derive data sampling methods that improve the performance of corporate bankruptcy prediction models on imbalanced corporate financial information?
- Research question 2: How can we derive an optimal threshold technique that improves AUC performance even when the corporate financial information is imbalanced?
2. Data and Methods
2.1. Data and Sampling
2.1.1. Data
2.1.2. Sampling
2.2. Models and Performance Measures
LSTM model pseudocode
2.2.1. LSTM Model
2.2.2. Logistic Regression Model
2.2.3. k-Nearest Neighbor (k-NN) Model
2.2.4. Decision Tree Model
2.2.5. Random Forest Model
2.2.6. Performance Measure
3. Performance Analysis Results
4. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
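Category | Section | Feature |
---|---|---|
Financial Statements | Balance Sheet (1000 won) | Accumulations |
Financial Statements | Balance Sheet (1000 won) | Retained Earnings |
Financial Statements | Balance Sheet (1000 won) | Net assets of controlling shareholders (before capital stock reduction) |
Financial Statements | Balance Sheet (1000 won) | Owners of Parent Equity |
Financial Statements | Balance Sheet (1000 won) | Total Equity |
Financial Statements | Comprehensive Income Statement (1000 won) | Earnings before tax |
Financial Statements | Comprehensive Income Statement (1000 won) | (Total Comprehensive Income Attributable to) Owners of Parent Equity |
Financial Statements | Comprehensive Income Statement (1000 won) | Total Comprehensive Income |
Financial Statements | Cash Flow Statement (1000 won) | Cash Flow |
Financial Ratio | Stability (%) | Intangible Asset Ratio |
Financial Ratio | Stability (%) | Equity Capital Ratio |
Financial Ratio | Stability (%) | Borrowings and Bonds Payable Ratio |
Financial Ratio | Stability (%) | Borrowed Capital Ratio |
Financial Ratio | Stability (%) | Cash Flow/Total Debt |
Financial Ratio | Stability (%) | Cash Flow/Total Equity |
Financial Ratio | Stability (%) | Cash Flow/Total Asset |
Financial Ratio | Growth (yearly) (%) | Total Asset Growth Rate |
Financial Ratio | Profitability (%) | Operating Revenue/Operating Expense |
Financial Ratio | Profitability (%) | Profit Margin Ratio |
Financial Ratio | Profitability (%) | ROA (Current Net Income) |
Financial Ratio | Profitability (%) | ROA (Earnings before tax) |
Financial Ratio | Profitability (%) | ROA (Operating Profit) |
Financial Ratio | Profitability (%) | ROA (Total Comprehensive Income) |
Financial Ratio | Profitability (%) | ROE (Current Net Income) |
Financial Ratio | Profitability (%) | ROE (Earnings before tax) |
Financial Ratio | Profitability (%) | ROE (Operating Profit) |
Financial Ratio | Profitability (%) | ROE (Net profit of controlling shareholders) |
Financial Ratio | Activity (times) | Total Debt Turnover |
Financial Ratio | Activity (times) | Total Asset Turnover |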
References
- Oh, W.S.; Kim, J.H. Forecasting corporate bankruptcy with artificial intelligence. J. Ind. Converg. 2017, 15, 17–32.
- Cha, S.; Kang, J. Corporate default prediction model using deep learning time series algorithm, RNN and LSTM. J. Intell. Inf. Syst. 2018, 24, 1–32.
- Barboza, F.; Kimura, H.; Altman, E. Machine learning models and bankruptcy prediction. Expert Syst. Appl. 2017, 83, 405–417.
- Falavigna, G. Financial ratings with scarce information: A neural network approach. Expert Syst. Appl. 2012, 39, 1784–1792.
- Hinton, G.E.; Osindero, S.; Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
- Jang, Y.; Jeong, I.; Cho, Y.; Ahn, H. Business Failure Prediction with LSTM RNN in the Construction Industry. In Proceedings of the ASCE 2019 International Conference on Computing in Civil Engineering, Atlanta, GA, USA, 17–19 June 2019; pp. 1–8.
- Kim, H.; Cho, H.; Ryu, D. Corporate bankruptcy prediction using machine learning methodologies with a focus on sequential data. Comput. Econ. 2022, 59, 1231–1249.
- Odom, M.D.; Sharda, R. A neural network model for bankruptcy prediction. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 163–168.
- Wilson, R.L.; Sharda, R. Bankruptcy prediction using neural networks. Decis. Support Syst. 1994, 11, 545–557.
- Kim, H.; Cho, H.; Ryu, D. Corporate default predictions using machine learning: Literature review. Sustainability 2021, 12, 6325.
- Brygata, M. Consumer Bankruptcy Prediction Using Balanced and Imbalanced Data. Risks 2022, 10, 24.
- Zhou, L. Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods. Knowl. Based Syst. 2013, 41, 16–25.
- Garcia, V.; Jose, S.S.; Ramon, A.M. On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl. Based Syst. 2012, 25, 13–21.
- Syed, N.; Sharifah, H.; Shafinar, I.; Bee, W.Y. Personal bankruptcy prediction using decision tree model. J. Econ. Financ. Adm. Sci. 2019, 24, 157–170.
- Amidon, A. PyOD: A Unified Python Library for Anomaly Detection. 11 May 2021. Available online: https://towardsdatascience.com/pyod-a-unified-python-library-for-anomaly-detection-3608ec1fe321 (accessed on 15 January 2023).
- Mishra, S.; Kshisagar, V.; Dwivedula, R.; Hota, C. Attention-Based Bi-LSTM for Anomaly Detection on Time-Series Data. In Proceedings of the 2021 ICANN International Conference on Artificial Neural Networks, Online, 14–17 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 129–140.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the 2021 NeurIPS 35th Conference on Neural Information Processing Systems, Online, 6–14 December 2021; pp. 1–14.
- Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, H2039–H2049.
- Noh, S.-H. Analysis of gradient vanishing of RNNs and performance comparison. Information 2021, 12, 442.
- Jagannath, V. Random Forest Template Tibco Spotfirer Wiki Page. 24 March 2017. Available online: https://community.tibco.com/wiki/random-forest-template-tibco-spotfirer-wiki-page (accessed on 15 January 2023).
Financial Information | t-Test p-Value |
---|---|
Total assets | $2.33 \times 10^{-5}$ |
Parent company equity holder | $3.59 \times 10^{-4}$ |
Intangible assets ratio | $4.84 \times 10^{-2}$ |
Equity capital ratio | $1.48 \times 10^{-20}$ |
Debt ratio | $9.63 \times 10^{-6}$ |
Cash flows/total liabilities | $2.23 \times 10^{-8}$ |
Total assets growth rate | $2.88 \times 10^{-14}$ |
Operating revenue/operating expense | $6.71 \times 10^{-13}$ |
Gross margin | $2.45 \times 10^{-4}$ |
ROA (income from continuing operations before tax) | $1.79 \times 10^{-9}$ |
ROA (operating profit) | $2.06 \times 10^{-10}$ |
ROE (income from continuing operations before tax) | $2.89 \times 10^{-5}$ |
ROE (operating profit) | $2.52 \times 10^{-2}$ |
Total debt turnover ratio | $5.68 \times 10^{-27}$ |
- LSTM input data: Company (Financial information in 2012, …, 2020)
- LSTM label data: Company (Bankruptcy (1) or non-bankruptcy (0))
- LR, k-NN, DT, RF input data: Company (Financial information in 2012) ⋮ (Financial information in 2020)
- LR, k-NN, DT, RF label data: Company (Bankruptcy or non-bankruptcy in 2013) ⋮ (Bankruptcy or non-bankruptcy in 2021)
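The two data layouts listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's preprocessing code: the toy panel, the company names, the three-year window, and the helper names `build_sequences` and `build_pooled_rows` are all hypothetical. The LSTM receives one multi-year sequence and one label per company, whereas LR, k-NN, DT, and RF receive one row per company-year whose label is the status observed in the following year.

```python
# Hypothetical toy panel: company -> {year: [feature values]}.
# Names, years, and numbers are illustrative only, not taken from the paper.
panel = {
    "A": {2012: [0.10, 1.2], 2013: [0.12, 1.1], 2014: [0.08, 1.0]},
    "B": {2012: [0.30, 0.9], 2013: [0.25, 0.8], 2014: [0.05, 0.4]},
}
years = [2012, 2013, 2014]

# One label per company for the sequence model (1 = bankrupt, 0 = non-bankrupt).
company_label = {"A": 0, "B": 1}

# One label per company-year for the pooled models, observed one year later.
yearly_status = {
    "A": {2013: 0, 2014: 0, 2015: 0},
    "B": {2013: 0, 2014: 0, 2015: 1},
}


def build_sequences(panel, company_label, years):
    """One sample per company: a (years x features) sequence and a single label."""
    X, y = [], []
    for company, by_year in panel.items():
        X.append([by_year[t] for t in years])
        y.append(company_label[company])
    return X, y


def build_pooled_rows(panel, yearly_status, years):
    """One sample per company-year: features in year t, label from year t + 1."""
    X, y = [], []
    for company, by_year in panel.items():
        for t in years:
            X.append(by_year[t])
            y.append(yearly_status[company][t + 1])
    return X, y


X_seq, y_seq = build_sequences(panel, company_label, years)
X_rows, y_rows = build_pooled_rows(panel, yearly_status, years)
print(len(X_seq), len(X_seq[0]), len(X_seq[0][0]))  # companies, years, features
print(len(X_rows), y_rows)
```

The sequence layout keeps the temporal order within each company, which is what a recurrent model consumes; the pooled layout treats each company-year as an independent observation.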
Actual outcome | Predicted: Non-bankrupt company | Predicted: Bankrupt company |
---|---|---|
Non-bankrupt company | True positive (TP) | False negative (FN) |
Bankrupt company | False positive (FP) | True negative (TN) |
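The per-class measures reported in the performance tables follow directly from this confusion matrix, in which the non-bankrupt class is treated as the positive class. The sketch below is a hedged illustration: the function name `class_metrics` is hypothetical, and the four counts are assumed values chosen so that the resulting metrics agree with the LR row of panel (a) to four decimals; the paper's actual confusion matrix is not available here. An F1 score is returned as `None` when precision and recall are both zero, which corresponds to the "Not defined" entries in panel (c).

```python
def class_metrics(tp, fn, fp, tn):
    """Accuracy plus per-class precision, recall, and F1.

    Convention (as in the confusion matrix above): non-bankrupt is the
    positive class, so TP/FN are actual non-bankrupt companies and
    FP/TN are actual bankrupt companies.
    """
    def safe_div(num, den):
        return num / den if den else None

    def f1(precision, recall):
        # Undefined when either input is undefined or both are zero.
        if precision is None or recall is None or precision + recall == 0:
            return None
        return 2 * precision * recall / (precision + recall)

    nb_precision = safe_div(tp, tp + fp)  # predicted non-bankrupt that are correct
    nb_recall = safe_div(tp, tp + fn)     # actual non-bankrupt that are found
    b_precision = safe_div(tn, tn + fn)   # predicted bankrupt that are correct
    b_recall = safe_div(tn, tn + fp)      # actual bankrupt that are found
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "non_bankrupt_precision": nb_precision,
        "non_bankrupt_recall": nb_recall,
        "bankrupt_precision": b_precision,
        "bankrupt_recall": b_recall,
        "non_bankrupt_f1": f1(nb_precision, nb_recall),
        "bankrupt_f1": f1(b_precision, b_recall),
    }


# Assumed counts (not from the paper) that are consistent with the LR row of panel (a).
metrics = class_metrics(tp=183, fn=23, fp=49, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# A classifier that never identifies a bankrupt company correctly has an undefined bankruptcy F1.
degenerate = class_metrics(tp=99, fn=1, fp=5, tn=0)
print(degenerate["bankrupt_f1"])
```

Reporting both classes separately matters for imbalanced data: overall accuracy can be high while bankruptcy recall is zero.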
Model | Accuracy | Non-Bankruptcy Precision | Non-Bankruptcy Recall | Bankruptcy Precision | Bankruptcy Recall | Non-Bankruptcy F1 Score | Bankruptcy F1 Score | AUC |
---|---|---|---|---|---|---|---|---|
LR | 0.7333 | 0.7888 | 0.8883 | 0.3947 | 0.2344 | 0.8356 | 0.2941 | 0.7457 |
k-NN | 0.7185 | 0.7754 | 0.8883 | 0.3235 | 0.1719 | 0.8281 | 0.2245 | 0.8676 |
DT | 0.6556 | 0.7868 | 0.7524 | 0.3014 | 0.3438 | 0.7692 | 0.3212 | 0.6587 |
RF | 0.7148 | 0.7841 | 0.8641 | 0.3488 | 0.2344 | 0.8222 | 0.2804 | 0.8095 |

(a) Performance results of the LR, k-NN, DT, and RF for the small random sample
Model | Accuracy | Non-Bankruptcy Precision | Non-Bankruptcy Recall | Bankruptcy Precision | Bankruptcy Recall | Non-Bankruptcy F1 Score | Bankruptcy F1 Score | AUC |
---|---|---|---|---|---|---|---|---|
LR | 0.7519 | 0.7633 | 0.9541 | 0.6400 | 0.2162 | 0.8481 | 0.3232 | 0.7929 |
k-NN | 0.7704 | 0.7792 | 0.9541 | 0.7000 | 0.2838 | 0.8578 | 0.4038 | 0.8076 |
DT | 0.7444 | 0.8038 | 0.8571 | 0.5410 | 0.4459 | 0.8296 | 0.4889 | 0.5979 |
RF | 0.7889 | 0.8089 | 0.9286 | 0.6889 | 0.4189 | 0.8646 | 0.5210 | 0.8326 |

(b) Performance results of the LR, k-NN, DT, and RF for the small dataset sampled using approximate entropy
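Panel (b) reports results for a sample selected using approximate entropy (ApEn). How ApEn is applied to choose companies is not shown in this part of the paper, so the sketch below only computes the standard ApEn(m, r) statistic for a single numeric series, with self-matches counted. The function name, the toy series, and the choices m = 2 and r = 0.1 are assumptions made for illustration.

```python
import math


def approximate_entropy(series, m, r):
    """ApEn(m, r) of a 1-D sequence; lower values indicate a more regular series."""
    n = len(series)

    def phi(k):
        # All overlapping templates of length k.
        templates = [series[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of `a` (including `a` itself).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# Toy series (hypothetical): perfectly constant, strictly alternating, and irregular.
constant = [5.0] * 20
alternating = [1.0, 0.0] * 10
irregular = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]

apen_constant = approximate_entropy(constant, m=2, r=0.0)
apen_alternating = approximate_entropy(alternating, m=2, r=0.1)
apen_irregular = approximate_entropy(irregular, m=2, r=0.1)
print(apen_constant, apen_alternating, apen_irregular)
```

A constant series yields an ApEn of zero and a strictly alternating one a value close to zero, since knowing m consecutive values almost fully determines the next one.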
Model | Accuracy | Non-Bankruptcy Precision | Non-Bankruptcy Recall | Bankruptcy Precision | Bankruptcy Recall | Non-Bankruptcy F1 Score | Bankruptcy F1 Score | AUC_1 | AUC_2 |
---|---|---|---|---|---|---|---|---|---|
LR | 0.9834 | 0.9839 | 0.9995 | 0 | 0 | 0.9916 | Not defined | 0.7906 | 0.9998 |
k-NN | 0.9810 | 0.9839 | 0.9971 | 0 | 0 | 0.9904 | Not defined | 0.6500 | 0.9998 |
DT | 0.9595 | 0.9848 | 0.9739 | 0.0484 | 0.0811 | 0.9793 | 0.0606 | 0.5477 | 0.9998 |
RF | 0.9834 | 0.9839 | 0.9996 | 0 | 0 | 0.9917 | Not defined | 0.7389 | 0.9998 |

(c) Performance results of the LR, k-NN, DT, and RF for the total data, where AUC_1 = AUC at the threshold of 0.5 and AUC_2 = AUC using the optimal threshold.
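Panel (c) shows that, on the full imbalanced data, a decision threshold of 0.5 leaves bankruptcy recall at or near zero. The paper's own optimal-threshold technique is not described in this part of the text, so the sketch below substitutes a common stand-in, Youden's J statistic (TPR minus FPR), alongside a rank-based AUC. The scores and labels are hypothetical, the function names are illustrative, and bankrupt is taken as the positive class (label 1) here purely for convenience, unlike the confusion matrix convention used in the tables.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(labels, scores):
    """Threshold (among observed scores) that maximizes J = TPR - FPR."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j


# Hypothetical imbalanced example: 8 non-bankrupt (0) and 2 bankrupt (1) companies.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30, 0.25, 0.40]

# With a cut-off of 0.5, no company is flagged as bankrupt.
flagged_at_half = sum(1 for s in scores if s >= 0.5)

auc = roc_auc(labels, scores)
threshold, j_stat = youden_threshold(labels, scores)
print(flagged_at_half, auc, threshold, j_stat)
```

In this toy example every predicted probability is below 0.5, so the default cut-off never flags a bankruptcy, while a lower data-driven cut-off recovers both bankrupt companies at the cost of one false alarm. The rank-based AUC itself is computed from the ordering of the scores and does not use a cut-off.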
Methodology | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
Logistic | 0.7570 | 0.0001 | 0.2174 | 0.0002 | 0.4872 |
SVM | 0.7236 | 0.0002 | 0.3913 | 0.0003 | 0.5575 |
RF | 0.9899 | 0.0023 | 0.2174 | 0.0045 | 0.6037 |
RNN | 0.9789 | 0.0024 | 0.4783 | 0.0048 | 0.7286 |
LSTM | 0.9936 | 0.0058 | 0.3478 | 0.0114 | 0.6707 |
Ensemble | 0.9826 | 0.0029 | 0.4783 | 0.0058 | 0.7305 |

(d) Performance results: Bankruptcy Forecasting Performance by Methodology [7]
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Noh, S.-H. Comparing the Performance of Corporate Bankruptcy Prediction Models Based on Imbalanced Financial Data. Sustainability 2023, 15, 4794. https://doi.org/10.3390/su15064794