Open Access Article
Symmetry 2018, 10(7), 250

A Cluster-Based Boosting Algorithm for Bankruptcy Prediction in a Highly Imbalanced Dataset

Digital Contents Research Institute, Sejong University, Seoul 143-747, Korea
VNU University of Science, Vietnam National University, Hanoi, Vietnam
Author to whom correspondence should be addressed.
Received: 5 June 2018 / Revised: 28 June 2018 / Accepted: 29 June 2018 / Published: 2 July 2018
(This article belongs to the Special Issue Information Technology and Its Applications 2018)


Bankruptcy prediction has been a popular and challenging research topic in both computer science and economics in recent years because of its importance to financial institutions, fund managers, lenders, governments, and other economic stakeholders. Bankruptcy datasets suffer from class imbalance: bankrupt companies are far fewer than normal companies, so standard classification algorithms perform poorly. This study therefore proposes a cluster-based boosting algorithm, CBoost, together with a robust framework using the CBoost algorithm and Instance Hardness Threshold (RFCI) for effective bankruptcy prediction on a financial dataset. The framework first resamples the imbalanced dataset by undersampling with the Instance Hardness Threshold (IHT), which removes noisy majority-class instances that have large IHT values. CBoost then deals with the class imbalance. In this algorithm, the majority class is partitioned into a number of clusters, and the distance from each sample to its closest centroid is used to initialize that sample's weight. The algorithm then runs several iterations, finding weak classifiers and combining them into a strong classifier. The resampled set produced by the undersampling module is used to train CBoost, which then predicts bankruptcy for the validation set. The proposed framework is verified on the Korean bankruptcy dataset (KBD), which has a very small balancing ratio in both the training and the testing phases.
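The weight-initialization step described above (cluster the majority class, then derive each sample's starting weight from its distance to the closest centroid) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which clustering method is used or how distance maps to weight, so the plain k-means routine, the `1 / (1 + d)` mapping, and the function names `kmeans`, `nearest_centroid`, and `init_majority_weights` are all assumptions made for this example.

```python
import numpy as np


def nearest_centroid(X, centroids):
    """Return (distance to closest centroid, index of that centroid) per sample."""
    # Pairwise Euclidean distances, shape (n_samples, n_centroids).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return dists[np.arange(len(X)), labels], labels


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed here; the abstract names no clustering method)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        _, labels = nearest_centroid(X, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def init_majority_weights(X_major, k=3, seed=0):
    """Initial boosting weights for majority-class samples.

    ASSUMPTION: weight = 1 / (1 + distance), normalized to sum to 1, so samples
    near a centroid start with more weight. The paper may use another mapping.
    """
    centroids = kmeans(X_major, k, seed=seed)
    distances, _ = nearest_centroid(X_major, centroids)
    raw = 1.0 / (1.0 + distances)
    return raw / raw.sum(), distances
```

Calling `init_majority_weights(X_major, k=3)` on an array of majority-class feature vectors returns one weight per sample together with the distances they were derived from. These weights would then seed the first boosting iteration in place of the uniform weights that a standard boosting algorithm starts with.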
The experimental results show that the proposed framework achieves an AUC (area under the ROC curve) of 86.8% and, on the experimental dataset, outperforms several methods for handling imbalanced data in bankruptcy prediction, such as the GMBoost algorithm, the oversampling-based method using SMOTEENN, and the clustering-based undersampling method.
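For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen positive (here, bankrupt) case receives a higher score than a randomly chosen negative (normal) case, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auc_by_pairs` and the toy scores are illustrative assumptions and are unrelated to the paper's data or results.

```python
def auc_by_pairs(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for the positive class (bankrupt) and 0 for the negative class.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example with made-up scores: 5 of the 6 positive/negative pairs are
# ordered correctly, so the AUC is 5/6.
toy_auc = auc_by_pairs([0.9, 0.8, 0.3, 0.2, 0.1], [1, 0, 1, 0, 0])
```

Because the value depends only on how positives rank against negatives, it is not inflated by the majority class the way plain accuracy is, which makes it a natural choice for a dataset with a very small balancing ratio.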
Keywords: bankruptcy prediction; undersampling technique; cluster-based boosting; machine learning

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Le, T.; Hoang Son, L.; Vo, M.T.; Lee, M.Y.; Baik, S.W. A Cluster-Based Boosting Algorithm for Bankruptcy Prediction in a Highly Imbalanced Dataset. Symmetry 2018, 10, 250.


Symmetry, EISSN 2073-8994, published by MDPI AG, Basel, Switzerland