An Ensemble Classifier-Based Scoring Model for Predicting Bankruptcy of Polish Companies in the Podkarpackie Voivodeship

Pisula, Tomasz

doi:10.3390/jrfm13020037

by

Tomasz Pisula

Department of Quantitative Methods, Faculty of Management, Rzeszow University of Technology, al. Powstancow W-wy 10, 35-959 Rzeszow, Poland

J. Risk Financial Manag. 2020, 13(2), 37; https://doi.org/10.3390/jrfm13020037

Submission received: 21 December 2019 / Revised: 13 February 2020 / Accepted: 15 February 2020 / Published: 19 February 2020

(This article belongs to the Special Issue Modern Methods of Bankruptcy Prediction)

Abstract

This publication presents the methodological aspects of designing a scoring model for early prediction of bankruptcy by using ensemble classifiers. The main goal of the research was to develop a scoring model (with good classification properties) that can be applied in practice to assess the risk of bankruptcy of enterprises in various sectors. For the data sample, which included 1739 Polish businesses (of which 865 were bankrupt and 875 had no risk of bankruptcy), a genetic algorithm was applied to select the optimum set of 19 bankruptcy indicators, on the basis of which the classification accuracy of a number of ensemble classifier model variants (boosting, bagging and stacking) was estimated and verified. The classification effectiveness of the ensemble models was compared with that of eight classical individual models based on single classifiers. A GBM-based ensemble classifier model offering superior classification capabilities was used in practice to design a scoring model, which was applied in a comparative evaluation and bankruptcy risk analysis for businesses from various sectors and of different sizes from the Podkarpackie Voivodeship in 2018 (over a time horizon of up to two years). The approach applied can also be used to assess credit risk for corporate borrowers.

Keywords:

bankruptcy prediction; ensemble classifiers; boosting; bagging; stacking; scoring models

1. Introduction

According to statistical data from 2018–2019, 30–60 businesses in Poland declare bankruptcy each month. Business bankruptcy is invariably an adverse phenomenon for the business itself and its employees, but it is also a problem for its creditors, banks and partners. The high number of bankruptcies reported may also lead to negative consequences locally, for the economic development and economic circumstances of the region, and on the national scale, for the economy of the whole country. For this reason, the early prediction of business bankruptcy, and therefore the possibility of forecasting the risk of business bankruptcy over a long time horizon (even up to several years), is a very important financial and economic problem. In its financial and economic dimension, bankruptcy (i.e., business default) is defined as a situation in which a business is unable (for various reasons) to meet its liabilities towards creditors. For businesses operating in market economy conditions, a potential risk of bankruptcy always exists. This risk is most commonly defined as the probability of defaulting on liabilities incurred (probability of default, PD). Modeling the risk of bankruptcy is also of enormous importance for institutions granting corporate loans, for whom the bankruptcy of a corporate debtor means a potential loss of the loan granted.

The main objective of this study was to design a scoring model based on ensemble classifiers which could be used to forecast the risk of bankruptcy for Polish businesses operating in the Podkarpackie Voivodeship over a time horizon of up to two years. One reason for focusing on companies from the Podkarpackie region is that the Podkarpackie Voivodeship (along with several other Polish regions) lagged notably behind in development just after Poland's political transformation from socialism to a market economy. It belonged to the group of eastern regions (voivodeships) forming the so-called eastern wall, which was overlooked and underestimated in the policies pursued by the governments of that period. The selection of companies from the region was also influenced by the fact that the Podkarpackie Voivodeship is currently one of the “development tigers” in Poland and is catching up quickly. This is mainly due to the more effective policies of the current government aimed at equalizing the development opportunities of Polish regions. The Podkarpackie Voivodeship is not very large in relation to other regions of Poland: it occupies 11th place in a ranking of all 16 voivodeships, with an area of 17,846 km² (source: Główny Urząd Statystyczny (2019) (Statistics Poland), Local Data Bank, https://bdl.stat.gov.pl). The attractiveness of the voivodeship, however, is enhanced by its geographical location, which is conducive to the development of ecological agriculture and tourism (including international tourism, e.g., in the Bieszczady Mountains). Another major advantage of the region is its border location (it borders Ukraine and Slovakia, which also belongs to the EU). 
In terms of population, the Podkarpackie region belongs to the group of medium-populated regions of Poland and takes 8th place in this ranking, with a population of approximately 2.1 million (source: Główny Urząd Statystyczny (2019) (Statistics Poland), Local Data Bank, https://bdl.stat.gov.pl). The voivodeship also lacks highly developed industries in comparison to other, more industrialized regions of Poland. Nevertheless, the Podkarpackie Voivodeship belongs to the group of the fastest developing regions of Poland. In terms of income per capita, the Podkarpackie Voivodeship took 2nd place in 2018 in the ranking of the 16 Polish voivodeships, with revenues at the level of PLN 562.4 per inhabitant (source: Statistics Poland, Local Data Bank, https://bdl.stat.gov.pl). Also in 2018, the Podkarpackie region was the most dynamically developing region of Poland in terms of GDP growth, recording an increase of 7.8% compared to the previous year. In 2018, the GDP generated in the Podkarpackie region constituted 3.9% of Poland’s GDP, which placed it 9th among the regions (ranking source: Statistics Poland, Local Data Bank, https://bdl.stat.gov.pl/BDL/start). This shows that the region’s economy is already very dynamic and is still progressing. The economy of the Podkarpackie region stands out positively, and its potential is strongly influenced by the cluster of aviation industry enterprises belonging to the so-called aviation valley, by the dynamic development of road and transport infrastructure (e.g., the route of the international European North-South communication line Via Carpatia), and by the development of innovation (innovative technologies) in the region. 
The companies that drive development in the region belong to Stowarzyszenie Dolina Lotnicza (2019) (Aviation Valley Association), which includes many aviation industry companies that provide services to major aviation manufacturers around the world (e.g., Boeing, Airbus; source: http://www.dolinalotnicza.pl/en/business-card). These include companies such as 3M Poland, 3D Robot, Boeing Distribution Services, Pratt & Whitney Poland, Collins Aerospace, Goodrich Aerospace Poland, General Electric Company Poland, Hamilton Sundstrand Poland, Heli-One, Safran Transmission Systems Poland and MTU Aero Engines Poland. The very dynamic development of economic potential in the Podkarpackie region also affects the quality of life of its inhabitants. The Podkarpackie Voivodeship has ranked high in quality of life rankings for several years. All these factors make it sensible to conduct a comprehensive analysis and assessment of the risk of bankruptcy of enterprises operating in the Podkarpackie region using the most effective models for forecasting and assessing that risk. Therefore, the work first focused on developing an adequate scoring model for bankruptcy forecasting using ensemble classifiers and on analyzing and verifying its prognostic capacity (classification efficiency). Only then was the model used in practice to comprehensively assess the bankruptcy risk of enterprises from the Podkarpackie region that belong to various sectors of the economy (according to the declared classification of their activities) and differ in size.

The article details the stages in which the scoring model was designed and implemented in practice. The scoring model design stage involved the comparison of the predictive capability of ensemble models used in this study with that of conventional single classifiers. The results of previous works of many authors (see e.g., Anwar et al. 2014; Barboza et al. 2017; Tsai et al. 2014) indicate that the models based on ensemble classifiers help achieve more accurate results and improve the discriminant capability of the model. On the basis of the scoring model designed, a bankruptcy risk assessment for businesses from the Podkarpackie Voivodeship was carried out based on the sector in which they operated and the size of the business.

The main innovation of the research presented in the article is that previous studies of other authors did not discuss the practical use of the scoring model for comprehensive analysis of the bankruptcy risk of companies (also from different sectors) operating in the Podkarpackie region, using the ensemble classifiers approach.

2. Literature Review

The various problems of business bankruptcy are widely described in the literature. The significance and salience of the bankruptcy problem has motivated many authors to concentrate on this issue in their research. The subject of modeling business bankruptcy and forecasting its likelihood first appeared in the economic and financial literature in 1968, when the first study on bankruptcy risk modeling was published by Altman (1968). The early bankruptcy prediction studies applied statistical methods and mainly concerned the use of different variants of discriminant analysis or logistic regression (Ohlson 1980; Begley et al. 1996). Since those models had significant limitations, artificial intelligence and machine learning methods that had been successfully applied in image recognition tasks were gradually also implemented in bankruptcy forecasting. It was found that machine learning techniques such as neural networks (NNet), Support Vector Machines (SVM) and ensemble classifier methods have better forecasting capabilities and higher classification effectiveness than conventional approaches. An overview of the previous research on the application of statistical methods and machine learning techniques in business bankruptcy prediction can be found in studies such as the ones by Kumar and Ravi (2007) and Lessmann et al. (2015). Alaka et al. (2018) presented a comprehensive overview of the literature and a systematic classification of predictive models used in business bankruptcy forecasting, including: purpose of research, method of selecting variables for the model, sample size for analyzed businesses (also including bankrupt ones) and a comparison of the models’ classification effectiveness measures.

Some works deal with forecasting and assessing the risk of bankruptcy of enterprises while taking into account the specificities of their sector of activity. Rajin et al. (2016) conducted a bankruptcy risk assessment for Serbian enterprises in agriculture, which is one of the most significant sectors of the Serbian economy. The classification efficiency of several models was compared using the methods of linear discriminant analysis. Their research shows that models taking into account the specifics of economies and market characteristics (e.g., the DF-Kralicek model for the European market) give better results for the Serbian economy than models created for American markets (e.g., the classic Altman Z-Score model). Similar problems were addressed by Karas et al. (2017), who showed that classic scoring models developed for the US economy (the Altman Z-Score and Altman-Sabato models) and the IN05 model, designed and developed for Czech enterprises, are less effective than their original validation results suggested. This forces researchers to develop more adequate models, in particular ones that take into account the specificity and financial indicators of the agricultural sector and the economy of the country affecting the bankruptcy of enterprises. Receiver Operating Characteristic (ROC) curves were used to measure the effectiveness of the models. Chen et al. (2013) dealt with forecasting the bankruptcy risk of industrial enterprises in the manufacturing sector in China. They used a modified variant of the Multi-Criteria Linear Programming algorithm (the so-called MC2LP algorithm) to forecast the risk of bankruptcy of 1499 Chinese enterprises from the studied sector and selected 36 financial indicators to assess their financial condition. The classification efficiency of the studied model was compared with the efficiency of the classic MCLP model and the SVM approach. Classification (confusion) matrices were used as measures of classification accuracy. 
The model proposed by the authors allows a variable value to be set for the cut-off point (which determines the expected class membership of objects), so that classification errors can be corrected systematically. Topaloglu (2012) dealt with forecasting the bankruptcy of American enterprises from the manufacturing sector using multi-period logistic regression models, the so-called hazard models. The research period covered bankruptcies from 1980–2007, and the results show that macroeconomic diagnostic variables such as GDP have a very large impact on the assessment of bankruptcy risk. The study shows that the accounting indicators used in the model to assess the financial condition of enterprises lose their predictive power (become irrelevant) when global market and macroeconomic indicators are taken into account. Achim et al. (2012) studied the financial risk of bankruptcy for Romanian enterprises from the manufacturing sector using the Principal Component Analysis method in the period 2000–2011, thus taking into account the impact of the global crisis on financial markets. The research sample included 53 enterprises registered in Romania and operating in the production sector, and the study used 16 selected, most frequently used financial indicators. The study shows good predictive quality of the model tested and presents its potential applications. The literature also contains works on the modeling of bankruptcy risk for enterprises operating in other sectors of the economy, e.g., Marcinkevicius and Kanapickiene (2014) for companies from the construction sector, as well as Kim and Gu (2010), Youn and Gu (2010) and Diakomihalis (2012) for companies in the hotel and restaurant sector.

Another important aspect of research on the risk of bankruptcy of enterprises is accounting for the impact of economic cycles and selected macroeconomic variables, i.e., for the effect of the cyclical economic conditions of countries. In Vlamis (2007), statistical logistic and probit regression models were used to forecast the risk of bankruptcy of American real estate companies in the period 1980–2001. It has been shown that financial indicators such as profitability, debt service and company liquidity are important determinants of the risk of bankruptcy of the surveyed enterprises. A number of key macroeconomic financial variables have also been used because the risk of borrowers’ bankruptcy depends on the state of the economy and the current business cycle. Similar issues were dealt with in the publication by Hol (2007), which studied the impact of business cycles on the probability of bankruptcy of Norwegian companies. It has been shown that models that take into account the impact of economic cycles have better prognostic properties than models that only take into account the financial indicators of companies. In a similar study, Bruneau et al. (2012) analyzed the relationship between macroeconomic shocks and exposure to the risk of bankruptcy of companies in France belonging to different sectors of activity. The dependence of the risk of bankruptcy on economic cycles was studied using a two-equation VAR model based on data from 1990–2010.

At this point one should also mention Polish authors’ significant contribution to the development of bankruptcy forecast models which take into account the specific nature of the Polish economy. Their research is mostly based on classical techniques that use statistical methods or machine learning tools to predict and evaluate the risk of business bankruptcy. The results obtained by Polish authors studying bankruptcy risk modeling can be found in publications by Korol (2010), Hadasik (1998), Hamrol and Chodakowski (2008), Mączyńska (1994), Prusak (2005) and Ptak-Chmielewska (2016). In the context of the research done by Polish authors, Kliestik et al. (2018) presented a very interesting and detailed comparative analysis of enterprise bankruptcy forecasting in East-Central Europe, together with an overview of the models applied, from the perspective of the developing economies of the region in the transformation period.

In recent years, ensemble classifiers have been successfully used for predicting bankruptcy of businesses. Some studies of this type include Barboza et al. (2017), Brown and Mues (2012) and Zięba et al. (2016). They are dedicated to the application of ensemble classifiers in forecasting bankruptcy of businesses and demonstrate that the ensemble classifiers offer better forecasting properties and accuracy than conventional statistical methods. Moreover, a study by Kim et al. (2015) proved that ensemble models are more resistant to the sample imbalance problem (for bankrupt businesses and those at no risk of bankruptcy) during the statistical data preparation phase.

Many studies on the application of ensemble classifiers in business bankruptcy forecasting refer to boosting and bagging methods (sequential correction and classification error minimization, as well as component classifier result sampling and combining) in order to increase the classification performance of the entire forecasting system. In studies by Cortes et al. (2007) and Heo and Yang (2014), AdaBoost (an adaptive boosting algorithm) was applied to decision trees as basic classification models. The use of ensemble classifiers with a classifier boosting technique based on neural network classifiers was discussed in studies by Alfaro et al. (2008), Fedorova et al. (2013), Kim and Kang (2010) and West et al. (2005). A different approach was adopted by Kim et al. (2015) and Sun et al. (2017), who used support vector machines (SVM) as base classifiers, which were boosted as a group of ensemble classifiers. Bagging is also a method frequently used in practical applications of ensemble classifiers. It was examined in studies by Hua et al. (2007), Zhang et al. (2010) and Twala (2010), which analyze the classification effectiveness of such ensemble classifiers built on several models of base classifiers. The use of ensemble classifiers that combine (stack) the results of several classifiers in a single meta-classifier was discussed in studies such as those by Iturriaga and Sanz (2015), Tsai and Wu (2008) and Tsai and Hsu (2013). Furthermore, many studies are dedicated to the use of various techniques for combining the results of base model classification, such as neural networks in the form of self-organizing maps (SOMs), rough set techniques, case-based reasoning and classifier consensus methods. Ensemble classifiers of this type were examined by Ala’raj and Abbod (2016), du Jardin (2018), Chuang (2013) and Li and Sun (2012).

3. Environmental Background of the Research Conducted

3.1. Statistical Description of Bankruptcies in Poland

According to data from Ogólnopolski Monitor Upadłościowy (2019) (Coface Nationwide Bankruptcy Monitor; source: http://www.emis.com, http://www.coface.pl/en), a total of 798 businesses declared bankruptcy in 2018. Most bankruptcies were reported in October and September (76 and 74, respectively), followed by March, April and May (67, 66 and 65, respectively), with 61 bankruptcies reported in January. The lowest numbers of bankruptcies were declared in August (42) and in February and December (45 bankruptcies). Comparing the structure of business bankruptcies by voivodeship in 2018 (Figure 1), we may notice that the highest number of bankrupt businesses was reported in the Mazowieckie Voivodeship: 156 (which constitutes 22% of all bankrupt enterprises). Further positions in the ranking, with a significantly lower number of bankruptcies, are held by the Śląskie (84, 12%), Wielkopolskie (68, 10%), Dolnośląskie (61, 9%), Podkarpackie (47, 7%) and Małopolskie (44, 6%) Voivodeships. The lowest numbers of bankruptcies were reported in the Lubuskie (13, 2%), Opolskie (15, 2%), Podlaskie (19, 3%) and Świętokrzyskie (20, 3%) Voivodeships. Based on the latest available data, from Q1 2019, the largest numbers of bankruptcies were again recorded in the Mazowieckie (27), Dolnośląskie (15), Śląskie (14), Wielkopolskie (11) and Łódzkie (10) Voivodeships, and the smallest in Świętokrzyskie (1), Opolskie (2), Podlaskie and Lubuskie (3), and Warmińsko-Mazurskie (4). For comparison, there were 6 bankruptcies in the Podkarpackie Voivodeship.

When analyzing the number of bankrupt businesses in Poland in the year 2019 by business activity, we may notice that the highest number of bankruptcies concerned businesses carrying out varied individual activities (one-person businesses, self-employment): 108 (27% of bankruptcies), followed by commercial law companies from the commerce (trade) sector with 91 (22%), and from the industrial and service sectors with 70 (17%) and 63 (16%), respectively. In 2019, 51 (13%) businesses from the construction sector, 11 (3%) transport and logistics businesses and 9 (2%) businesses involved in other activities declared bankruptcy. Figure 2 shows the distribution of the number of bankrupt businesses by their type and sector of activity.

3.2. Characteristics of Companies Operating in the Podkarpackie Voivodeship

According to Emerging Markets Information Service (2019)―EMIS (http://www.emis.com), in 2018 about 3679 companies and partnerships were registered and operating in the Podkarpackie Voivodeship (the number of available financial statements for 2018 in the database). Their reported sector of activity belonged to one of the following 18 areas: A—farming, forestry and fishing, B—mining and extraction, C—industrial processing, D—production of energy, supply of water, gas and other energy sources, E—waste, waste water and sewage management, F—construction, G—wholesale and retail, and servicing vehicles and motorcycles, H—transport and storage management, I—accommodation and food services, J—information and communications, K—finance and insurance, L—services for the property market, M—scientific, specialist and technological activity, N—administration and support, P—education, Q—health and social care, R—culture, entertainment and leisure, S—other services. Figure 3 presents the structure of the number of businesses operating in the Podkarpackie Voivodeship by sector.

In 2018, 997 businesses in the Podkarpackie Voivodeship, the largest single group among the region’s businesses, operated in the wholesale and retail sector. Economic activity in the field of industrial processing was declared by 718 enterprises, followed by the construction, scientific, specialist and technological activity, and services for the property market sectors (344, 321 and 270 businesses, respectively). The lowest numbers of businesses operated in education (37), production of energy and supply of energy sources (35), culture (27), other services (24) and mining and extraction (21).

Figure 4 presents the structure of the number of businesses in the Podkarpackie Voivodeship by the duration for which they have functioned (in years). Most businesses, i.e., 1792 (which corresponds to 49% of all analyzed entities) have operated in the market for a very long time—10 years. Nearly as many businesses, i.e., 1720 (47% of the total number), have been active for a medium number of years, whereas ‘young’ enterprises (167), established in the period from 2017 to 2019 and active for up to two years, constituted only 4% of all businesses analyzed.

An analysis of businesses operating in the Podkarpackie Voivodeship according to their size (Figure 5) shows that 40% (1461) of all enterprises are very small, so-called micro-enterprises. Small businesses constituted a further 15%. Overall, over a half of businesses (55%) were either micro-enterprises or small enterprises. The numbers of medium and large enterprises were more or less equal, corresponding to 22% and 23% of all entities analyzed, respectively. The size of the enterprise was identified in accordance with the legal provisions on the classification of Polish enterprises, adapted to EU law and directives. Micro enterprises were identified according to the rule: number of employees <10 and annual turnover <= 2 m €. Small enterprises were identified as not being micro enterprises and fulfilling the conditions: number of employees <50 and annual turnover <= 10 m €. Medium enterprises were identified as not being small and fulfilling the conditions: number of employees <250 and annual turnover <= 50 m €. Large enterprises were therefore identified according to the rule: number of employees >= 250 and annual turnover >50 m € (source: https://ec.europa.eu/growth/smes/business-friendly-environment/sme-definition_pl).
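The size rules quoted above can be written down as a short function. This is only an illustrative sketch: the function name, the argument names and the unit (turnover in millions of euro) are assumptions, and treating every enterprise that fails the micro, small and medium tests as large is also an assumption, since the quoted rule for large enterprises does not cover mixed cases.

```python
def classify_enterprise_size(employees: int, turnover_meur: float) -> str:
    """Return 'micro', 'small', 'medium' or 'large' using the quoted thresholds.

    turnover_meur: annual turnover in millions of euro (assumed unit).
    """
    if employees < 10 and turnover_meur <= 2:
        return "micro"
    # Reached only when the enterprise is not micro.
    if employees < 50 and turnover_meur <= 10:
        return "small"
    # Reached only when the enterprise is neither micro nor small.
    if employees < 250 and turnover_meur <= 50:
        return "medium"
    # Assumption: everything else is treated as large.
    return "large"


if __name__ == "__main__":
    for staff, turnover in [(5, 1.5), (5, 5.0), (100, 30.0), (300, 80.0)]:
        print(staff, turnover, classify_enterprise_size(staff, turnover))
```

Checking the tests in order from micro to medium mirrors the wording "not being micro" and "not being small" in the rules.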

Among all businesses operating in the Podkarpackie Voivodeship, only 7 were listed on the stock market, while 3672 were non-listed companies. An analysis of legal forms of businesses in the Podkarpackie Voivodeship (Figure 6) shows that the vast majority (73.6%) are limited liability companies (private limited companies). A further 2.9% of enterprises operate as public limited companies, and only 0.5% are limited partnerships. The remaining businesses, having other legal forms, constitute 23% of all enterprises analyzed in this study.

4. Materials and Methods

As can be seen from the above analysis of the literature, business bankruptcy risk assessment in practice makes use of various classifier models. Both classical statistical methods and more advanced non-statistical methods are used, with the latter based on various machine learning techniques. So-called ensemble classifiers, i.e., classifiers designed to increase classification efficiency in relation to the conventional approach (which is based on single classifiers), are becoming increasingly popular because of this gain in efficiency. Table 1 contains an overview of the business bankruptcy risk forecasting models that are most often used in practice.

Classical business bankruptcy forecasting models using single classifiers are very well known and presented in many publications. The present study therefore focuses mainly on a detailed presentation of the ensemble classifier methodology. A detailed discussion of classical models and of models used in business bankruptcy forecasts can be found, e.g., in monographs by Kuhn and Johnson (2013) and Hastie et al. (2013).

4.1. Ensemble Classifier Methodology

The ensemble classifier methodology involves combining several single classifiers into an ensemble of classifiers performing the same task in order to improve the effectiveness of classification (the discriminant capability of the entire model), defined as the correct assignment of objects to expected classes. This is done by suitably aggregating (often by weighting) the classification results obtained from component classifiers to arrive at a resultant classifier with the best possible forecasting capabilities (surpassing those of all base classifiers in use). Figure 7 shows a functional diagram of ensemble classifiers.
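The aggregation step described above can be illustrated with two small helper functions, one for (weighted) voting on class labels and one for averaging predicted probabilities. This is a minimal sketch rather than the aggregation used in the study: the function names, the class labels and the example weights are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine class labels from base classifiers by (weighted) majority voting."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights: plain majority voting
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # The class with the largest total weight wins.
    return max(totals, key=totals.get)


def soft_vote(probabilities, weights=None):
    """Weighted mean of the bankruptcy probabilities returned by base classifiers."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


if __name__ == "__main__":
    labels = ["bankrupt", "healthy", "healthy"]
    print(weighted_vote(labels))                    # unweighted vote
    print(weighted_vote(labels, [0.6, 0.2, 0.2]))   # first classifier trusted most
    print(soft_vote([0.9, 0.4, 0.5]))
```

With equal weights the two healthy votes win, whereas giving the first classifier a weight of 0.6 lets its single vote outweigh the other two.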

A detailed description of ensemble classifier methodology, including types, characteristics and numerous practical applications, can be found in monographs by Zhang and Ma (2012) as well as Zhou (2012). In practice, three well-known approaches are applied in ensemble classifier methods: boosting, bagging and combining. The term boosting refers to a broad class of algorithms which boost “weak classifiers”, turning them into “strong classifiers” (with classification performance approaching that of perfect models). An example of such an approach is AdaBoost, an adaptive boosting algorithm (Freund and Schapire 1997). In AdaBoost, classifiers of the same type, e.g., classification trees, serve as base classifiers. To determine object classes, the output classifications of the base classifiers are most commonly aggregated using voting strategies such as majority voting, plurality voting, weighted voting or soft voting. For the classification of objects into two classes, the AdaBoost.M1 adaptive boosting algorithm consists of the following steps (see: Algorithm 1, Zhang and Ma 2012, p. 14).

Algorithm 1 AdaBoost.M1 algorithm

Inputs: a training sample $\{x_{i}, y_{i}\}$, $i = 1, \dots, N$, $y_{i} \in \{ω_{1}, ω_{2}\}$ (supervised learning)
Ensemble classifier: an ensemble classifier with $T$ boosting cycles (iterations)
Initialization: initial distribution of weights for observations from the training set $D_{1}(i) = 1 / N$
Perform in loop FOR $t = 1, 2, \dots, T$
- Pick a random training subset $S_{t}$ with distribution $D_{t}$
- Train the base classifier on subset $S_{t}$, obtain hypothesis $h_{t} : X \to Y$ with respect to the known class labels
- Calculate the classification error for hypothesis $h_{t}$: $ε_{t} = \sum_{i} I[h_{t}(x_{i}) \neq y_{i}] \, D_{t}(i)$
- Abort if $ε_{t} > 1 / 2$
- Assume $β_{t} = ε_{t} / (1 - ε_{t})$
- Adjust the weight distribution: $D_{t + 1}(i) = \frac{D_{t}(i)}{Z_{t}} \cdot \begin{cases} β_{t}, & \text{if } h_{t}(x_{i}) = y_{i} \\ 1, & \text{otherwise} \end{cases}$, where $Z_{t} = \sum_{i} D_{t}(i)$ is the normalization constant enabling $D_{t + 1}$ to become a proper probability distribution
End FOR loop
Weighted majority voting. For a given unlabeled instance $z$, obtain the total vote for each of the classes: $V_{c} = \sum_{t : h_{t}(z) = ω_{c}} \log(\frac{1}{β_{t}}), c = 1, 2$.
Output: the class with the greatest value of $V_{c}$.
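The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the study, and it rests on several assumptions: the two classes are coded as +1 and -1, the base classifier is a one-feature threshold rule (a decision stump), and each stump is trained on the full sample using the weights $D_{t}$ directly instead of drawing a random subset $S_{t}$. The small floor on $ε_{t}$ is also an added guard against division by zero. All function names are hypothetical.

```python
import math


def train_stump(X, y, D):
    """Find the one-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(d for d, p, t in zip(D, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best  # (weighted error, feature index, threshold, sign)


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost_m1(X, y, T=10):
    """Train up to T stumps; labels must be +1 / -1."""
    N = len(X)
    D = [1.0 / N] * N                       # D_1(i) = 1/N
    ensemble = []                           # list of (stump, beta_t)
    for _ in range(T):
        stump = train_stump(X, y, D)
        eps = stump[0]                      # weighted error epsilon_t
        if eps > 0.5:
            break                           # abort: worse than chance
        eps = max(eps, 1e-10)               # assumption: avoid beta_t = 0
        beta = eps / (1.0 - eps)
        # Correctly classified cases are down-weighted by beta_t.
        D = [d * (beta if stump_predict(stump, row) == t else 1.0)
             for d, row, t in zip(D, X, y)]
        Z = sum(D)                          # normalization constant
        D = [d / Z for d in D]
        ensemble.append((stump, beta))
    return ensemble


def adaboost_predict(ensemble, z):
    """Weighted majority vote with weights log(1 / beta_t)."""
    votes = {1: 0.0, -1: 0.0}
    for stump, beta in ensemble:
        votes[stump_predict(stump, z)] += math.log(1.0 / beta)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    X = [[1], [2], [3], [4], [5], [6]]
    y = [1, 1, -1, -1, 1, 1]                # not separable by a single stump
    model = adaboost_m1(X, y, T=3)
    print([adaboost_predict(model, row) for row in X])
```

In the toy data no single threshold separates the classes, so the example shows how three weak stumps, combined by the weighted vote, can classify all six training cases correctly.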

The name of the second group of ensemble classifiers, which make use of the bagging method, is derived from the English abbreviation of Bootstrap AGGregatING (Breiman 1996). This group of ensemble classifiers uses bootstrap sampling to obtain training subsets for base classifiers. Each classifier is therefore trained on a different training sample, and the results are aggregated. Here, classifiers of the same type are most often used as base classifiers. An example of this type of ensemble classifier is Random Forest. The bootstrap aggregation algorithm for object classification into two classes has the following steps (see: Algorithm 2, Zhang and Ma 2012, p. 12).

Algorithm 2 Bagging algorithm

Inputs: a set of input data for the training sample $S$; a supervised learning algorithm (base classifier); $T$, the ensemble size; $R$, the percentage of the sample used to draw each training subset
Perform in loop FOR $t = 1, 2, \dots, T$
- Randomly select a replication subset (training sample) $S_{t}$ by selecting $R$% of $S$ at random
- Train the base classifier on subset $S_{t}$, obtain hypothesis $h_{t}$
- Add $h_{t}$ to the ensemble, $ε \leftarrow ε \cup \{h_{t}\}$
End FOR loop
Combine classification results by simple majority voting: for a given unlabeled instance $x$, obtain a voting result concerning its membership in each of the classes
- Evaluate class membership for the analyzed case $x$ with each member of the ensemble $ε = \{h_{1}, \dots, h_{T}\}$
- Let $v_{t, c} = 1$ if $h_{t}$ selects class $ω_{c}$, otherwise 0
- Obtain the overall final vote result for each class $V_{c} = \sum_{t = 1}^{T} v_{t, c}, c = 1, 2$
Output: Membership in the class with the greatest value of $V_{c}$.
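A compact Python sketch of Algorithm 2 follows. It is illustrative only: the toy base learner `train_centroid` (a one-dimensional nearest-centroid rule), the 0/1 class labels and the fixed seed are assumptions introduced here, and the subsets are drawn with replacement, in line with the bootstrap sampling described above.

```python
import random
from collections import Counter

def train_centroid(sample):
    """Hypothetical base learner: 1-D nearest-centroid classifier."""
    groups = {}
    for x, label in sample:
        groups.setdefault(label, []).append(x)
    centroids = {label: sum(v) / len(v) for label, v in groups.items()}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

def bagging_fit(S, train, T, R, seed=0):
    """Train T base classifiers, each on a bootstrap subset of R% of S."""
    rng = random.Random(seed)
    m = max(1, round(len(S) * R / 100.0))
    ensemble = []
    for _ in range(T):
        S_t = [rng.choice(S) for _ in range(m)]   # sampling with replacement
        ensemble.append(train(S_t))
    return ensemble

def bagging_predict(ensemble, x):
    """Simple majority voting: V_c counts the base classifiers choosing class c."""
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]
```

Unlike boosting, every base classifier here has an equal vote and the subsets are drawn independently of earlier errors.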

A group of methods called ensemble combining represents a wholly different approach. The group includes the so-called combined methods utilizing results of classification functions for single (base) classifiers and aggregating them into the result classification function using the averaging approach (simple or weighted averaging of base classifier results), voting approach (using various types of voting strategies, e.g., majority voting) or stacked generalization approach. The stacking ensemble methodology, pioneered by Wolpert (1992), is based on a combined approach whereby base classifiers (level 1 classifiers) are trained on the same random samples, and then relevant classification results (their classification functions) are used as training samples for the new meta-classifier (level 2 classifier) and aggregated in result classifications.

4.2. Feature Selection Process in Bankruptcy Prediction

An important issue in classification is the choice of an appropriate (optimum) set of diagnostic variables (i.e., the feature selection problem). Detailed characteristics of methods used for the selection of relevant variables for forecast models can be found in studies by John et al. (1994) and Jovic et al. (2015). Wrapper methods are frequently used techniques which evaluate candidate predictor subsets by combining a search algorithm with the classification method applied, scoring each subset by the classification effectiveness it yields for the model’s dependent variable. In order to search the variable subsets, the search algorithm is ‘wrapped’ around the classification model, hence the name of this group of methods. Wrapper feature selection methods are based on various approaches to searching for the optimum subset of predictors. Such approaches can be divided into two basic groups: deterministic and randomized. Deterministic methods apply various types of sequential algorithms, e.g., forward stepwise selection or backward stepwise elimination. Randomized methods most frequently use algorithms such as simulated annealing, genetic algorithms or ant colony optimization. A genetic algorithm searching for the optimum subset of predictors is often used to select variables for bankruptcy models. The genetic algorithm for feature selection (see Algorithm 3) is executed according to the procedure designed by Kuhn and Johnson (2013).

Algorithm 3 Genetic Algorithm Feature Selection (GAFS)

Define: stopping criteria, number of children for each generation (gensize), and probability of mutation (pm)
Generate: an initial random set of m binary chromosomes, each of length p
REPEAT
FOR each chromosome DO
- Tune and train a model and compute each chromosome’s fitness
END
FOR reproduction k = 1, …, gensize/2 DO
- Select two chromosomes based on the fitness criterion
- Crossover: Randomly select a locus and exchange each chromosome’s genes beyond the locus
- Mutation: Randomly change binary values of each gene in each new child chromosome with probability pm
END
UNTIL stopping criteria are met
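The loop of Algorithm 3 can be sketched as follows. This is a simplified illustration, not the caret implementation: the fitness function is supplied by the caller (in the study it would be a cross-validated model accuracy), selection is done by a tournament of two, the stopping criterion is a fixed number of generations, and the best chromosome seen so far is tracked separately. All of these are assumptions made for this example.

```python
import random

def gafs(fitness, p, m=20, gensize=20, pm=0.1, generations=30, seed=0):
    """Genetic search over binary chromosomes of length p (1 = feature kept)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(p)] for _ in range(m)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):             # assumed stopping criterion
        scores = [fitness(c) for c in population]
        for chrom, s in zip(population, scores):
            if s > best_fit:
                best, best_fit = chrom[:], s

        def select():
            # Tournament of two: the fitter of two random chromosomes wins.
            a, b = rng.sample(range(len(population)), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        children = []
        for _ in range(gensize // 2):
            p1, p2 = select(), select()
            locus = rng.randint(1, p - 1)     # crossover point
            c1 = p1[:locus] + p2[locus:]
            c2 = p2[:locus] + p1[locus:]
            for child in (c1, c2):
                # Flip each gene independently with probability pm.
                children.append([1 - g if rng.random() < pm else g for g in child])
        population = children
    return best, best_fit
```

A call such as `gafs(my_fitness, p=22)` would search over subsets of 22 candidate ratios, where `my_fitness` is a hypothetical function that trains a classifier on the selected columns and returns its cross-validated accuracy.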

4.3. Data Samples Description

The original research sample used in the study included data for 1739 Polish enterprises (bankrupt and not threatened with bankruptcy). This sample included calculated values of 19 financial indicators describing the financial condition of the selected enterprises (characterized in detail in Section 5.1 and selected for the study using the wrapper search technique and genetic algorithm discussed in Section 4.2). For bankrupt enterprises, the values of diagnostic variables were taken from 1 or 2 years before the actual period of their bankruptcy. Statistical data came from the financial statements of enterprises from 2010–2018 available in the EMIS database (http://www.emis.com). Bankruptcy episodes were identified on the basis of statistics from the EMIS database, source: Ogólnopolski Monitor Upadłościowy (2019) (Coface Polish National Bankruptcy Monitor, source: http://www.emis.com, http://coface.pl/en). The balanced sample included a total of 1739 research cases from all major sectors of the economy (865 cases for bankrupt enterprises and 874 randomly selected cases for enterprises in strong financial condition and not at risk of bankruptcy). The condition of the non-defaulted enterprises was assessed through careful analysis of the values of many financial indicators, such as profitability ratios, debt ratios, management performance indicators, etc., which determined their financial condition and low exposure to bankruptcy risk. A 70% training sample was drawn from the research sample (1217 enterprises: 592 bankrupt and 625 not threatened with bankruptcy) and was used to train and calibrate the parameters of the bankruptcy models. The remaining cases constituted a random 30% test-validation sample (522 enterprises: 273 bankrupt and 249 not threatened with bankruptcy), which was used at the validation stage to check the predictive properties of the models for new, unknown cases.
A separate research sample was designated for enterprises from the Podkarpackie Voivodeship, which included 2133 enterprises of various sizes registered in various sectors of economic activity. This sample included all enterprises operating in the Podkarpackie Voivodeship for which financial statements for 2018 were available in the EMIS database. It was used as a research set to assess the risk of bankruptcy (over a 2-year horizon up to 2020) of enterprises in the region, based on the scoring model estimated with the ensemble classifier approach.

4.4. Procedure of Mapping PD into Scores

In this study, the score scaling approach discussed in detail in the literature was used (see e.g., Siddiqi 2017, pp. 240–41). The score is linearly related to the logarithm of the so-called odds ratio

$$Odds = \frac{1 - PD}{PD},$$

which expresses the ratio of the odds (1 − PD) that the business in question will be classified as healthy to the odds (PD) that the business will be bankrupt:

$$Score = a_{0} + a_{1} \cdot \ln(Odds).$$

(1)

By introducing the concept of $pdo$, the number of points in the scoring system which doubles the value of the odds ratio, for a given value of the score we obtain the following relationship:

$$Score + pdo = a_{0} + a_{1} \cdot \ln(2 \cdot Odds).$$

(2)

By solving the system of Equations (1) and (2), we obtain the coefficients of the linear score scaling relationship, which depends on ln(Odds) and therefore on the probability of default (PD):

$$\begin{matrix} a_{1} = \frac{pdo}{\ln(2)}, \\ a_{0} = Score_{0} - a_{1} \cdot \ln(Odds), \end{matrix}$$

(3)

where $Score_{0}$ is the score assigned to a chosen reference value of $Odds$.
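The intermediate step behind (3) can be written out explicitly. In the sketch below, $Odds_{0}$ is a symbol introduced here (an assumption, not the paper's notation) for the reference odds at which the score equals $Score_{0}$.

```latex
\begin{aligned}
(2) - (1):\quad
  & pdo = a_{1}\left[\ln(2 \cdot Odds) - \ln(Odds)\right] = a_{1}\ln 2
  \;\Longrightarrow\; a_{1} = \frac{pdo}{\ln 2}, \\
\text{(1) at the reference point:}\quad
  & Score_{0} = a_{0} + a_{1}\ln(Odds_{0})
  \;\Longrightarrow\; a_{0} = Score_{0} - a_{1}\ln(Odds_{0}).
\end{aligned}
```

The slope $a_{1}$ therefore depends only on $pdo$, while the intercept $a_{0}$ anchors the scale at the chosen reference point.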

4.5. Validation Measures of Bankruptcy Prediction Models

Commonly used measures of classification accuracy were applied in the validation of estimated bankruptcy models. They are described by Siddiqi (2017) and Thomas (2009) clearly and in detail. The confusion matrix is probably the most frequent approach in the assessment of classification accuracy of models. Table 2 presents a general form of the confusion matrix.

Quantities shown in the table have the following meaning: TN—number of actually bankrupt businesses correctly classified by the model, TP—number of healthy businesses correctly classified by the model as healthy businesses, FN—number of actually bankrupt businesses incorrectly classified by the model as healthy businesses, FP—number of actually healthy businesses incorrectly classified by the model as bankrupt.

The ratio

$$AC = \frac{TN + TP}{TN + FN + FP + TP} \cdot 100\%$$

is the measure of the overall effectiveness of correct classification. The effectiveness of correct classifications for the ‘bankrupt’ class alone can be specified as

$$AC_{B} = \frac{TN}{TN + FN} \cdot 100\% = 1 - Err_{B},$$

where $Err_{B}$ is the so-called type I error of incorrect classifications for the class of bankrupt businesses. Likewise, the effectiveness of correct classification for the businesses at no risk of bankruptcy alone can be determined as follows:

$$AC_{NB} = \frac{TP}{FP + TP} \cdot 100\% = 1 - Err_{NB},$$

where $Err_{NB}$ is the so-called type II error of incorrect classification for the class of healthy businesses. Obviously, the higher the values of classification accuracy measures, the better the effectiveness of the models assessed.
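The three accuracy measures follow directly from the four cells of the confusion matrix. The short sketch below uses the naming convention of this section (the bankrupt class is the "negative" class); the function name and the example counts are hypothetical.

```python
def accuracy_measures(tn, fn, fp, tp):
    """AC, AC_B and AC_NB in percent.

    tn: bankrupt classified as bankrupt, fn: bankrupt classified as healthy,
    fp: healthy classified as bankrupt, tp: healthy classified as healthy.
    """
    ac = 100.0 * (tn + tp) / (tn + fn + fp + tp)
    ac_b = 100.0 * tn / (tn + fn)      # accuracy within the bankrupt class
    ac_nb = 100.0 * tp / (fp + tp)     # accuracy within the healthy class
    return {
        "AC": ac,
        "AC_B": ac_b,
        "AC_NB": ac_nb,
        "Err_B": 100.0 - ac_b,         # type I error, in percent
        "Err_NB": 100.0 - ac_nb,       # type II error, in percent
    }

# Hypothetical counts, for illustration only.
result = accuracy_measures(tn=90, fn=10, fp=5, tp=95)
```

With these made-up counts, the overall accuracy lies between the two class-level accuracies, which is always the case because AC is their weighted average.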

The GINI coefficient and the related area under the ROC (Receiver Operating Characteristic) curve, AUC_ROC, are also often used as measures of bankruptcy model classification effectiveness (see e.g., Agarwal and Taffler 2008; Barboza et al. 2017). The ROC curve is a graphic representation, in a coordinate system (Y = Sensitivity, X = (1 − Specificity)), of the relationship between the cumulative percentage (structural ratio) of bankrupt businesses from the contingency table for the predicted ith category of the point score ($score_{i}$):

$${ω_{sk}}_{B, i} = \frac{\sum_{j = 1}^{i} n_{B, j}}{n_{B}}$$

and the corresponding cumulative structural ratio for businesses at no risk of default:

$${ω_{sk}}_{NB, i} = \frac{\sum_{j = 1}^{i} n_{NB, j}}{n_{NB}}.$$

In the case of classification results ordered relative to the score in the contingency table with k different scoring categories, the GINI coefficient, and thus AUC_ROC, is determined by the following formula (see e.g., Thomas 2009, pp. 117–18):

$$GINI = 1 - \sum_{i = 1}^{k - 1} \left({ω_{sk}}_{B, i + 1} - {ω_{sk}}_{B, i}\right) \cdot \left({ω_{sk}}_{NB, i + 1} + {ω_{sk}}_{NB, i}\right) = 2 \cdot AUC(ROC) - 1.$$

(4)
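Formula (4) can be evaluated directly from the counts in the contingency table. The sketch below assumes the scoring categories are ordered from the lowest to the highest score and, as an assumption added here, prepends the origin point (0, 0) to both cumulative series so that the first category also contributes a trapezoid. Function and variable names are hypothetical.

```python
from itertools import accumulate

def gini_auc(n_b, n_nb):
    """GINI and AUC from per-category counts of bankrupt (n_b) and healthy (n_nb) firms."""
    cum_b = [0.0] + [c / sum(n_b) for c in accumulate(n_b)]
    cum_nb = [0.0] + [c / sum(n_nb) for c in accumulate(n_nb)]
    # Trapezoid terms: (step in bankrupt share) * (sum of adjacent healthy shares).
    twice_area = sum(
        (cum_b[i + 1] - cum_b[i]) * (cum_nb[i + 1] + cum_nb[i])
        for i in range(len(n_b))
    )
    gini = 1.0 - twice_area
    auc = (gini + 1.0) / 2.0           # from GINI = 2 * AUC - 1
    return gini, auc
```

For instance, two categories with counts `n_b = [8, 2]` and `n_nb = [2, 8]` give a GINI of about 0.6 and an AUC of about 0.8.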

The GINI coefficient takes values from interval [0,1]. High values of the coefficient, approaching 1, mean that the model being assessed is highly effective (nearly perfect). Meanwhile, the measure of the area under curve AUC_ROC ranges from 0.5 to 1. Value 0.5 means that the model classifies businesses in the analyzed classes in a completely random way (i.e., its use is pointless), while 1 is a value attained by the best model which perfectly identifies membership in a class.

Information Value (IV), the Kolmogorov-Smirnov (KS) statistic and, less frequently, the divergence coefficient (Div) are also used to evaluate the effectiveness of bankruptcy forecasting models at the validation stage. IV is calculated by the following formula (see e.g., Thomas 2009, p. 106):

$$IV = \sum_{i = 1}^{k} \left(\frac{n_{NB, i}}{n_{NB}} - \frac{n_{B, i}}{n_{B}}\right) \cdot \ln\left(\frac{n_{NB, i} / n_{NB}}{n_{B, i} / n_{B}}\right)$$

(5)

where $n_{B}$ is the number of bankrupt businesses, $n_{NB}$ is the number of businesses with no risk of bankruptcy, $n_{B, i}$ is the number of bankrupt businesses in the ith scoring category and $n_{NB, i}$ is the corresponding number of businesses with no risk of bankruptcy. The higher the IV value, the better the discriminant properties of the model being assessed.
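A direct transcription of formula (5) is shown below. Skipping categories in which either class has no cases is an assumption made here, since the logarithm is undefined for them; the formula itself does not say how such cells are treated.

```python
import math

def information_value(n_b, n_nb):
    """Information Value from per-category counts of bankrupt and healthy firms."""
    total_b, total_nb = sum(n_b), sum(n_nb)
    iv = 0.0
    for b, nb in zip(n_b, n_nb):
        share_b, share_nb = b / total_b, nb / total_nb
        if share_b > 0 and share_nb > 0:   # assumption: skip empty cells
            iv += (share_nb - share_b) * math.log(share_nb / share_b)
    return iv
```

Each term is non-negative, because the difference of shares and the logarithm of their ratio always have the same sign, so IV is zero only when both classes are distributed identically across the categories.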

The Kolmogorov-Smirnov (KS) statistic compares the empirical distributions of populations containing bankrupt businesses and healthy businesses (a goodness of fit measure). The greater the differences in cumulative distribution functions for the score (higher KS values), the better discriminant capabilities of the model (i.e., the better the scoring model is in separating bankrupt businesses from healthy ones). KS statistic values are calculated by the following formula (see e.g., Thomas 2009, p. 111):

$$KS = \max_{i = 1, \dots, k} \left| {ω_{sk}}_{B, i} - {ω_{sk}}_{NB, i} \right| .$$

(6)
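Formula (6) reduces to the largest gap between the two cumulative shares. A minimal sketch, assuming the counts are ordered by score category (names are hypothetical):

```python
from itertools import accumulate

def ks_statistic(n_b, n_nb):
    """Largest absolute gap between the cumulative shares of the two classes."""
    cum_b = [c / sum(n_b) for c in accumulate(n_b)]
    cum_nb = [c / sum(n_nb) for c in accumulate(n_nb)]
    return max(abs(b - nb) for b, nb in zip(cum_b, cum_nb))
```

The score category at which this maximum is attained is the point where the two cumulative distributions are farthest apart.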

The last validation measure applied to the bankruptcy forecasting models assessed is distribution divergence (Div), given by the formula (see e.g., Siddiqi 2017, p. 261):

$$Div = \frac{(μ_{NB} - μ_{B})^{2}}{0.5 \cdot (var_{NB} + var_{B})}$$

(7)

where $μ_{NB}$ is the mean of the score distribution for the population of healthy businesses, $μ_{B}$ is the mean of the score distribution for the population of bankrupt businesses and $var_{NB}$, $var_{B}$ are the respective variances of these distributions.
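Formula (7) can be computed from the two lists of scores. The sketch uses the sample variance from the standard library, which is an assumption, as the text does not state whether the sample or the population variance is meant; the scores in the usage line are made up.

```python
from statistics import mean, variance

def divergence(scores_nb, scores_b):
    """Squared distance between class means, scaled by the average variance."""
    gap = mean(scores_nb) - mean(scores_b)
    return gap ** 2 / (0.5 * (variance(scores_nb) + variance(scores_b)))

# Hypothetical scores, for illustration only.
div = divergence([600, 620, 640], [380, 400, 420])
```

Widely separated means combined with small within-class variances produce large Div values.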

4.6. Optimal Cut-Off Point for Scoring Determination

There are several methods of determining the optimal cut-off point for scoring models. These methods are described in depth in the literature (see e.g., Zweig and Campbell 1993). The method used in this research finds the score value that maximizes the following expression:

$$\max_{score_{i}} \left\{ M_{1}(score_{i}) = {ω_{sk}}_{B, i}(score_{i}) - \frac{k_{NB}}{k_{B}} \cdot \frac{1 - p_{B}}{p_{B}} \cdot {ω_{sk}}_{NB, i}(score_{i}) \right\}$$

(8)

where $k_{B}$ is the cost of a type I error, in which the model incorrectly classifies a bankrupt business as a healthy one, $k_{NB}$ is the cost of a type II error, in which the model incorrectly classifies a healthy business as bankrupt, and $p_{B}$ is the probability of membership in the bankrupt class estimated on the basis of the training sample (the percentage of bankrupt businesses in the sample).
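The search in formula (8) is a one-dimensional maximization over the scoring categories. In the sketch below, `scores` holds the category scores in ascending order, `cost_ratio` stands for $k_{NB} / k_{B}$, and all names are hypothetical.

```python
from itertools import accumulate

def optimal_cutoff(scores, n_b, n_nb, cost_ratio, p_b):
    """Return the score maximizing M_1 together with the attained value of M_1."""
    cum_b = [c / sum(n_b) for c in accumulate(n_b)]
    cum_nb = [c / sum(n_nb) for c in accumulate(n_nb)]
    weight = cost_ratio * (1.0 - p_b) / p_b
    # M_1: captured share of bankrupt firms minus the weighted share of healthy firms.
    m1 = [b - weight * nb for b, nb in zip(cum_b, cum_nb)]
    best = max(range(len(scores)), key=m1.__getitem__)
    return scores[best], m1[best]
```

A smaller `cost_ratio` (a relatively more expensive type I error) lowers the penalty on misclassified healthy businesses and so tends to push the cut-off toward higher scores.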

5. Research Results

The ensemble classifier methodology will be applied to design a scoring model in order to predict bankruptcy events of Polish businesses operating in the Podkarpackie Voivodeship. Each stage of design will be presented in detail together with its potential for a practical application.

The process of designing a scoring model using ensemble classifiers for businesses operating in the Podkarpackie Voivodeship was divided into several stages:

The choice of a suitable subset of financial ratios (bankruptcy predictors) determining the financial circumstances of the businesses analyzed (feature selection stage).
Training and calibration of base models applied and ensemble models selected on the basis of the training sample. Determining the function of the probability of default and membership in forecast classes for both samples: training sample, and test and validation sample (which is not taken into account at the stage of calibration of the evaluated models).
Determining the score value for the training sample and the test sample with a suitable scaling of the value of the resulting probability of default function for the estimated models and their transformation into corresponding resulting point score values.
Validation of estimated models. Determining the values of validation statistics for the models applied and analysis of their discriminant capabilities for the training sample and the test sample. Selection of the best forecasting model.
For the best model, determining the optimum cut-off point for the score value, i.e., the point below which a business should be categorized as bankrupt.
Bankruptcy forecasts for analyzed businesses from the Podkarpackie Voivodeship in individual sectors of economic activity and business size. Final comparative analysis of results and final conclusions.

5.1. Feature Selection Stage—Selection of Ratios/Bankruptcy Risk Determinants

Twenty-two financial ratios commonly applied in financial analysis of businesses were initially proposed for the assessment of the financial standing of analyzed business entities:

Financial liquidity ratios: X1—Current ratio = Current assets to Short-term liabilities total (all liabilities with maturity shorter than one year): CA/STL, X2—Quick ratio = (Current assets − Inventories) to Short-term liabilities total: (CA-I)/STL, X3—Cash ratio = Cash and Cash equivalents to Short-term liabilities total: Cash/STL
Profitability ratios: X4—Operating profit margin = Operating earnings to Net sales: OE/NS [%], X5—Return on assets (ROA) = Net profit (Total Revenue − Cost of Goods Sold − Operating Expenses − Other Expenses − Interest and Taxes) to Assets total (Balance sheet total): NP/BST [%], X6—Return on equity (ROE) = Net profit to Equity: NP/E [%], X7—Return on invested capital = Net profit to (Assets total − Short-term liabilities total): NP/(BST-STL) [%], X8—Net profitability = Net profit to Revenues from sales: NP/RS [%], X9—gross profit margin on sales = (Revenues from sales − Cost of goods sold) to Revenues from sales: (RS-CoGS)/RS [%], X10—operating return on assets = EBIT (Earnings Before Interest and Taxes) to Assets total: EBIT/BST [%]
Debt ratios: X11—Overall debt = Liabilities total to Assets total (Balance sheet total): TL/BST [%], X12—Debt to equity = Liabilities total to Equity: (TL/E) [%], X13—Debt to EBITDA = Liabilities total to EBITDA: TL/EBITDA, X14—Financial leverage = Assets total (Balance sheet total) to Equity: BST/E [%]
Management effectiveness ratios: X15—Receivable turnover = Revenues from sales to Short-term receivables: RS/STR, X16—Asset turnover = Revenues from sales to Assets total (Balance sheet total): RS/BST, X17—Inventory turnover = Revenues from sales to Inventories: RS/I, X18—Liability turnover = (Revenues from sales + Inventories) to Short-term liabilities total: (RS+I)/STL, X19—Working capital turnover = Revenues from sales to (Current assets − Short-term liabilities total): RS/(CA-STL)
Capital structure ratios: X20—Structure of Equity to Assets total (Balance sheet total): E/BST [%], X21—Structure of Fixed assets to total assets (Balance sheet total): FA/BST [%], X22—Structure of Fixed assets to Current assets: FA/CA [%]

With the help of wrapper techniques (discussed in Section 4.2 above), an optimum subset of predictors was selected by means of the genetic algorithm, yielding a potentially best set of financial ratios for the bankruptcy forecasting models being trained. Linear discriminant analysis (LDA) was used as the forecasting model in the search algorithm, while overall classification accuracy (AC) was applied as the measure of the effectiveness of predictor subsets. The calculations were performed in the R statistical environment with the gafs() function from the caret library. Parameters for the genetic algorithm were as follows: popSize = 50 (the number of subsets assessed in each iteration), pcrossover = 0.8 (crossover probability; a high probability that the new generation will not be an exact copy of the parent chromosomes from the previous generation), pmutation = 0.1 (mutation probability; a low probability of chromosome alterations in the subsequent mutation), elite = 0 (the number of best subsets that survive unchanged into each generation). The genetic algorithm, randomly searching for the best subset of diagnostic variables under a 5-fold cross-validation (CV) procedure, selected a set of 19 optimum financial ratios (accuracy for the set was AC = 0.89). Table 3 contains values of selected measures of discriminant capabilities and significance for individual diagnostic variables.

5.2. Calibration of the Parameters of Bankruptcy Risk Forecast Models (Calibration Stage)

Eight single classifier models were used in forecasting the probability of default (PD) (Table 3). The classification functions of those models, the so-called level 1 classifiers, served as inputs for a level 2 ensemble meta-classifier, which aggregated them into final classification results. k-NN (k-Nearest Neighbors) served as the stacking ensemble classifier. For comparison purposes, boosting and bagging ensemble classifiers were also applied. The boosting ensemble classifiers were GBM, the Stochastic Gradient Boosting Machine (Friedman 2002), and the boosted logistic regression classifier (Logit Boost). The Random Forest (RF) model (Breiman 2001) and averaged Neural Networks (avNNet) were used as bagging classifiers. The calibration procedure for the bankruptcy prediction models was based on the samples described in detail in Section 4.3. Calculations were performed with procedures written using R package libraries (https://cran.r-project.org/). In particular, the following libraries were used: caret, caretEnsemble, caTools, pROC, MASS, nnet, kernlab, rpart, earth, mgcv, klaR, gbm, plyr, randomForest and other auxiliary ones. A 5-fold cross-validation (CV) approach was employed in the calibration procedure of the optimum model, with the area under the ROC curve (AUC_ROC) as the measure of the models’ discriminant quality (effectiveness). Figure 8 illustrates how the classification effectiveness of the boosting ensemble model increases with the number of iterations of the boosting algorithm for classification trees of various complexity. It shows clearly why ensemble classifiers surpass single (individual) classifiers in terms of quality.

A table in Appendix A (Table A1) presents the final best configurations of the considered bankruptcy prediction models and optimum values of their parameters.

5.3. Determining Score for the Optimum Model (Score Scaling Stage)

Forecast values of the classification functions of the models analyzed (probability of default, PD) should be transformed into corresponding score values through appropriate scaling. In the calculations, it was assumed that a score of $Score_{0} = 600$ corresponds to odds of 50:1 that the business is not at risk of default (Odds = 50), and that the number of points which doubles the odds is pdo = 20. With these assumptions, the scaling parameters were estimated and the score function was described by the following relationship:

$$Score = 487.12 + 28.85 \cdot \ln\left(\frac{1 - PD}{PD}\right).$$

Figure 9 illustrates the scaling obtained for the score when the GBM ensemble model is used for the training sample.
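The reported scaling coefficients can be reproduced numerically from the stated assumptions (reference score 600 at odds 50:1, pdo = 20). The function names below are hypothetical; the formulas are those of Section 4.4.

```python
import math

def scaling_params(score0, odds0, pdo):
    """Intercept a0 and slope a1 of Score = a0 + a1 * ln(Odds)."""
    a1 = pdo / math.log(2)
    a0 = score0 - a1 * math.log(odds0)
    return a0, a1

def pd_to_score(pd, a0, a1):
    """Map a probability of default to a score via the odds (1 - PD) / PD."""
    return a0 + a1 * math.log((1.0 - pd) / pd)

a0, a1 = scaling_params(score0=600, odds0=50, pdo=20)
# a0 is approximately 487.12 and a1 approximately 28.85, as in the relationship above.
```

Under this scaling, PD = 0.5 (odds of 1:1) maps to a score of about 487, and every additional 20 points corresponds to a doubling of the odds that the business is healthy.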

5.4. Model Validation (Validation Stage)

Figure 10 presents ROC curves for the five classification models assessed. The GBM model predicted the membership of businesses in either class (bankrupt and healthy) perfectly, i.e., in 100% of cases (AUC = 1). The worst of the models compared, NB (Naive Bayes), also had high prediction accuracy (AUC = 0.92), although it was still significantly inferior to the other models.

Figure 11 presents a graphic interpretation of KS = 0.89 (for score = 468) for the LDA model when testing its effectiveness on the test and validation sample. Such a high value of the KS statistic means that the model is rather effective.

Figure 12 presents a comparison and interpretation of a very high discriminant capability of the ensemble GBM model (divergence Div = 92.1) and the LDA model with a relatively weaker discriminant capability (divergence Div = 2.6) rated on the basis of the training sample.

5.5. Optimal Cut-Off Point Determination Stage

The next step, for the ensemble GBM classifier-based forecasting model with the best classification properties as expressed by the validation measures, involved determining the optimum cut-off point below which the businesses analyzed were regarded as being at risk of default (bankrupt). In the calculations, it was assumed that the ratio of the misclassification costs defined in Section 4.6 is $\frac{k_{NB}}{k_{B}} = \frac{1}{2}$ (double cost for the incorrect classification of bankrupts, as this error appears to be more detrimental in practical applications of the model), and the probability $p_{B} = 0.486$ was determined from the training sample. The optimum cut-off point calculated by means of formula (8) was $score_{cut\text{-}off} = 386$. Therefore, all businesses with a score of $score \leq 386$ must be forecast as members of the bankruptcy (B) class, and the remaining ones as members of the non-bankruptcy (NB) class. Still, for the estimated optimum ensemble GBM model, there is a very high potential risk of default (PD > 0.5) in the score interval [387–486], where the PD determined on the basis of the training sample ranges from 0.96 down to 0.51. Consequently, if we rely on the classical procedure of considering a business with PD > 0.5 bankrupt (at risk of default), then the score interval ($387 \leq score \leq 486$) should be defined as a “gray zone”, where it is difficult to clearly determine the membership of a given business in either the bankruptcy class or the non-bankruptcy class. Businesses of this type were assessed as uncertain, leaning towards potential bankruptcy (contingent on unfavorable circumstances affecting their financial health).

Figure 13 presents an interpretation of the optimum cut-off point for the score, determined in the above manner.

5.6. Classification of Enterprises from the Podkarpacie Region (Prediction Stage) Depending on the Risk of Their Bankruptcy

Applying the classification rule:

$$\begin{array}{l} \text{IF } (score \leq 386) \text{ THEN bankrupt within } h \leq 2 \text{ years}; \\ \text{IF } (score > 486) \text{ THEN healthy}; \\ \text{IF } (score > 386 \text{ AND } score \leq 486) \text{ THEN uncertain (gray zone)}; \end{array}$$

(9)

a forecast of bankruptcy (membership in one of the risk classes) was determined over a time horizon of at most 2 years (up to 2020) for businesses operating in the Podkarpackie Voivodeship in various sectors of economic activity and depending on the enterprise size. Table 4 is a contingency table presenting the forecast number of businesses classified as members of each of the 3 bankruptcy risk classes by economic activity sector.
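Rule (9) amounts to a three-way threshold test on the score. A minimal sketch, with a hypothetical function name and the two thresholds taken from the rule:

```python
def risk_class(score, cutoff=386, gray_upper=486):
    """Assign one of the three bankruptcy risk classes of rule (9)."""
    if score <= cutoff:
        return "bankrupt"          # forecast bankruptcy within h <= 2 years
    if score <= gray_upper:
        return "uncertain"         # gray zone: 386 < score <= 486
    return "healthy"
```

Applying such a function to the scores of all enterprises in a sector and counting the labels yields the kind of contingency table described above.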

Figure 14 presents the forecast probability of bankruptcy (up to h = 2 years) for the surveyed enterprises from the Podkarpacie region by sector of economic activity, estimated on the basis of the optimal ensemble model (GBM), whose classification functions were used in the developed scoring model.

Figure 15, on the other hand, shows the predicted values of this probability of bankruptcy for the surveyed enterprises from the Podkarpackie Voivodeship, depending on their enterprise size.

Table 5 presents an ex-post assessment of the classification effectiveness of the developed bankruptcy early warning model, based on 39 enterprises from the Podkarpackie Voivodeship whose bankruptcies in 2019 had been confirmed by the courts and were known at the time of the study. These enterprises were included in the test sample of 2133 enterprises. The results confirm the fairly good quality of the model: for new cases (not taken into account at the calibration stage), the implemented scoring model correctly recognized about 79% of the actually bankrupt enterprises (which seems to be an acceptable result), while for enterprises not threatened with bankruptcy its effectiveness is much better, at 95%.

6. Discussion

The comparative analysis of the classification effectiveness of ensemble models against several classical bankruptcy forecasting methods indicates that ensemble classifiers are characterized by considerably better values of validation measures, both for the training sample and the test sample, surpassing all of the analyzed base classifiers in terms of accuracy. The best ensemble classifier, GBM (decision trees supported by a stochastic gradient boosting algorithm), offered full accuracy of correctly classified bankrupt and healthy businesses (AC = 100%, AC_B = 100%, AC_NB = 100%) for the training sample and over 99% for the test sample (Table A2 and Table A3). In addition, other values of validation statistics demonstrate the nearly perfect predictive capability of the GBM ensemble model for the training sample (AUC_ROC = 1, KS = 1, divergence Div = 92.1 and information value IV = 5.3) and for the test sample (AUC_ROC = 0.99, KS = 0.99, divergence Div = 22.1 and information value IV = 7.1). The Generalized Additive Model (GAM) seems to be the best classical model, yet it displays inferior values of validation statistics, both for the training sample (AC = 97%, AUC_ROC = 0.99, KS = 0.96, Div = 5.8, IV = 5.3) and for the test sample (AC = 97%, AUC_ROC = 0.99, KS = 0.96, Div = 43.0, IV = 7.1). This confirms the earlier findings of other authors and allows us to say that in practical applications, bankruptcy models based on ensemble classifiers outperform other classical approaches and are an interesting alternative to the conventional method of using single classifiers.

The comparative analysis below assesses the exposure to bankruptcy risk of enterprises operating in the Podkarpackie Voivodeship in 2018, in view of their potential bankruptcy by 2020. It is based on the probability of bankruptcy of the surveyed enterprises in individual sectors of business activity (Figure 14), estimated with the best ensemble classifier model (GBM, which has the best forecasting and classification capabilities), and on their predicted membership in the three bankruptcy risk classes (Table 4).

In sector A (farming, forestry and fishing), with a total of 50 enterprises surveyed, the developed scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 4% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the so-called “gray zone”), i.e., those with a significant probability of bankruptcy ($PD_{t = 2} > 50\%$), the percentage of potentially bankrupt enterprises (over a 2-year horizon) increases to 10%. The average probability of bankruptcy for enterprises in this sector is 11% (min = 0%, max = 99.9%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 43%–99.9%. The sector is therefore quite heavily exposed to the risk of bankruptcy.

In sector B (mining and extraction), with a total of 12 enterprises, the scoring model qualified all enterprises as not being threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 1% (min = 0%, max = 6.8%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 2.3%–6.8%. Therefore, it was the first of the three least risky sectors of the region’s economy.

In sector C (industrial processing), with a total of 581 enterprises, the scoring model predicted bankruptcy for 2% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increases to 7%. The average probability of bankruptcy for enterprises in this sector is 7.4% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.7%.

Sector D (energy, water, gas and other energy sources), with a total of 25 enterprises, was the second of the three least risky sectors in the region’s economy. The scoring model qualified all enterprises as not being threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 2.2% (min = 0%, max = 43.3%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) that was greater than 1.7%.

In sector E (waste, wastewater and sewage management), with a total of 67 enterprises, the scoring model qualified 97% of enterprises as not being threatened with bankruptcy, and 3% as uncertain. The average probability of bankruptcy for enterprises in this sector is 3.8% (min = 0%, max = 83.7%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 9.2%.

In sector F (construction), with a total of 220 enterprises, the scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 3% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increases to 8%. The average probability of bankruptcy for enterprises in this sector is 8.6% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 23.5%.

In sector G (wholesale and retail), with a total of 734 enterprises, the scoring model predicted bankruptcy for 2% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises rose to 5%. The average probability of bankruptcy for enterprises in this sector is 5.7% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 9.8%.

In sector H (transport and storage management), with a total of 75 enterprises, the scoring model predicted bankruptcy for 3% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increased to 7%. The average probability of bankruptcy for enterprises in this sector is 8.2% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 23.2%.

Sector I (accommodation and gastronomy), with a total of 56 enterprises, was the sector most exposed to the risk of bankruptcy. The scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for as many as 13% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increased to 20%. The average probability of bankruptcy for enterprises in this sector is 22.2% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 98.6%.

In sector J (information and communication), with a total of 55 enterprises, the scoring model qualified 95% of enterprises as not being threatened with bankruptcy, and 5% as uncertain. The average probability of bankruptcy for enterprises in this sector is 6.1% (min = 0%, max = 89.4%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 11.6%.

In sector K (finance and insurance), with a total of 12 enterprises, the scoring model qualified 67% of enterprises as not being threatened with bankruptcy, and 33% as uncertain. The average probability of bankruptcy for enterprises in this sector is 28.2% (min = 0%, max = 96%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 95.2%–96%. This is a very specific sector (the financial sector), hence the ambiguity in assigning the model results to risk classes.

In sector L (services for the property market), with a total of 73 enterprises, the scoring model predicted bankruptcy for 3% of all enterprises in this sector within a 2-year horizon (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increases to 17%. The average probability of bankruptcy for enterprises in this sector is 15.8% (min = 0%, max = 99.7%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 72.1%–99.7%. It is therefore also one of the sectors with high exposure to the risk of bankruptcy.

In sector M (scientific, specialist and technological activity), with a total of 61 enterprises, the scoring model predicted bankruptcy for 2% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increased to 5%. The average probability of bankruptcy for enterprises in this sector is 6.9% (min = 0%, max = 98.1%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.4%.

Sector N (administration and support), with a total of 43 enterprises, was also one of the sectors with high exposure to the risk of bankruptcy. The scoring model predicted bankruptcy within a 2-year horizon (up to 2020) for 5% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increases to 12%. The average probability of bankruptcy for enterprises in this sector is 10.3% (min = 0%, max = 99.8%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 51%.

Sector P (education), with a total of only nine enterprises, was the third of the three least risky sectors in the region’s economy. The scoring model qualified all enterprises as not threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 3% (min = 0%, max = 19.1%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 19%.

In sector Q (health and social care), with a total of 38 enterprises, the scoring model predicted bankruptcy within a 2-year horizon (up to 2020) for 3% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the “gray zone”), the percentage of potentially bankrupt enterprises increases to 8%. The average probability of bankruptcy for enterprises in this sector is 10.3% (min = 0%, max = 98.8%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 39%.

In sector R (entertainment and leisure), with a total of 11 enterprises, the scoring model qualified 82% of enterprises as not being threatened with bankruptcy, and as many as 18% as uncertain. The average probability of bankruptcy for enterprises in this sector is 11.3% (min = 0%, max = 61%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 54.9%–61%. This is therefore also a sector in which ambiguity in assigning the model results to risk classes can be observed.

In the last sector, S (other services), with a total of 11 enterprises, the scoring model qualified 91% of enterprises as not threatened with bankruptcy, and 9% as uncertain. The average probability of bankruptcy for enterprises in this sector is 12% (min = 0%, max = 91.7%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.7%. It is also a sector in which ambiguity in assigning the model results to risk classes can be observed.
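The per-sector figures above (average, minimum and maximum probability of bankruptcy, plus the level exceeded by every tenth enterprise) can be read as simple descriptive statistics of the predicted PD values. The following is a minimal sketch of such a summary, assuming the "every tenth enterprise" figure corresponds to the 90th percentile under a nearest-rank convention. The function name and the PD values are hypothetical and are not taken from the study.

```python
import math


def summarize_pd(pds):
    """Return mean, min, max and the 90th-percentile PD for one sector."""
    ordered = sorted(pds)
    n = len(ordered)
    # Nearest-rank 90th percentile: the smallest value with at least 90% of
    # observations at or below it (one of several percentile conventions).
    rank = math.ceil(9 * n / 10)
    return {
        "mean": sum(ordered) / n,
        "min": ordered[0],
        "max": ordered[-1],
        "p90": ordered[rank - 1],
    }


# Hypothetical PD values (in %) for a ten-enterprise sector; not study data.
example_pds = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 43.0, 99.9]
summary = summarize_pd(example_pds)
print(summary)
```

Under this convention the top decile of the hypothetical sector spans the range from `p90` to `max`, which mirrors how the ranges are reported in the text.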

Based on the results in Table 4 and on the analysis of the predicted probability of bankruptcy (Figure 15) for the surveyed enterprises by size, the following relationships illustrating their exposure to the risk of bankruptcy can be seen. Among very small (micro) enterprises (535 of which were included in the study), the developed scoring model qualified 89% as not threatened with bankruptcy, 4% as bankrupt and a further 7% as uncertain (from the “gray zone”), but potentially with a significant risk of bankruptcy above 50%. Among small enterprises, 356 of which were included in the study, the scoring model qualified 91% as not threatened with bankruptcy, 3% as bankrupt and another 6% as uncertain (from the “gray zone”). Among medium-sized enterprises (606 included in the study), the scoring model qualified 96% as not threatened with bankruptcy, 2% as bankrupt and another 2% as uncertain (from the “gray zone”). Similarly, among large enterprises (636 enterprises), the scoring model classified 94% as not at risk of bankruptcy, 2% as bankrupt and another 4% as uncertain (from the “gray zone”).

Attention should also be paid to the limitations of the analyses presented. One limitation is that the scoring model was estimated on the basis of statistical data for enterprises from various sectors of activity. It is very difficult to develop a model with good accuracy (a sufficiently high classification efficiency) in such a situation, since the various sectors are often very specific and not comparable. On the other hand, the results obtained (Table 5) for 39 actual bankruptcies of enterprises in the Podkarpackie Voivodeship observed and confirmed in 2018 show that the scoring model correctly recognizes about 79% of the enterprises that really went bankrupt, while for non-bankrupt enterprises the equivalent figure is 95%. An effectiveness of 79% for the separate class “bankrupt” is sufficient and acceptable, although it is of course open to discussion. It should be noted that the designed model includes three classes of bankruptcy risk (bankrupt, non-bankrupt and “gray zone”, i.e., difficult to say but potentially also bankrupt). In the classic approach with only two classes (bankrupt, non-bankrupt), another 8% should be added to the model’s effectiveness (the class of uncertain enterprises from the “gray zone”, for which the probability of bankruptcy is greater than 0.5). The share of correct classifications then increases to 87%, which seems to be a good result. The overall accuracy of the model (without division into classes) is 94%.
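The arithmetic behind moving from the three-class to the two-class view can be sketched as follows. Only the total of 39 actual bankruptcies and the rounded percentages come from the text. The split into 31 firms flagged as bankrupt and 3 flagged as "gray zone" is an assumed breakdown chosen solely because it reproduces the rounded 79% and 8%, and the function name is hypothetical.

```python
def bankrupt_detection_rates(n_bankrupt, n_flagged_bankrupt, n_flagged_gray):
    """Return (three-class rate, gray-zone share, two-class rate) as fractions."""
    strict = n_flagged_bankrupt / n_bankrupt
    gray = n_flagged_gray / n_bankrupt
    # In a two-class reading, gray-zone firms (PD > 0.5) count as detected.
    return strict, gray, strict + gray


# 39 is from the text; 31 and 3 are assumed counts, not values from Table 5.
strict, gray, combined = bankrupt_detection_rates(39, 31, 3)
print(f"{strict:.0%} + {gray:.0%} -> {combined:.0%}")  # prints: 79% + 8% -> 87%
```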

Also, the selection of such a large set of as many as 19 indicators as determinants of the financial condition of enterprises raises the question of whether it should be limited to only a few of the most important indicators. Such a large set may raise suspicions that many of the variables are strongly correlated with each other, which may affect the quality of the models, especially classic ones such as LDA. In the study, the size of the set resulted from the selection made with the wrapper method and the genetic algorithm, and from the final use of ensemble classifiers, which are less sensitive to the interdependence of variables. For the sake of accuracy, it is worth emphasizing that the correlation between variables was never greater than 0.87. Nevertheless, in future research it is worth considering a reduction in the number of bankruptcy predictors.
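A screening of the kind mentioned here, finding the largest pairwise correlation among the indicators, could look like the sketch below. It assumes Pearson correlation (the text does not say which coefficient was used), and the indicator names and values are hypothetical placeholders rather than the 19 indicators of the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def max_abs_correlation(columns):
    """Return (|r|, name_a, name_b) for the most strongly correlated pair."""
    best = (0.0, None, None)
    for (name_a, xa), (name_b, xb) in combinations(columns.items(), 2):
        r = abs(pearson(xa, xb))
        if r > best[0]:
            best = (r, name_a, name_b)
    return best


# Hypothetical indicator values; Z2 is deliberately almost linear in Z1.
indicators = {
    "Z1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Z2": [2.1, 3.9, 6.2, 7.8, 10.1],
    "Z3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
r, a, b = max_abs_correlation(indicators)
print(f"strongest pair: {a}, {b} (|r| = {r:.3f})")
```

A pair such as this hypothetical one would exceed the 0.87 ceiling reported in the text and would be a natural candidate for removal when reducing the number of predictors.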

7. Conclusions

The results of the analyses presented in the paper lead to several general conclusions that can be a summary of the research:

The scoring model designed for the early prediction of bankruptcy risk for Polish businesses from the Podkarpackie Voivodeship using ensemble classifiers was highly effective in forecasting and accurately evaluating the risk of default of the analyzed businesses.
An analysis of the forecasts obtained suggests that small enterprises are more exposed to the risk of default than medium or large enterprises.
The sector of business activity and the unique characteristics of the economic activity influence a potentially higher risk of business bankruptcy. A higher number of potential bankruptcies is reported in some sectors of economic activity than in others.
A higher risk of business bankruptcy for some particular industry branches may be caused by the sensitivity of bankruptcy models to the industry sector to which enterprises belong. This can be considered one of the limitations of the study presented in the paper: a potentially higher risk of business bankruptcy for some particular industry branches can be influenced by the model design. Further research would have to examine whether separate models estimated for each sector would indicate lower values of PD and therefore lower exposure of companies to the risk of bankruptcy.
Another limitation of the study is that bankruptcy models are sensitive to the phase of the economic cycle (the presented model does not account for it), but the influence of economic cycles on bankruptcy risk can be considered in further extensions of the research.
The approach presented in the paper can be used not only to assess the risk of bankruptcy of enterprises by market analysts and regional analysts, but also in banking activities to assess credit risk for corporate loans, where similar models are of course successfully implemented.
The study may be extended in the future with an analysis and an assessment of the risk of bankruptcy for enterprises from other regions of Poland with the development of individual separate ensemble models for enterprises from key sectors of the country’s economy. It can also be extended to a comparative analysis of the risk of bankruptcy in given sectors of the economy for a group of countries, e.g., EU, Visegrad Group countries or the Three Seas Initiative countries.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

Table A1. Optimum configuration and set of parameters for bankruptcy models applied.

Classification Model Applied	Optimum Model Configuration and Model Parameters (training sample; parameter selection criterion: AUC (ROC); sampling: k = 5-fold cross-validation)
Individual (single) classifier
LDA (M1) linear discriminant analysis model $\mathrm{LDA} = \alpha_{1} Z_{1} + \dots + \alpha_{m} Z_{m}$	$\alpha_{1} = -8.3 \times 10^{-3}$, $\alpha_{2} = 6.6 \times 10^{-3}$, $\alpha_{3} = -8.1 \times 10^{-5}$, $\alpha_{4} = 5.0 \times 10^{-3}$, $\alpha_{5} = -2.1 \times 10^{-5}$, $\alpha_{6} = 1.1 \times 10^{-6}$, $\alpha_{7} = 8.3 \times 10^{-5}$, $\alpha_{8} = 2.7 \times 10^{-4}$, $\alpha_{9} = -4.1 \times 10^{-3}$, $\alpha_{10} = -3.7 \times 10^{-2}$, $\alpha_{11} = -1.8 \times 10^{-5}$, $\alpha_{12} = 3.2 \times 10^{-5}$, $\alpha_{13} = 2.3 \times 10^{-5}$, $\alpha_{14} = -2.7 \times 10^{-2}$, $\alpha_{15} = -1.3 \times 10^{-8}$, $\alpha_{16} = 9.8 \times 10^{-3}$, $\alpha_{17} = -1.5 \times 10^{-6}$, $\alpha_{18} = -3.7 \times 10^{-2}$, $\alpha_{19} = -6.6 \times 10^{-8}$
LOGIT (M2) logistic regression model $L = \mathrm{LOGIT} = \ln\left(\frac{p}{1 - p}\right) = \alpha_{0} + \alpha_{1} Z_{1} + \dots + \alpha_{m} Z_{m}$	$\alpha_{0} = 1.0 \times 10^{15}$, $\alpha_{1} = -6.9 \times 10^{12}$, $\alpha_{2} = 2.1 \times 10^{12}$, $\alpha_{3} = -5.1 \times 10^{11}$, $\alpha_{4} = 6.9 \times 10^{12}$, $\alpha_{5} = -1.3 \times 10^{10}$, $\alpha_{6} = 4.0 \times 10^{8}$, $\alpha_{7} = 4.9 \times 10^{11}$, $\alpha_{8} = 9.2 \times 10^{11}$, $\alpha_{9} = -3.7 \times 10^{12}$, $\alpha_{10} = -1.4 \times 10^{13}$, $\alpha_{11} = -3.7 \times 10^{9}$, $\alpha_{12} = 9.0 \times 10^{10}$, $\alpha_{13} = -6.5 \times 10^{9}$, $\alpha_{14} = 2.2 \times 10^{13}$, $\alpha_{15} = 1.8 \times 10^{6}$, $\alpha_{16} = 4.4 \times 10^{12}$, $\alpha_{17} = -4.1 \times 10^{10}$, $\alpha_{18} = -1.1 \times 10^{13}$, $\alpha_{19} = -2.2 \times 10^{8}$
NNet (M3) neural network (single hidden layer network)	Network configuration: 19-5-1; neuron activation function: logistic; error function: entropy fitting; calibrated parameter for weights: decay = 0.1
SVM Radial (M4) Support Vector Machine	Cost parameter: C = 1; hyperparameter: sigma = 11.969
C&RT (M5) classification tree model	Tree complexity parameter: cp = 0.037; tree split: $X_{11} \geq 40.79$ (class: bankrupt); $X_{11} < 40.79$ (class: no risk of bankruptcy)
MARS splines (M6)	degree = 1 (product degree, i.e., degree of interaction); nprune = 12 (number of base functions)
Generalized Additive Model, GAM (M7)	select = TRUE (feature selection); link function = logit; method = GCV.Cp (GCV method for an unknown parameter of model complexity)
Naive Bayes (M8)	Laplace correction: fL = 0; distribution type: usekernel = FALSE (Binomial); bandwidth adjustment: adjust = 1
Ensemble meta-classifier (stacking)
k-NN k-nearest neighbours, inputs: classification functions for base models (M1-M8)	Nearest neighbour parameter: k = 9
Ensemble classifier (boosting)
Stochastic Gradient Boosting Machine (GBM)	shrinkage = 0.2; n.minobsinnode = 15 (min. node size); n.trees = 130 (boosting iterations); interaction.depth = 5 (max. tree depth)
Boosted Logistic Regression (Logit Boost)	nIter = 13 (boosting iterations)
Ensemble classifier (bagging)
Random Forest (RF)	mtry = 5 (randomly selected predictors); ntree = 500 (number of trees)
Averaged NNet (avNNet)	bag = TRUE; n = 5 (bootstraps); size = 5 (number of neurons in the hidden layer for component networks); decay = 0.9 (decay parameter for weights)

Source: own elaboration and calculations using R and Statistica software.
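The LOGIT (M2) row of Table A1 defines the score as the log-odds, $L = \ln\left(\frac{p}{1-p}\right)$, so the probability follows by inverting that relation, $p = 1/(1 + e^{-L})$. The sketch below shows this conversion in a numerically stable form, which matters when the score is very large in absolute value. The intercept, coefficients and indicator values are hypothetical two-variable placeholders, not the 19 estimates from the table.

```python
import math


def logit_score(intercept, coefs, z):
    """Linear predictor L = a0 + a1*Z1 + ... + am*Zm."""
    return intercept + sum(a * v for a, v in zip(coefs, z))


def probability_from_logit(score):
    """Invert L = ln(p / (1 - p)) without overflowing for large |score|."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


# Hypothetical intercept, coefficients and indicator values (assumptions).
score = logit_score(-1.0, [0.8, -0.5], [2.0, 1.0])
p = probability_from_logit(score)
print(f"L = {score:.2f}, p = {p:.3f}")
```

Branching on the sign of the score keeps the argument of `math.exp` non-positive, so extreme scores saturate to 0 or 1 instead of raising an overflow error.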

Table A2. Validation statistics for selected classical models of single bankruptcy classifiers in comparison to ensemble classifiers applied for the training sample.

Classification Model	Training Sample
Classification Model	AC	AC_B	AC_NB	AUC_ROC (GINI)	KS Statistics	Divergence (Div)	Information Value (IV)
Base classifiers
Linear discriminant analysis (LDA)—M1	88.4	94.6	82.5	0.96 (0.92)	0.87	2.6	5.2
Logistic regression (Logit)—M2	96.8	96.1	97.6	0.97 (0.94)	0.94	28.9	5.3
Neural network (NNet)—M3	93.0	94.1	92.0	0.95 (0.90)	0.86	7.5	5.2
Support Vector Machine (SVM Radial)—M4	96.4	95.4	97.4	0.99 (0.98)	0.93	17.2	5.2
Classification tree (C&RT)—M5	93.2	93.2	93.3	0.93 (0.86)	0.87	11.9	5.2
MARS splines—M6	96.0	95.8	96.3	0.99 (0.98)	0.94	8.0	5.2
Generalized Additive Model (GAM)—M7	97.7	98.0	97.4	0.99 (0.98)	0.96	5.8	5.3
Naive Bayes—M8	70.9	42.1	98.2	0.91 (0.82)	0.73	1.0	5.2
Ensemble classifier (stacking)
Meta-classifier ensemble: kNN (model results M1-M8 as inputs)	97.3	97.1	97.4	0.99 (0.98)	0.96	23.0	5.3
Ensemble classifiers (boosting)
Stochastic Gradient Boosting Machine (GBM)	100	100	100	1.0 (1.0)	1.0	92.1	5.3
Logit Boost	97.9	97.3	98.5	0.99 (0.98)	0.96	20.5	5.3
Ensemble classifiers (bagging)
Random Forest (RF)	100	100	100	1.0 (1.0)	1.0	6.4	5.3
Averaged NNet (avNNet)	94.0	94.6	93.4	0.98 (0.96)	0.89	11.6	5.3

Source: own elaboration and calculations using R and Statistica software.
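The accuracy columns of Tables A2 and A3 can be illustrated with a short sketch. It assumes that AC is the overall share of correct classifications and that AC_B and AC_NB are the shares of correctly classified bankrupt and non-bankrupt enterprises, which is how the column names read but is not spelled out in this section. The relation Gini = 2·AUC − 1 is the standard one and matches the paired values in the tables (for example, 0.96 and 0.92). The confusion counts are hypothetical.

```python
def accuracy_measures(tp, fn, tn, fp):
    """Return (AC, AC_B, AC_NB) from confusion-matrix counts.

    tp / fn: bankrupt firms classified correctly / incorrectly.
    tn / fp: non-bankrupt firms classified correctly / incorrectly.
    """
    ac_b = tp / (tp + fn)
    ac_nb = tn / (tn + fp)
    ac = (tp + tn) / (tp + fn + tn + fp)
    return ac, ac_b, ac_nb


def gini_from_auc(auc):
    """Gini coefficient implied by the area under the ROC curve."""
    return 2.0 * auc - 1.0


# Hypothetical counts, chosen only to illustrate the calculation.
ac, ac_b, ac_nb = accuracy_measures(tp=90, fn=10, tn=95, fp=5)
print(ac, ac_b, ac_nb, gini_from_auc(0.96))
```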

Table A3. Validation statistics for selected classical models of single bankruptcy classifiers in comparison to ensemble classifiers applied for the test/validation sample.

Classification Model	Test Sample
Classification Model	AC	AC_B	AC_NB	AUC_ROC (GINI)	KS Statistics	Divergence (Div)	Information Value (IV)
Base classifiers
Linear discriminant analysis (LDA)—M1	90.2	96.0	84.0	0.98 (0.96)	0.89	11.8	7.1
Logistic regression (Logit)—M2	96.5	94.5	98.8	0.97 (0.94)	0.93	46.2	7.1
Neural network (NNet)—M3	92.1	94.5	89.5	0.95 (0.90)	0.86	11.2	7.1
Support Vector Machine (SVM Radial)—M4	89.8	92.3	87.1	0.97 (0.94)	0.82	10.8	7.1
Classification tree (C&RT)—M5	94.2	95.2	93.1	0.94 (0.88)	0.88	14.8	7.1
MARS splines—M6	96.7	96.3	97.2	0.99 (0.98)	0.95	41.7	7.1
Generalized Additive Model (GAM)—M7	97.5	97.8	97.2	0.99 (0.98)	0.96	43.0	7.1
Naive Bayes—M8	68.4	41.7	97.6	0.93 (0.86)	0.78	7.8	7.1
Ensemble classifier (stacking)
Meta-classifier ensemble: kNN (model results M1-M8 as inputs)	98.1	97.8	98.4	0.99 (0.98)	0.97	22.2	7.1
Ensemble classifiers (boosting)
Stochastic Gradient Boosting Machine (GBM)	99.4	99.3	99.6	0.999 (0.998)	0.99	57.6	7.1
Logit Boost	98.5	98.2	99.6	0.99 (0.98)	0.98	20.6	7.1
Ensemble classifiers (bagging)
Random Forest (RF)	98.6	98.2	99.2	1.0 (1.0)	0.98	4.5	7.1
Averaged NNet (avNNet)	93.8	96.0	91.5	0.97 (0.94)	0.89	10.2	7.1

Source: own elaboration and calculations using R and Statistica software.
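For readers who want to see how the AUC and KS columns can be obtained from model scores, the following sketch uses the textbook definitions. AUC is the probability that a randomly chosen bankrupt firm receives a higher score than a randomly chosen non-bankrupt firm (ties count one half), and KS is the largest gap between the two empirical score distributions. This is not necessarily how the R and Statistica routines used in the study compute them, and the scores are hypothetical.

```python
def auc_and_ks(scores_bankrupt, scores_healthy):
    """Return (AUC, KS) for two groups of scores; higher score = riskier."""
    # AUC via pairwise comparison (equivalent to the Mann-Whitney statistic).
    wins = 0.0
    for b in scores_bankrupt:
        for h in scores_healthy:
            if b > h:
                wins += 1.0
            elif b == h:
                wins += 0.5
    auc = wins / (len(scores_bankrupt) * len(scores_healthy))

    # KS: maximum distance between the empirical CDFs of the two groups.
    ks = 0.0
    for t in sorted(set(scores_bankrupt) | set(scores_healthy)):
        cdf_b = sum(s <= t for s in scores_bankrupt) / len(scores_bankrupt)
        cdf_h = sum(s <= t for s in scores_healthy) / len(scores_healthy)
        ks = max(ks, abs(cdf_h - cdf_b))
    return auc, ks


# Hypothetical predicted probabilities of bankruptcy for eight firms.
auc, ks = auc_and_ks([0.9, 0.8, 0.7, 0.35], [0.1, 0.2, 0.3, 0.4])
print(auc, ks)  # prints: 0.9375 0.75
```

The pairwise loop is quadratic in the sample size, which is acceptable for an illustration but would normally be replaced by a rank-based computation on larger samples.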

Figure 1. Number of bankruptcies in Polish voivodeships in Q1 2019 and in 2018. Source: own elaboration based on the data analyzed from Coface Nationwide Bankruptcy Monitor (http://www.emis.com).

Figure 2. Number of business bankruptcies in Poland in 2019 by sector of activity. Source: own elaboration based on the data analyzed from Coface Nationwide Bankruptcy Monitor (http://www.emis.com).

Figure 3. Businesses in the Podkarpackie Voivodeship by sector of activity. Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

Figure 4. The structure of the number of businesses in the Podkarpackie Voivodeship by the duration of business activity (in years). Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

Figure 5. Structure of the number of businesses operating in the Podkarpackie Voivodeship by enterprise size. Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

Figure 6. Distribution of the structure of businesses operating in the Podkarpackie Voivodeship by legal form of activity. Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

Figure 7. A diagram presenting the idea of using ensemble classifiers. Source: own elaboration.

Figure 8. GBM model training process with the use of the stochastic gradient boosting algorithm. Source: own elaboration using R package.

Figure 9. Score scaling in relation to the corresponding probability of default (PD) for the GBM model. Source: own elaboration using Excel.

Figure 10. ROC curves for models LDA, NB, C&RT, avNNet and GBM for the training sample. Source: own elaboration using the R package.

Figure 11. Interpretation of the Kolmogorov-Smirnov validation statistic for the LDA model and the test and validation sample: (a) Difference in cumulative distribution function for both classes relative to score; (b) Relationship of KS as the maximum difference between cumulative distribution functions for both classes relative to score. Source: own elaboration using Excel.

Figure 12. Score distribution for healthy and bankrupt businesses: (a) for the GBM model and very high divergence of distributions Div = 92.1; (b) for the LDA model and low divergence of distributions Div = 2.6. Source: own elaboration using Excel.

Figure 13. Optimal score cut-off point for the GBM model. Source: own elaboration using Excel.

Figure 14. Descriptive statistics characterizing the probability distribution of bankruptcy (over a 2-year time horizon) for the surveyed enterprises from the Podkarpackie Voivodeship, by sector of business activity. Source: own elaboration using Statistica.

Figure 15. Descriptive statistics characterizing the probability distribution of bankruptcy (over a 2-year time horizon) for the surveyed enterprises from the Podkarpackie Voivodeship, by enterprise size. Source: own elaboration using Statistica.
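
The Kolmogorov-Smirnov statistic interpreted in Figure 11 is the largest gap between the cumulative score distributions of the two classes. The following Python fragment is a minimal sketch of that calculation, not the procedure used to produce the figure: the function name `ks_statistic` and the example scores are hypothetical, and the result is returned as a fraction between 0 and 1 rather than in percent.

```python
def ks_statistic(scores_bankrupt, scores_healthy):
    """Largest absolute gap between the two empirical score CDFs."""
    n_b = len(scores_bankrupt)
    n_h = len(scores_healthy)
    # The gap can only change at an observed score, so those are the
    # only thresholds that need to be checked.
    thresholds = sorted(set(scores_bankrupt) | set(scores_healthy))
    largest_gap = 0.0
    for t in thresholds:
        cdf_b = sum(s <= t for s in scores_bankrupt) / n_b
        cdf_h = sum(s <= t for s in scores_healthy) / n_h
        largest_gap = max(largest_gap, abs(cdf_b - cdf_h))
    return largest_gap


# Hypothetical scores, for illustration only (not data from the study).
example_bankrupt = [310, 340, 355, 400, 420]
example_healthy = [390, 450, 480, 510, 560]
example_ks = ks_statistic(example_bankrupt, example_healthy)
print(f"KS = {example_ks:.2f}")
```

A value close to 1 means the score separates the two classes almost completely, while a value close to 0 means the two score distributions nearly coincide.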

Table 1. List of methods applied in forecasting business bankruptcy risk.

Methods Used in Forecasting Business Bankruptcy Risk
Conventional Approach Based on Single Classifiers		Ensemble Classifiers
Statistical Methods	Non-Statistical Methods and Machine Learning	Ensemble Classifiers
Logistic regression (LOGIT)	Mathematical programming	Stacking: a level 2 meta-classifier aggregating classification results from base classifiers
Linear discriminant analysis (LDA)	Expert systems	Boosting (e.g.): boosted trees; GBM (Stochastic Gradient Boosting Machine); boosted C5.0 trees; boosted Logit; other
Classification and Regression Trees (C&RT)	Neural networks (NNet)	Bagging (e.g.): Random Forest (RF); bagged LDA; averaged Neural Networks (avNNet); other
k-Nearest Neighbors algorithm (k-NN)	Support Vector Machine (SVM)
Naive Bayes classifier (NB)	Generalized Additive Models (GAM)
	Multivariate Adaptive Regression Splines (MARS)

Source: own elaboration based on the literature analyzed.

Table 2. Confusion matrix for the validation of classification consistency of the bankruptcy model.

Reported Bankruptcy	Forecast Bankruptcy
Reported Bankruptcy	B	NB
B (negative class: bankrupt)	TN (True Negative)	FN (False Negative)
NB (positive class: non-bankrupt)	FP (False Positive)	TP (True Positive)

Source: own elaboration.

Table 3. Discriminant capability measures—ranking of predictors.

Ratio	Discriminant Measures
Ratio	IV	GINI	V-Cramer
X9 (Z8)—Return On Sales (profit margin) (gross) [%]	5.81	0.86	0.84
X11 (Z10)—Overall Debt [%]	4.82	0.88	0.87
X12 (Z11)—Debt to Equity [%]	4.04	0.81	0.79
X13 (Z12)—Debt/EBITDA	2.76	0.72	0.66
X5 (Z4)—Return on Assets (ROA) [%]	1.66	0.68	0.60
X8 (Z7)—Net Profit Margin [%]	1.62	0.67	0.58
X10 (Z9)—Operating Return on Assets [%]	1.59	0.66	0.57
X4 (Z3)—Operating Profit Margin [%]	1.57	0.66	0.57
X7 (Z6)—Return On Invested Capital [%]	1.40	0.65	0.55
X20 (Z18)—Equity To Total Assets Structure [%]	1.39	0.65	0.55
X6 (Z5)—Return On Equity ROE [%]	1.18	0.63	0.51
X18 (Z16)—Liability Turnover	0.93	0.61	0.46
X21 (Z19)—Fixed Assets to Total Assets Structure [%]	0.89	0.57	0.38
X17 (Z15)—Inventory turnover	0.75	0.59	0.42
X15 (Z13)—Receivable Turnover	0.72	0.58	0.40
X19 (Z17)—Working Capital Turnover	0.66	0.58	0.39
X3 (Z2)—Cash Ratio	0.66	0.58	0.39
X1 (Z1)—Current Ratio	0.59	0.57	0.37
X16 (Z14)—Asset Turnover	0.28	0.52	0.22

Source: own elaboration using Statistica software.
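
The Information Value (IV) column in Table 3 ranks predictors by how differently their values are distributed among healthy and bankrupt businesses. The Python fragment below is a minimal sketch of the definition commonly used in credit scoring, IV = Σ (g_i − b_i) · ln(g_i / b_i), where g_i and b_i are the shares of healthy and bankrupt businesses falling into bin i. It is an assumption that Table 3 follows this definition; the binning used in the study is not reproduced here, and the function name and the bin counts are hypothetical.

```python
import math


def information_value(healthy_counts, bankrupt_counts):
    """IV of a binned predictor; every bin must contain both classes."""
    total_healthy = sum(healthy_counts)
    total_bankrupt = sum(bankrupt_counts)
    iv = 0.0
    for healthy, bankrupt in zip(healthy_counts, bankrupt_counts):
        g = healthy / total_healthy    # share of healthy firms in this bin
        b = bankrupt / total_bankrupt  # share of bankrupt firms in this bin
        iv += (g - b) * math.log(g / b)
    return iv


# Hypothetical counts for one ratio split into three bins (low, medium, high).
example_healthy = [10, 40, 150]
example_bankrupt = [120, 60, 20]
example_iv = information_value(example_healthy, example_bankrupt)
print(f"IV = {example_iv:.2f}")
```

Each term of the sum is non-negative, so IV is zero only when both classes are distributed identically across the bins and grows as the two distributions diverge.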

Table 4. Predicted number of businesses at risk of bankruptcy in time horizon h = 2 (until 2020) and predicted number of businesses in an uncertain condition in the Podkarpackie Voivodeship for various sectors.

Sector	Number of Businesses Forecast by the Ensemble Scoring Model in a Given Bankruptcy Risk Class (h = 2 years, until 2020)
Sector	Bankrupt (B)	Uncertain (“Grey Zone”)	Healthy (No Risk of Bankruptcy) (NB)
A—farming, forestry and fishing	2 (4%) (small = 1; medium = 1)	3 (6%) (micro = 1; small = 1; large = 1)	45 (90%) (micro = 10; small = 12; medium = 10; large = 13)
B—mining and extraction	0	0	12 (100%) (micro = 4; small = 1; medium = 4; large = 3)
C—industrial processing	11 (2%) (micro = 1; small = 4; medium = 5; large = 1)	27 (5%) (micro = 6; small = 9; medium = 3; large = 9)	543 (93%) (micro = 84; small = 83; medium = 175; large = 201)
D—energy, water, gas and other energy sources	0	0	25 (100%) (micro = 4; small = 3; medium = 6; large = 12)
E—waste, wastewater and sewage management	0	2 (3%) (small = 1; medium = 1)	65 (97%) (micro = 11; small = 5; medium = 18; large = 31)
F—construction	7 (3%) (micro = 2; small = 1; medium = 2; large = 2)	10 (5%) (micro = 9; large = 1)	203 (92%) (micro = 55; small = 44; medium = 52; large = 52)
G—wholesale and retail	17 (2%) (micro = 9; small = 3; large = 5)	19 (3%) (micro = 5; small = 4; medium = 5; large = 1)	698 (95%) (micro = 185; small = 121; medium = 226; large = 166)
H—transport and storage management	2 (3%) (micro = 1; small = 1)	3 (4%) (small = 1; large = 2)	70 (93%) (micro = 18; small = 10; medium = 20; large = 22)
I—accommodation and gastronomy	7 (13%) (micro = 4; medium = 2; large = 1)	4 (7%) (micro = 1; medium = 2; large = 1)	45 (80%) (micro = 17; small = 8; medium = 6; large = 14)
J—information and communication	0	3 (5%) (micro = 3)	52 (95%) (micro = 16; small = 7; medium = 11; large = 18)
K—finance and insurance	0	4 (33%) (micro = 1; small = 2; large = 1)	8 (67%) (small = 2; medium = 4; large = 2)
L—services for the property market	2 (3%) (micro = 1; small = 1)	10 (14%) (micro = 5; small = 2; medium = 2; large = 1)	61 (83%) (micro = 20; small = 8; medium = 12; large = 21)
M—scientific, specialist and technological activity	1 (2%) (micro = 1)	2 (3%) (micro = 1; large = 1)	58 (95%) (micro = 24; small = 8; medium = 14; large = 12)
N—administration and support	2 (5%) (micro = 2)	3 (7%) (micro = 1; large = 2)	38 (88%) (micro = 11; small = 3; medium = 8; large = 16)
P—education	0	0	9 (100%) (micro = 1; small = 2; medium = 2; large = 4)
Q—health and social care	1 (3%) (large = 1)	2 (5%) (small = 1; large = 1)	35 (92%) (micro = 10; small = 5; medium = 8; large = 12)
R—entertainment and leisure	0	2 (18%) (micro = 2)	9 (82%) (micro = 5; small = 2; medium = 2)
S—other services	0	1 (9%) (micro = 1)	10 (91%) (micro = 3; medium = 5; large = 2)
Total	52 (2%) (micro = 4%; small = 3%; medium = 2%; large = 2%)	95 (5%) (micro = 7%; small = 6%; medium = 2%; large = 4%)	1986 (93%) (micro = 89%; small = 91%; medium = 96%; large = 94%)

Source: own elaboration using Statistica software.

Table 5. Actual classification consistency of the model, verified on a sample of enterprises from the Podkarpackie Voivodeship.

Reported Bankruptcy	Forecast Bankruptcy (h = 2 years, until 2020)
Reported Bankruptcy	Bankrupt	Uncertain (Potentially Bankrupt)	Healthy (No Risk of Bankruptcy)
Bankrupt	31 (79%)	3 (8%)	5 (13%)
No risk of bankruptcy	21 (1%)	92 (4%)	1981 (95%)

Source: own elaboration.
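
The percentages in Table 5 are shares within each reported class, and the column sums of its counts should equal the totals in Table 4. The Python fragment below is a small sketch that recomputes both from the counts printed in Table 5; the variable and function names are illustrative only.

```python
# Counts copied from Table 5: outer keys are the reported status,
# inner keys are the class forecast by the scoring model.
counts = {
    "Bankrupt": {"Bankrupt": 31, "Uncertain": 3, "Healthy": 5},
    "No risk of bankruptcy": {"Bankrupt": 21, "Uncertain": 92, "Healthy": 1981},
}


def row_shares(row):
    """Share of each forecast class within one reported class."""
    total = sum(row.values())
    return {forecast: n / total for forecast, n in row.items()}


shares = {status: row_shares(row) for status, row in counts.items()}

# Column sums: how many businesses the model placed in each forecast class.
forecast_totals = {
    forecast: sum(row[forecast] for row in counts.values())
    for forecast in ("Bankrupt", "Uncertain", "Healthy")
}

for status, row in shares.items():
    print(status, {forecast: f"{share:.1%}" for forecast, share in row.items()})
print(forecast_totals)
```

Rounded to whole percent, the row shares reproduce the figures in Table 5, and the column sums (52, 95 and 1986) agree with the Total row of Table 4.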

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
