An Ensemble Classifier-Based Scoring Model for Predicting Bankruptcy of Polish Companies in the Podkarpackie Voivodeship

This publication presents the methodological aspects of designing a scoring model for early prediction of bankruptcy using ensemble classifiers. The main goal of the research was to develop a scoring model (with good classification properties) that can be applied in practice to assess the risk of bankruptcy of enterprises in various sectors. For the data sample, which included 1739 Polish businesses (of which 865 were bankrupt and 875 had no risk of bankruptcy), a genetic algorithm was applied to select the optimum set of 19 bankruptcy indicators, on the basis of which the classification accuracy of a number of ensemble classifier model variants (boosting, bagging and stacking) was estimated and verified. The classification effectiveness of the ensemble models was compared with that of eight classical individual models which made use of single classifiers. A GBM-based ensemble classifier model offering superior classification capabilities was used in practice to design a scoring model, which was applied in comparative evaluation and bankruptcy risk analysis for businesses from various sectors and of different sizes from the Podkarpackie Voivodeship in 2018 (over a time horizon of up to two years). The approach applied can also be used to assess credit risk for corporate borrowers.


Introduction
According to statistical data from 2018-2019, 30-60 businesses in Poland announce bankruptcy each month. Business bankruptcy is invariably an adverse phenomenon for the business itself and its employees, but it is also a problem for its creditors, banks and partners. A high number of reported bankruptcies may also have negative consequences locally (for the economic development and economic circumstances of the region) and on the national scale (for the economy of the whole country). For this reason, early prediction of business bankruptcy, and therefore the possibility of forecasting the risk of business bankruptcy over a long time horizon (even up to several years), is a very important financial and economic problem. In its financial and economic dimension, bankruptcy (i.e., business default) is defined as a situation in which a business is unable (for various reasons) to meet its liabilities towards creditors. For businesses operating in market economy conditions, a potential risk of bankruptcy always exists. This risk is most commonly defined as the probability of defaulting on liabilities incurred (probability of default, PD). Modeling the risk of bankruptcy is also of enormous importance for institutions granting corporate loans, for whom the bankruptcy of a corporate debtor means a potential loss of the loan granted.
The main objective of this study was to design a scoring model based on ensemble classifiers which could be used to forecast the risk of bankruptcy for Polish businesses conducting activity in the Podkarpackie Voivodeship over a time horizon of up to two years. One reason for applying the model to companies from the Podkarpackie region is that, just after the period of Poland's political transformation from socialism to a market economy, the Podkarpackie Voivodeship (along with several other Polish regions) was notably lagging behind in development. It belonged to the group of several eastern regions (voivodeships) forming the so-called eastern wall, which was overlooked and underestimated in the policies pursued by relevant governments. The selection of companies from the region was also influenced by the fact that the Podkarpackie Voivodeship is currently one of the "development tigers" in Poland and is catching up quickly. This is mainly due to the more effective policies of the current government aimed at equalizing the development opportunities of Polish regions. The Podkarpackie Voivodeship is not very large in relation to other regions of Poland: it occupies 11th place in a ranking of all 16 voivodeships, with an area of 17,846 km² (source: Główny Urząd Statystyczny (2019) (Statistics Poland), Local Data Bank, https://bdl.stat.gov.pl). The attractiveness of the voivodeship, however, is influenced by its geographical location, which is conducive to the development of ecological agriculture and tourism (including international tourism, e.g., in the Bieszczady Mountains). Another big advantage of the region is its border location (the region borders Ukraine and Slovakia, the latter also belonging to the EU).
In terms of population size, the Podkarpackie region belongs to the group of medium-populated regions of Poland and takes 8th place in this ranking, with a population of approximately 2.1 million (source: Główny Urząd Statystyczny (2019) (Statistics Poland), Local Data Bank, https://bdl.stat.gov.pl). The voivodeship's industry is also not highly developed in comparison with other, more industrialized regions of Poland. Nevertheless, the Podkarpackie Voivodeship belongs to the group of the fastest developing regions of Poland. In terms of income per capita, the Podkarpackie Voivodeship took 2nd place in 2018 in the ranking of the 16 Polish voivodeships, with revenues at the level of PLN 562.4 per inhabitant (source: Statistics Poland, Local Data Bank, https://bdl.stat.gov.pl). Also in 2018, the Podkarpackie region was the most dynamically developing region of Poland in terms of GDP growth, recording an increase of 7.8% compared to the previous year. In 2018, the GDP generated in the Podkarpackie region constituted 3.9% of Poland's GDP, which placed it 9th among the regions (source: Statistics Poland, Local Data Bank, https://bdl.stat.gov.pl/BDL/start). This shows that the region's economy is already very dynamic and is still progressing. The economic potential of the Podkarpackie region is strongly influenced by its cluster of aviation industry enterprises belonging to the so-called aviation valley, by the dynamic development of road and transport infrastructure (e.g., the route of the international European North-South communication line Via Carpatia), and by the development of innovation (innovative technologies) in the region.
The companies that drive development in the region belong to Stowarzyszenie Dolina Lotnicza (2019) (Aviation Valley Association), which includes many aviation industry companies that provide services to major aviation manufacturers around the world (e.g., Boeing, Airbus; source: http://www.dolinalotnicza.pl/en/business-card). These include companies such as 3M Poland, 3D Robot, Boeing Distribution Services, Pratt & Whitney Poland, Collins Aerospace, Goodrich Aerospace Poland, General Electric Company Poland, Hamilton Sundstrand Poland, Heli-One, Safran Transmission Systems Poland and MTU Aero Engines Poland. The very dynamic development of economic potential in the Podkarpackie region also affects the quality of life of its inhabitants. The Podkarpackie Voivodeship has ranked high in quality of life rankings for several years. All these factors make it sensible to conduct a comprehensive analysis and assessment of the bankruptcy risk of enterprises operating in the Podkarpackie region using the most effective forecasting and risk assessment models. Therefore, the work first focused on developing an adequate scoring model for bankruptcy forecasting using ensemble classifiers, and on analyzing and verifying its prognostic capacity (classification efficiency). Only then was the model used in practice to comprehensively assess the bankruptcy risk of enterprises from the Podkarpackie region, grouped by sector of the economy (according to the declared classification of their activities) and by size.
The article details the stages in which the scoring model was designed and implemented in practice. The scoring model design stage involved comparing the predictive capability of the ensemble models used in this study with that of conventional single classifiers. The results of previous works by many authors (see e.g., Anwar et al. 2014; Barboza et al. 2017; Tsai et al. 2014) indicate that models based on ensemble classifiers help achieve more accurate results and improve the discriminant capability of the model. On the basis of the scoring model designed, a bankruptcy risk assessment for businesses from the Podkarpackie Voivodeship was carried out, broken down by the sector in which they operated and by the size of the business.
The main innovation of the research presented in the article is that previous studies of other authors did not discuss the practical use of the scoring model for comprehensive analysis of the bankruptcy risk of companies (also from different sectors) operating in the Podkarpackie region, using the ensemble classifiers approach.

Literature Review
The various problems of business bankruptcy are widely described in the literature. The significance and salience of the bankruptcy problem has motivated many authors to concentrate on this issue in their research. The first mentions of modeling business bankruptcy and forecasting its likelihood appeared in the economic and financial literature in 1968, when the first study on bankruptcy risk modeling was published by Altman (1968). The early bankruptcy prediction studies applied statistical methods and mainly concerned the use of different variants of discriminant analysis or logistic regression (Ohlson 1980; Begley et al. 1996). Since those models had significant limitations, artificial intelligence and machine learning methods that had been successfully applied in image recognition tasks were gradually also implemented in bankruptcy forecasting. It was found that machine learning techniques such as neural networks (NNet), Support Vector Machines (SVM) and ensemble classifier methods have better forecasting capabilities and higher classification effectiveness than conventional approaches. An overview of previous research on the application of statistical methods and machine learning techniques in business bankruptcy prediction can be found in studies such as those by Kumar and Ravi (2007) and Lessmann et al. (2015). Alaka et al. (2018) presented a comprehensive literature overview and systematics of predictive models used in business bankruptcy forecasting, covering the purpose of research, the method of selecting variables for the model, the sample size of analyzed businesses (including bankrupt ones) and a comparison of the models' classification effectiveness measures.
Some works deal with forecasting and assessing the risk of bankruptcy of enterprises while taking into account the specificities of their sector of activity. Rajin et al. (2016) conducted a bankruptcy risk assessment for Serbian agricultural enterprises, agriculture being one of the most significant sectors of the Serbian economy. The classification efficiency of several models was compared using the methods of linear discriminant analysis. Their research shows that models taking into account the specifics of economies and market characteristics (e.g., the DF-Kralicek model for the European market) give better results for the Serbian economy than models created for American markets (e.g., the classic Altman Z-Score model). Similar problems were addressed by Karas et al. (2017), who showed that classic scoring models developed for the US economy (Altman's Z-Score and the Altman-Sabato model) and the IN05 model, designed and developed for Czech enterprises, are less effective than their original validation results suggest. This forces researchers to develop more adequate models, in particular ones taking into account the specificity and financial indicators of the agricultural sector and the economy of the country affecting the bankruptcy of enterprises. Receiver-Operating Characteristic (ROC) curves were used to measure the effectiveness of the models. Chen et al. (2013) dealt with forecasting the bankruptcy risk of industrial enterprises in the manufacturing sector in China. They used a modified variant of the Multi-Criteria Linear Programming algorithm (the so-called MC2LP algorithm) to forecast the risk of bankruptcy of 1499 Chinese enterprises from the studied sector and selected 36 financial indicators to assess their financial condition. The classification efficiency of the studied model was compared with the efficiency of the classic MCLP model and the SVM approach. Classification correctness (confusion) matrices were used as measures of classification accuracy.
The use of the model proposed by the authors enables setting a variable value for the cut-off point (determining the expected assignment of objects to classes) and thus systematically correcting classification errors. Topaloglu (2012) dealt with forecasting the bankruptcy of American enterprises from the manufacturing sector using a multi-period logistic regression model, the so-called hazard model. The research period covered bankruptcies from 1980-2007, and the results show that macroeconomic diagnostic variables in the model, such as GDP, have a very large impact on the assessment of bankruptcy. The study shows that accounting indicators used in the model for assessing the financial condition of enterprises lose their predictive power (become irrelevant) when global market and macroeconomic indicators are taken into account. Achim et al. (2012) studied the financial risk of bankruptcy for Romanian enterprises from the manufacturing sector using the Principal Component Analysis method in the period of 2000-2011, thus taking into account the impact of the global crisis on financial markets. The research sample included 53 enterprises registered in Romania and operating in the production sector, and the study used 16 selected, most frequently used financial indicators. The study shows good predictive quality of the model tested and presents its potential applications. The literature also includes works on the modeling of bankruptcy risk for enterprises operating in other sectors of the economy, e.g., Marcinkevicius and Kanapickiene (2014) for companies from the construction sector, as well as Kim and Gu (2010), Youn and Gu (2010) and Diakomihalis (2012) for companies in the hotel and restaurant sector.
It is also necessary to emphasize an important aspect in the research on the risk of bankruptcy of enterprises, which is taking into account the impact of economic cycles and selected macroeconomic variables of the market while considering the effect of cyclical economic conditions of countries. In Vlamis (2007) statistical logistic and probit regression models were used to forecast the risk of bankruptcy of American real estate companies in the period 1980-2001. It has been shown that financial indicators such as profitability, debt service and company liquidity are important determinants of the risk of bankruptcy of the surveyed enterprises. A number of key macroeconomic financial variables have also been used because the risk of borrowers' bankruptcy depends on the state of the economy and the current business cycle. Similar issues were dealt with in the publication by Hol (2007), which concerned the study of the impact of business cycles on the probability of bankruptcy of Norwegian companies. It has been shown that models that take into account the impact of economic cycles have better prognostic properties than models that only take into account the financial indicators of companies. In a similar study, Bruneau et al. (2012) analyzed the relationship between macroeconomic shocks and exposure to the risk of bankruptcy of companies in France belonging to different sectors of classification of activities. The study of the dependence of the risk of bankruptcy on economic cycles was carried out using the two-equation VAR model based on data from 1990-2010.
At this point one should also mention Polish authors' significant contribution to the development of bankruptcy forecast models which take into account the specific nature of the Polish economy. Their research is mostly based on classical techniques, using statistical methods or machine learning tools and the methods for predicting and evaluating the risk of business bankruptcy. The results obtained by Polish authors studying bankruptcy risk modeling can be found in publications by Korol (2010), Hadasik (1998), Hamrol and Chodakowski (2008), Mączyńska (1994), Prusak (2005) and Ptak-Chmielewska (2016). In the context of the research done by Polish authors, a very interesting and detailed comparative analysis of the subject of enterprise bankruptcy forecasting in East-Central Europe and an overview of models applied from the perspective of developing economies of the countries of the region in the transformation period was presented by Kliestik et al. (2018).
In recent years, ensemble classifiers have been successfully used for predicting bankruptcy of businesses. Some studies of this type include Barboza et al. (2017), Brown and Mues (2012) and Zięba et al. (2016). They are dedicated to the application of ensemble classifiers in forecasting bankruptcy of businesses and demonstrate that the ensemble classifiers offer better forecasting properties and accuracy than conventional statistical methods. Moreover, a study by Kim et al. (2015) proved that ensemble models are more resistant to the sample imbalance problem (for bankrupt businesses and those at no risk of bankruptcy) during the statistical data preparation phase.
Many studies on the application of ensemble classifiers in business bankruptcy forecasting refer to boosting and bagging methods (sequential correction and classification error minimization, as well as sampling and combining of component classifier results) in order to increase the classification performance of the entire forecasting system. In studies by Cortes et al. (2007) and Heo and Yang (2014), AdaBoost (an adaptive boosting algorithm) was applied to decision trees as basic classification models. The use of ensemble classifiers with a boosting technique based on neural network classifiers was discussed in studies by Alfaro et al. (2008), Fedorova et al. (2013), Kim and Kang (2010) and West et al. (2005). A different approach was adopted by Kim et al. (2015) and Sun et al. (2017), who used support vector machines (SVM) as base classifiers, which were boosted as a group of ensemble classifiers. Bagging is also a method frequently used in practical applications of ensemble classifiers. This subject was addressed in studies by Hua et al. (2007), Zhang et al. (2010) and Twala (2010), which analyze the classification effectiveness of such ensemble classifiers relying on several models of base classifiers. The use of ensemble classifiers that combine (stack) the results of several classifiers in a single meta-classifier was discussed in studies such as those by Iturriaga and Sanz (2015), Tsai and Wu (2008) and Tsai and Hsu (2013). Furthermore, many studies are dedicated to the use of various techniques for combining the results of base model classification, such as neural networks in the form of self-organizing maps (SOMs), rough sets techniques, case-based reasoning and classifier consensus methods. Examples of the use of this type of ensemble classifier were examined by Ala'raj and Abbod (2016), Du Jardin (2018), Chuang (2013) and Li and Sun (2012).

Statistical Description of Bankruptcies in Poland
According to data from Ogólnopolski Monitor Upadłościow (2019) (Coface Nationwide Bankruptcy Monitor; source: http://www.emis.com, http://www.coface.pl/en), a total of 798 businesses declared bankruptcy in 2018. Most bankruptcies were reported in October and September (76 and 74, respectively) and in March, April and May (67, 66 and 65, respectively), with 61 bankruptcies reported in January. The months with the relatively lowest number of bankruptcies included August (42).
When analyzing the number of bankrupt businesses in Poland in the year 2019 by type of business activity, we may notice that the highest number of bankruptcies concerned businesses carrying out varied individual activities (one-person businesses, self-employment): 108 (27% of bankruptcies), followed by commercial law companies from the commerce (trade) sector, 91 (22%), and from the industrial and service sectors, 70 (17%) and 63 (16%), respectively. In 2019, 51 (13%) businesses from the construction sector, 11 (3%) transport and logistics businesses and 9 (2%) businesses involved in other activities declared bankruptcy.
Figure 2 shows the distribution of the number of bankrupt businesses by their type and the sector of their activity.

Characteristics of Companies Operating in the Podkarpackie Voivodeship
According to Emerging Markets Information Service (2019), EMIS (http://www.emis.com), in 2018 about 3679 companies and partnerships were registered and operating in the Podkarpackie Voivodeship (the number of available financial statements for 2018 in the database). Their reported sector of activity belonged to one of the following 18 areas: A-farming, forestry and fishing, B-mining and extraction, C-industrial processing, D-production of energy, supply of water, gas and other energy sources, E-waste, waste water and sewage management, F-construction, G-wholesale and retail, and servicing vehicles and motorcycles, H-transport and storage management, I-accommodation and food services, J-information and communications, K-finance and insurance, L-services for the property market, M-scientific, specialist and technological activity, N-administration and support, P-education, Q-health and social care, R-culture, entertainment and leisure, S-other services. Figure 3 presents the structure of the number of businesses operating in the Podkarpackie Voivodeship by sector.
In 2018, the largest group of businesses in the Podkarpackie Voivodeship, 997, operated in the wholesale and retail sector. Economic activity in the field of industrial processing was declared by 718 enterprises, followed by construction, scientific, specialist and technological activity, and services for the property market (344, 321 and 270 businesses, respectively). The lowest numbers of businesses operated in education (37), production of energy and supply of energy sources (35), culture (27), other services (24) and mining and extraction (21).
Figure 4 presents the structure of the number of businesses in the Podkarpackie Voivodeship by the duration for which they have functioned (in years). Most businesses, i.e., 1792 (which corresponds to 49% of all analyzed entities), have operated in the market for a very long time (10 years).
Nearly as many businesses, i.e., 1720 (47% of the total number), have been active for a medium number of years, whereas 'young' enterprises (167), established in the period from 2017 to 2019 and active for up to two years, constituted only 4% of all businesses analyzed.
An analysis of businesses operating in the Podkarpackie Voivodeship according to their size (Figure 5) shows that 40% (1461) of all enterprises are very small, so-called micro-enterprises. Small businesses constituted a further 15%. Overall, over a half of businesses (55%) were either micro-enterprises or small enterprises. The numbers of medium and large enterprises were more or less equal, corresponding to 22% and 23% of all entities analyzed, respectively.
The size of the enterprise was identified in accordance with the legal provisions of the classification of Polish enterprises adapted to EU law and directives. Micro enterprises were identified according to the rule: number of employees <10 and annual Turnover <= 2 m €. Small enterprises were identified as not being micro enterprises and fulfilling the conditions: number of employees <50 and annual Turnover <= 10 m €. Medium enterprises were identified as not being small and fulfilling the conditions: number of employees <250 and annual Turnover <= 50 m €. Therefore, large enterprises were identified according to the rule: number of employees >= 250 and annual Turnover >50 m € (source: https://ec.europa.eu/growth/smes/business-friendly-environment/sme-definition_pl).
Among all businesses operating in the Podkarpackie Voivodeship, only 7 were listed on the stock market, while 3672 were non-listed companies. An analysis of legal forms of businesses in the Podkarpackie Voivodeship (Figure 6) shows that the vast majority (73.6%) are limited liability companies (private limited companies). Public limited companies account for 2.9% of enterprises, and only 0.5% are limited partnerships. The remaining businesses, having other legal forms, constitute 23% of all enterprises analyzed in this study.
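The enterprise size rules given earlier in this section can be read as a short decision procedure. The Python sketch below is an illustration only and not the procedure used in the study: the function name `enterprise_size` and its argument names are hypothetical, turnover is assumed to be expressed in millions of euro, and any enterprise that meets none of the micro, small or medium conditions is treated as large. The last point is an assumption that goes slightly beyond the rule as stated, since an enterprise with 250 or more employees but a turnover of at most 50 m € is not covered by the stated conditions.

```python
def enterprise_size(employees, turnover_m_eur):
    """Classify an enterprise by headcount and annual turnover (in millions of euro).

    The thresholds follow the rules quoted in the text; checking them in order
    implements the "not being micro" / "not being small" conditions.
    """
    if employees < 10 and turnover_m_eur <= 2:
        return "micro"
    if employees < 50 and turnover_m_eur <= 10:
        return "small"
    if employees < 250 and turnover_m_eur <= 50:
        return "medium"
    # Assumption: everything left over is treated as large.
    return "large"
```

Under these assumptions, an enterprise with 5 employees and a turnover of 5 m € is classified as small rather than micro, because it fails the turnover condition of the micro rule.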

Materials and Methods
As can be seen in the above analysis of the literature, business bankruptcy risk assessment in practice makes use of various classifier models. Both classical statistical methods and more advanced non-statistical methods are used, with the latter based on various machine learning techniques. The use of so-called ensemble classifiers, i.e., classifiers designed to increase classification efficiency in relation to the conventional approach (which is based on single classifiers), is becoming increasingly popular. Table 1 contains an overview of the business bankruptcy risk forecasting models that are most often used in practice.
Classical business bankruptcy forecasting models using single classifier models are very well known and presented in many publications. The present study therefore focuses mainly on a detailed presentation of the ensemble classifier methodology. A detailed discussion of classical models used in business bankruptcy forecasts can be found, e.g., in monographs by Kuhn and Johnson (2013) and Hastie et al. (2013).
Table 1. List of methods applied in forecasting business bankruptcy risk.


Ensemble Classifier Methodology
The ensemble classifier methodology involves combining several single classifiers into an ensemble of classifiers performing the same task in order to improve the effectiveness of classification (the discriminant capability of the entire model), defined as the correct assignment of objects to expected classes. This is done by suitably aggregating (often by weighting) the classification results obtained from component classifiers to arrive at a resultant classifier with the best possible forecasting capabilities (surpassing those of all base classifiers in use). Figure 7 shows a functional diagram of ensemble classifiers.
J. Risk Financial Manag. 2020, 13, x FOR PEER REVIEW 12 of 34
Figure 7. A diagram presenting the idea of using ensemble classifiers. Source: own elaboration.
A detailed description of the ensemble classifier methodology, its types, characteristics and numerous practical applications can be found in monographs by Zhang and Ma (2012) and Zhou (2012). In practice, three well-known approaches are applied in ensemble classifier methods: boosting, bagging and combining. Boosting refers to a broad class of algorithms which turn "weak classifiers" into "strong classifiers" (with classification performance approaching that of perfect models). An example of such an approach is AdaBoost, an adaptive boosting algorithm (Freund and Schapire 1997). In AdaBoost, classifiers of the same type, e.g., boosted classification trees, serve as base classifiers. Voting strategies, such as majority voting, plurality voting, weighted voting or soft voting, are most commonly used to aggregate the output classifications and determine object classes. For classification into two classes, the AdaBoost.M1 adaptive boosting algorithm consists of the following steps (see: Algorithm 1, Zhang and Ma 2012, p. 14).
Algorithm 1 AdaBoost.M1 algorithm
1. Inputs: a set of input data for the training sample $\{x_i, y_i\}$, $i = 1, \ldots, N$, $y_i \in \{\omega_1, \omega_2\}$; learning with a pattern (supervised learning).
2. Ensemble classifier: an ensemble classifier with the number of boosting cycles (iterations) $T$.
3. Initialization: initial distribution of weights for observations from the training set, $D_1(i) = 1/N$.
4. Perform in loop FOR $t = 1, 2, \ldots, T$:
   - Pick a random training subset $S_t$ with distribution $D_t$.
   - Train the base classifier on subset $S_t$, assume hypothesis $h_t: X \to Y$ concerning classification accuracy relative to the pattern.
   - Calculate the classification error for hypothesis $h_t$: $\varepsilon_t = \sum_{i} [\![ h_t(x_i) \neq y_i ]\!] \, D_t(i)$; stop if $\varepsilon_t > 1/2$.
   - Set $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$.
   - Update the distribution: $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \cdot \beta_t$ if $h_t(x_i) = y_i$, and $D_{t+1}(i) = \frac{D_t(i)}{Z_t}$ otherwise, where $Z_t$ is the normalization constant ensuring that $D_{t+1}$ sums to 1.
5. Weighted majority voting: for a given unlabeled instance $z$, obtain a voting result concerning case membership in each of the classes: $V_c = \sum_{t:\, h_t(z) = \omega_c} \log \frac{1}{\beta_t}$, $c = 1, 2$.
6. Output: membership in the class with the greatest value of $V_c$.
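The steps of AdaBoost.M1 can be illustrated with a minimal Python sketch (the study itself used R; Python is used here only for illustration). The sketch makes several assumptions that are not part of the study: the base classifier is a one-feature decision stump, the stump is trained on the weighted sample instead of on a randomly drawn subset $S_t$, a perfect stump is handled by clamping its error to a small positive value, and the toy data and function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Find the one-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                pred = [1 if pol * (row[j] - thr) > 0 else 0 for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best  # (weighted error, feature index, threshold, polarity)

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return 1 if pol * (row[j] - thr) > 0 else 0

def adaboost_m1(X, y, T=10):
    n = len(X)
    D = [1.0 / n] * n                      # initial weights D_1(i) = 1/N
    ensemble = []
    for _ in range(T):
        stump = train_stump(X, y, D)       # weighted training replaces sampling S_t
        eps = stump[0]                     # weighted classification error
        if eps >= 0.5:                     # base classifier no better than chance
            break
        eps = max(eps, 1e-10)              # assumption: avoid log(1/0) for a perfect stump
        beta = eps / (1.0 - eps)
        ensemble.append((stump, beta))
        # correctly classified cases are down-weighted by beta, then renormalized
        D = [d * (beta if stump_predict(stump, row) == yi else 1.0)
             for d, row, yi in zip(D, X, y)]
        total = sum(D)
        D = [d / total for d in D]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted majority voting with vote weight log(1 / beta_t)."""
    votes = {0: 0.0, 1: 0.0}
    for stump, beta in ensemble:
        votes[stump_predict(stump, row)] += math.log(1.0 / beta)
    return max(votes, key=votes.get)

# Hypothetical toy data: [liquidity ratio, debt ratio]; 1 = bankrupt, 0 = healthy.
X = [[0.4, 0.9], [0.6, 0.8], [0.7, 0.95], [1.5, 0.4], [1.8, 0.3], [2.1, 0.5]]
y = [1, 1, 1, 0, 0, 0]
model = adaboost_m1(X, y, T=5)
predictions = [adaboost_predict(model, row) for row in X]
```

The toy sample is separable by a single threshold, so it only exercises the mechanics of the weight update and voting; real bankruptcy data would require many boosting cycles.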
The name of the second group of ensemble classifiers, which make use of the bagging method, is derived from the English abbreviation: Bootstrap AGGregatING (Breiman 1996). This group of ensemble classifiers involves bootstrap sampling to obtain training subsets for base classifiers. Each classifier is therefore trained on a different training sample, and the results are aggregated. Here, classifiers of the same type are most often used as base classifiers. An example of this type of ensemble classifier is Random Forest. The bootstrap aggregation algorithm for object classification into two classes has the following steps (see: Algorithm 2, Zhang and Ma 2012, p. 12).

Algorithm 2 Bagging (bootstrap aggregation) algorithm
1. Inputs: a set of input data for the training sample $S$; training algorithm with a pattern (the base classifier); $T$, the ensemble size; $R$, the percentage of the sample used to draw training subsets.
2. Perform in loop FOR $t = 1, 2, \ldots, T$:
   - Randomly select a replication subset (training sample) $S_t$ by selecting $R$% of $S$ at random.
   - Train the base classifier on subset $S_t$, obtain hypothesis (classifier) $h_t$ concerning classification accuracy relative to the pattern.
   - Add $h_t$ to the ensemble $\varepsilon$.
3. Combine classification results in an ensemble combination (simple majority voting): for a given unlabeled instance $x$, obtain a voting result concerning case membership in each of the classes:
   - Evaluate class membership results on the basis of the ensemble classifier $\varepsilon = \{h_1, \ldots, h_T\}$ for the analysed case $x$.
   - Let $v_{t,c} = 1$ if $h_t$ assigns $x$ to class $\omega_c$, and $v_{t,c} = 0$ otherwise.
   - Obtain the total vote for each class: $V_c = \sum_{t=1}^{T} v_{t,c}$, $c = 1, 2$.
4. Output: membership in the class with the greatest value of $V_c$.
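The bagging procedure can be sketched in a few lines of Python (again only as an illustration; the study used R). The base classifier below, a nearest-class-mean rule, as well as the toy data, the seed and all function names are assumptions introduced for this example and do not come from the study.

```python
import random
from collections import Counter

def train_nearest_mean(sample):
    """Base classifier: assign the class whose mean feature vector is closest."""
    sums, counts = {}, Counter()
    for x, label in sample:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    means = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def h(x):
        return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))
    return h

def bagging(S, T=25, R=100, seed=0):
    """Train T base classifiers, each on a bootstrap replica containing R% of S."""
    rng = random.Random(seed)
    k = max(1, round(len(S) * R / 100))
    # rng.choices draws with replacement, i.e. a bootstrap sample
    return [train_nearest_mean(rng.choices(S, k=k)) for _ in range(T)]

def bagging_predict(ensemble, x):
    """Simple majority voting over the base classifiers."""
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical toy sample: ((liquidity ratio, debt ratio), label); 1 = bankrupt.
S = [((0.10, 0.80), 1), ((0.20, 0.90), 1), ((0.15, 0.70), 1),
     ((0.90, 0.20), 0), ((0.80, 0.10), 0), ((0.85, 0.30), 0)]
ensemble = bagging(S, T=25, R=100)
label_weak = bagging_predict(ensemble, (0.12, 0.85))
label_strong = bagging_predict(ensemble, (0.88, 0.15))
```

Random Forest follows the same scheme with classification trees as base classifiers and an additional random choice of predictors at each split.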
A group of methods called ensemble combining represents a wholly different approach. The group includes the so-called combined methods utilizing results of classification functions for single (base) classifiers and aggregating them into the result classification function using the averaging approach (simple or weighted averaging of base classifier results), voting approach (using various types of voting strategies, e.g., majority voting) or stacked generalization approach. The stacking ensemble methodology, pioneered by Wolpert (1992), is based on a combined approach whereby base classifiers (level 1 classifiers) are trained on the same random samples, and then relevant classification results (their classification functions) are used as training samples for the new meta-classifier (level 2 classifier) and aggregated in result classifications.

Feature Selection Process in Bankruptcy Prediction
A highly significant classification-related issue is the problem of choosing the appropriate (optimum) set of diagnostic variables (i.e., the feature selection problem). Detailed characteristics of methods used for the selection of relevant variables for forecast models can be found in studies by John et al. (1994) and Jovic et al. (2015). Wrapper methods are frequently used techniques which analyze possible predictor subsets and determine the effectiveness of their impact on the model's dependent variable on the basis of a search algorithm, the best subset of variables and the classification method applied. In order to search all variable subsets, the search algorithm is 'wrapped' around the classification model, hence the name of this group of methods. Wrapper feature selection methods are based on various approaches to searching for the optimum subset of predictors. Such approaches can be divided into two basic groups: deterministic and randomized. The group of deterministic methods applies various types of sequential algorithms, e.g., forward stepwise selection or backward stepwise elimination. Randomized wrapper methods most frequently use random search algorithms such as simulated annealing, genetic algorithms or ant colony optimization. A method employing a genetic algorithm in order to search for the optimum subset of predictors is often used to select variables for bankruptcy models. The genetic algorithm of feature selection (see: Algorithm 3) is executed according to the procedure designed by Kuhn and Johnson (2013).

Algorithm 3 Genetic algorithm of feature selection
1. Define: stopping criteria, number of children for each generation (gensize), and probability of mutation (pm).
2. Generate: an initial random set of m binary chromosomes, each of length p.
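The general idea of a genetic search over predictor subsets can be sketched as follows. Only the two steps listed above come from the source; the remaining steps (fitness evaluation, fitness-proportional parent selection, single-point crossover and bit-flip mutation) follow a generic genetic algorithm scheme and are assumptions of this sketch, as are the function names and the toy fitness function. In the study the fitness was the cross-validated accuracy of an LDA model; here a hypothetical stand-in is used so that the example stays self-contained.

```python
import random

def ga_feature_selection(fitness, p, m=20, generations=30, pm=0.1, seed=0):
    """Search for a binary chromosome of length p that maximizes `fitness`."""
    rng = random.Random(seed)
    # initial random set of m binary chromosomes, each of length p
    population = [[rng.randint(0, 1) for _ in range(p)] for _ in range(m)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ch) for ch in population]
        for ch, s in zip(population, scores):
            if s > best_fit:                      # remember the best subset seen so far
                best, best_fit = ch[:], s
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]  # shifted so that all weights are positive
        children = []
        while len(children) < m:
            pa, pb = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, p)             # single-point crossover
            child = pa[:cut] + pb[cut:]
            child = [1 - g if rng.random() < pm else g for g in child]  # mutation
            children.append(child)
        population = children
    return best, best_fit

# Hypothetical stand-in for model accuracy: features 0, 2 and 5 are "informative",
# and every selected feature carries a small penalty.
INFORMATIVE = {0, 2, 5}

def toy_fitness(chromosome):
    hits = sum(1 for j, g in enumerate(chromosome) if g and j in INFORMATIVE)
    return hits - 0.1 * sum(chromosome)

best_subset, best_score = ga_feature_selection(toy_fitness, p=8)
```

A chromosome entry equal to 1 means that the corresponding ratio is included in the model; the stopping criterion here is simply a fixed number of generations.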

Data Samples Description
The original research sample used in the study included data for 1739 Polish enterprises (bankrupt and not threatened with bankruptcy). This sample included calculated values for 19 financial indicators determining the financial condition of selected enterprises (characterized in detail in Section 5.1 and selected for the study using the wrapper search technique and genetic algorithm discussed in detail in Section 4.2). For bankrupt enterprises, the values of diagnostic variables were taken from 1 or 2 years before the actual period of their bankruptcy. Statistical data came from the financial statements of enterprises from 2010-2018 available in the EMIS database (http://www.emis.com). Bankruptcy episodes were identified on the basis of statistics from the EMIS database source: Ogólnopolski Monitor Upadłościow (2019) (Coface Polish National Bankruptcy Monitor, source: http://www.emis.com, http://coface.pl/en). The balanced sample used included a total of 1739 research cases from all major sectors of the economy (865 cases of bankrupt enterprises and 874 randomly selected cases of enterprises not at risk of bankruptcy and in strong financial condition). The condition of the non-defaulted enterprises was evaluated on the basis of a careful analysis of the values of many financial indicators, such as profitability ratios, debt ratios, management performance indicators etc., which determined their financial condition and low exposure to bankruptcy risk. A 70% training sample was drawn from the research sample (1217 enterprises: 592 bankrupt and 625 not threatened with bankruptcy), which was used to train and calibrate the parameters of the bankruptcy models used. The remaining cases constituted a random 30% test-validation sample (522 enterprises: 273 bankrupt and 249 not threatened with bankruptcy), which was used at the stage of validation of models to check their predictive properties for new, unknown cases.
A separate research sample was designated for enterprises from the Podkarpackie Voivodeship, which included 2133 enterprises of various sizes from the Podkarpackie region registered in various sectors of economic activity. This sample included all enterprises operating in the Podkarpackie Voivodeship for which financial statements (in the EMIS database) for 2018 were available. This sample was used as a research set to assess the risk of bankruptcy (in the 2-year horizon up to 2020) of enterprises in the Podkarpackie region on the basis of an estimated scoring model using the ensemble classifier approach.

Procedure of Mapping PD into Scores
In this study, the score scaling approach discussed in detail in the literature was used (see e.g., Siddiqi 2017, pp. 240-41). The score is a linear function of the logarithm of the so-called odds ratio, $\mathrm{Odds} = \frac{1-PD}{PD}$, which expresses the ratio of the odds $(1 - PD)$ that the business in question will be classified as healthy to the odds $PD$ that the business will be bankrupt:

$$\mathrm{Score} = a_0 + a_1 \cdot \ln(\mathrm{Odds}). \quad (1)$$

By introducing the concept of pdo, the number of points in the scoring system which doubles the value of the odds ratio, for a given value of the score we obtain the following relationship:

$$\mathrm{Score} + pdo = a_0 + a_1 \cdot \ln(2 \cdot \mathrm{Odds}). \quad (2)$$

By solving the system of Equations (1) and (2) we obtain formulas for the coefficients of the linear relationship between the score and ln(Odds), and therefore the probability of default (PD):

$$a_1 = \frac{pdo}{\ln 2}, \qquad a_0 = \mathrm{Score}_0 - a_1 \cdot \ln(\mathrm{Odds}_0), \quad (3)$$

where $\mathrm{Score}_0$ is the reference score assigned to the reference odds level $\mathrm{Odds}_0$.
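The solution of the system of Equations (1) and (2) can be written out step by step. The symbols $\mathrm{Score}_0$ and $\mathrm{Odds}_0$ denote a reference score and the odds assigned to it; the numerical values are the ones assumed in the score scaling stage of this study (pdo = 20, a score of 600 at odds of 50:1).

```latex
\begin{align*}
(\mathrm{Score} + pdo) - \mathrm{Score}
  &= a_1 \ln(2 \cdot \mathrm{Odds}) - a_1 \ln(\mathrm{Odds}) = a_1 \ln 2
  && \text{(subtract (1) from (2))} \\
a_1 &= \frac{pdo}{\ln 2} \\
a_0 &= \mathrm{Score}_0 - a_1 \ln(\mathrm{Odds}_0)
  && \text{(insert the reference point into (1))} \\
a_1 &= \frac{20}{\ln 2} \approx 28.85, \qquad
a_0 = 600 - a_1 \ln 50 \approx 487.12
\end{align*}
```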

Validation Measures of Bankruptcy Prediction Models
Commonly used measures of classification accuracy were applied in the validation of estimated bankruptcy models. They are described clearly and in detail by Siddiqi (2017) and Thomas (2009). The confusion matrix is probably the most frequent approach in the assessment of classification accuracy of models. Table 2 presents a general form of the confusion matrix.

The GINI coefficient and the related area under the ROC (Receiver Operating Characteristic) curve, $AUC_{ROC}$, are also often used as measures of bankruptcy model classification effectiveness (see e.g., Agarwal and Taffler 2008, Barboza et al. 2017). The ROC curve is a graphic representation, in a coordinate system (Y = Sensitivity, X = (1 − Specificity)), of the relationship between the cumulative percentage (structural ratio) of bankrupt businesses from the contingency table for the predicted ith category of a point score ($score_i$): $\omega\_sk_{B,i} = \sum_{j=1}^{i} \frac{n_{B,j}}{n_B}$, and the corresponding cumulative structural ratio for businesses at no risk of default: $\omega\_sk_{NB,i} = \sum_{j=1}^{i} \frac{n_{NB,j}}{n_{NB}}$. In the case of classification results ordered relative to the score in the contingency table with k different scoring categories, the GINI coefficient, and thus $AUC_{ROC}$, is determined by the following formula (see e.g., Thomas 2009, pp. 117-18):

$$\mathrm{GINI} = 2 \cdot AUC_{ROC} - 1, \qquad AUC_{ROC} = \frac{1}{2} \sum_{i=1}^{k} \left(\omega\_sk_{NB,i} - \omega\_sk_{NB,i-1}\right)\left(\omega\_sk_{B,i} + \omega\_sk_{B,i-1}\right), \quad (4)$$

with $\omega\_sk_{B,0} = \omega\_sk_{NB,0} = 0$. The GINI coefficient takes values from interval [0,1]. High values of the coefficient, approaching 1, mean that the model being assessed is highly effective (nearly perfect). Meanwhile, the measure of the area under the curve $AUC_{ROC}$ ranges from 0.5 to 1. Value 0.5 means that the model classifies businesses in the analyzed classes in a completely random way (i.e., its use is pointless), while 1 is a value attained by the best model, which perfectly identifies membership in a class.
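As an illustration, the area under the ROC curve and the GINI coefficient can be computed from a contingency table of scoring categories with the trapezoid rule. The sketch below assumes that categories are ordered from the lowest to the highest score and uses the standard relation GINI = 2·AUC − 1; the counts and the function name are hypothetical and do not come from the study.

```python
def auc_gini(n_b, n_nb):
    """AUC (trapezoid rule) and GINI from per-category counts.

    n_b[i], n_nb[i]: numbers of bankrupt / healthy businesses in scoring
    category i, with categories ordered from the lowest to the highest score.
    """
    tot_b, tot_nb = sum(n_b), sum(n_nb)
    auc = 0.0
    cum_b = cum_nb = 0.0          # cumulative structural ratios
    for b, nb in zip(n_b, n_nb):
        new_b = cum_b + b / tot_b
        new_nb = cum_nb + nb / tot_nb
        auc += (new_nb - cum_nb) * (new_b + cum_b) / 2.0
        cum_b, cum_nb = new_b, new_nb
    return auc, 2.0 * auc - 1.0

# Hypothetical counts for four scoring categories (lowest scores first).
auc, gini = auc_gini([40, 30, 20, 10], [10, 20, 30, 40])
```

For these counts the cumulative ratios are (0.4, 0.7, 0.9, 1.0) for bankrupt and (0.1, 0.3, 0.6, 1.0) for healthy businesses, which gives AUC = 0.75 and GINI = 0.5.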
Information Value (IV), the Kolmogorov-Smirnov (KS) statistic and, less frequently, the divergence coefficient (Div) are also used to evaluate the effectiveness of bankruptcy forecasting models at the validation stage. IV is calculated by the following formula (see e.g., Thomas 2009, p. 106):

$$IV = \sum_{i=1}^{k} \left( \frac{n_{NB,i}}{n_{NB}} - \frac{n_{B,i}}{n_B} \right) \ln \left( \frac{n_{NB,i} / n_{NB}}{n_{B,i} / n_B} \right), \quad (5)$$

where: $n_B$ is the number of bankrupt businesses, $n_{NB}$ is the number of businesses with no risk of bankruptcy, $n_{B,i}$ is the number of bankrupt businesses in the ith scoring category and $n_{NB,i}$ is the corresponding number of businesses with no risk of bankruptcy. The higher the IV value, the better the discriminant properties of the model being assessed.

The Kolmogorov-Smirnov (KS) statistic compares the empirical distributions of the populations of bankrupt businesses and healthy businesses (a goodness of fit measure). The greater the differences in cumulative distribution functions for the score (higher KS values), the better the discriminant capabilities of the model (i.e., the better the scoring model is at separating bankrupt businesses from healthy ones). KS statistic values are calculated by the following formula (see e.g., Thomas 2009, p. 111):

$$KS = \max_{i} \left| \omega\_sk_{B,i} - \omega\_sk_{NB,i} \right|. \quad (6)$$

The last validation measure applied to the bankruptcy forecasting models assessed is distribution divergence (Div), given by the formula (see e.g., Siddiqi 2017, p. 261):

$$Div = \frac{(\mu_{NB} - \mu_B)^2}{\frac{1}{2}\left(var_{NB} + var_B\right)}, \quad (7)$$

where: $\mu_{NB}$ is the mean of the score distribution for the population of healthy businesses, $\mu_B$ is the mean of the score distribution for the population of bankrupt businesses, and $var_{NB}$, $var_B$ are the respective variances of these distributions.
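The three measures can be computed with a few lines of Python. This is a sketch under stated assumptions: the textbook definitions of IV, KS and divergence are used, empty cells are skipped in the IV sum, sample variances are used in the divergence, and all data and function names are hypothetical.

```python
import math
from statistics import mean, variance

def information_value(n_b, n_nb):
    """IV from per-category counts of bankrupt (n_b) and healthy (n_nb) businesses."""
    tot_b, tot_nb = sum(n_b), sum(n_nb)
    iv = 0.0
    for b, nb in zip(n_b, n_nb):
        p_b, p_nb = b / tot_b, nb / tot_nb
        if p_b > 0 and p_nb > 0:       # assumption: empty cells are skipped
            iv += (p_nb - p_b) * math.log(p_nb / p_b)
    return iv

def ks_statistic(n_b, n_nb):
    """Largest gap between the two cumulative score distributions."""
    tot_b, tot_nb = sum(n_b), sum(n_nb)
    cum_b = cum_nb = 0.0
    ks = 0.0
    for b, nb in zip(n_b, n_nb):
        cum_b += b / tot_b
        cum_nb += nb / tot_nb
        ks = max(ks, abs(cum_b - cum_nb))
    return ks

def divergence(scores_b, scores_nb):
    """Squared difference of mean scores divided by the average variance."""
    num = (mean(scores_nb) - mean(scores_b)) ** 2
    den = 0.5 * (variance(scores_nb) + variance(scores_b))
    return num / den

# Hypothetical counts in four scoring categories and hypothetical scores.
iv = information_value([40, 30, 20, 10], [10, 20, 30, 40])
ks = ks_statistic([40, 30, 20, 10], [10, 20, 30, 40])
div = divergence([300, 350, 400], [500, 550, 600])
```

With these inputs the cumulative distributions differ most in the second category (0.7 versus 0.3), so KS = 0.4, while the divergence equals 200² / 2500 = 16.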

Optimal Cut-Off Point for Scoring Determination
There are several methods of determining the optimal cut-off point for scoring models. These methods are described in depth in the literature (see e.g., Zweig and Campbell 1993). The method of determining the optimum cut-off point for the score used in this research was to find a score value that maximizes the value of the expression in formula (8), where: $k_B$ is the cost of type I error (the model incorrectly classifies a bankrupt business as a healthy one), $k_{NB}$ is the cost of type II error (the model incorrectly classifies a healthy business as bankrupt), and $p_B$ is the probability of membership in the bankrupt class estimated on the basis of the training sample (the percentage of bankrupt businesses in the sample).
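A cost-based cut-off search of this kind can be sketched as follows. The criterion used here is the one described by Zweig and Campbell (1993): maximize Sensitivity − m·(1 − Specificity) with m = (k_NB / k_B)·((1 − p_B) / p_B). Whether the expression used in the study has exactly this form is an assumption of the sketch, and the scores, labels and function name are hypothetical.

```python
def optimal_cutoff(scores, labels, cost_ratio=0.5, p_b=None):
    """Return (cut-off, criterion value); businesses with score <= cut-off are flagged bankrupt.

    labels: 1 = bankrupt, 0 = healthy; cost_ratio = k_NB / k_B.
    """
    n_b = sum(labels)
    n_nb = len(labels) - n_b
    if p_b is None:
        p_b = n_b / len(labels)            # share of bankrupt businesses in the sample
    m = cost_ratio * (1.0 - p_b) / p_b
    best_cut, best_val = None, float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cut)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cut)
        sensitivity = tp / n_b             # bankrupt businesses correctly flagged
        false_alarm = fp / n_nb            # 1 - specificity
        val = sensitivity - m * false_alarm
        if val > best_val:
            best_cut, best_val = cut, val
    return best_cut, best_val

# Hypothetical scores and outcomes (1 = bankrupt).
scores = [300, 350, 380, 420, 450, 500, 550, 600]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
cutoff, value = optimal_cutoff(scores, labels, cost_ratio=0.5)
```

Because a missed bankruptcy is assumed to cost twice as much as a false alarm, the search accepts one misclassified healthy business (score 420) in order to capture the bankrupt business with score 450.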

Research Results
The ensemble classifier methodology will be applied to design a scoring model in order to predict bankruptcy events of Polish businesses operating in the Podkarpackie Voivodeship. Each stage of design will be presented in detail together with its potential for a practical application.
The process of designing a scoring model using ensemble classifiers for businesses operating in the Podkarpackie Voivodeship was divided into several stages:

1. The choice of a suitable subset of financial ratios (bankruptcy predictors) determining the financial circumstances of the businesses analyzed (feature selection stage).
2. Training and calibration of base models applied and ensemble models selected on the basis of the training sample. Determining the function of the probability of default and membership in forecast classes for both samples: the training sample, and the test and validation sample (which is not taken into account at the stage of calibration of the evaluated models).
3. Determining the score value for the training sample and the test sample with a suitable scaling of the value of the resulting probability of default function for the estimated models and their transformation into corresponding resulting point score values.
4. Validation of estimated models. Determining the values of validation statistics for the models applied and analysis of their discriminant capabilities for the training sample and the test sample. Selection of the best forecasting model.
5. For the best model, determining the optimum cut-off point for the score value, i.e., the point below which a business should be categorized as bankrupt.
6. Bankruptcy forecasts for analyzed businesses from the Podkarpackie Voivodeship in individual sectors of economic activity and business size. Final comparative analysis of results and final conclusions.

Feature Selection Stage-Selection of Ratios/Bankruptcy Risk Determinants
Twenty-two financial ratios commonly applied in financial analysis of businesses were initially proposed for the assessment of the financial standing of the analyzed business entities:
• Financial liquidity ratios: X1-Current ratio = Current assets to Short-term liabilities total (all liabilities with maturity shorter than one year): CA/STL, X2-Quick ratio = (Current assets − Inventories) to Short-term liabilities total: (CA − I)/STL, X3-Cash ratio = Cash and Cash equivalents to Short-term liabilities total: Cash/STL

With the help of wrapper techniques (discussed in Section 4.2 above), an optimum subset of predictors was selected by means of the genetic algorithm, and a potentially best set of financial ratios for the bankruptcy forecasting models being trained was determined. Linear discriminant analysis (LDA) was used as the forecasting model in the search algorithm, while the general classification accuracy (AC) measure was applied as the measure of the effectiveness of predictor subsets. The calculations were performed by means of the R statistical package and the function gafs() from the caret library. Parameters for the genetic algorithm were as follows: popSize = 50 (the number of subsets assessed in each iteration), pcrossover = 0.8 (crossover probability; a high probability that the new generation will not be an exact copy of the chromosomes of parents from the previous generation), pmutation = 0.1 (mutation probability; a low probability of chromosome alterations in the subsequent mutation), elite = 0 (the number of best subsets capable of survival in each generation). By means of the genetic algorithm randomly searching for the best subset of diagnostic variables, a set of 19 optimum financial ratios was selected (accuracy for the set was AC = 0.89) using a 5-fold cross-validation (CV) procedure. Table 3 contains values of selected measures of discriminant capabilities and significance for individual diagnostic variables.

Calibration of the Parameters of Bankruptcy Risk Forecast Models (Calibration Stage)
Eight single classifier models were used in forecasting the probability of default (PD) (Table 3). Classification functions for those models, the so-called level 1 classifiers, served as inputs for a level 2 ensemble meta-classifier, which aggregated them into final classification results. k-NN (k-Nearest Neighbors) served as the stacking meta-classifier. For comparison purposes, boosting and bagging ensemble classifier approaches were also applied. The boosting ensemble classifiers were GBM, the Stochastic Gradient Boosting Machine (Friedman 2002), and a boosted logistic regression classifier (Logit Boost). The Random Forest (RF) model and averaged Neural Networks (avNNet) were used as bagging classifiers (Breiman 2001). The bankruptcy prediction model calibration procedure was based on the samples described in detail in Section 4.3. Calculations were performed with the help of procedures written with the use of the R package libraries (https://cran.r-project.org/). In particular, the following libraries were used: caret, caretEnsemble, caTools, pROC, MASS, nnet, kernlab, rpart, earth, mgcv, klaR, gbm, plyr, randomForest and other auxiliary ones. A cross-validation approach was employed in the calibration procedure of the optimum model (k = 5-fold cross-validation). The approach used the area under the ROC curve ($AUC_{ROC}$) as the measure of the models' discriminant quality (effectiveness). A table in Appendix A (Table A1) presents the final best configurations of the considered bankruptcy prediction models and optimum values of their parameters.

Determining Score for the Optimum Model (Score Scaling Stage)
Forecast values of classification functions of the models analyzed (probability of default, PD) in the scoring model should be transformed into corresponding values of the score through appropriate scaling. In the calculations, it was assumed that a score of $\mathrm{Score}_0 = 600$ corresponds to odds of 50:1 (Odds = 50) that the business is not at risk of default, and that the number of points which doubles these odds is pdo = 20. With the above assumptions, scaling parameters were estimated and the score function was described by the following relationship: $\mathrm{Score} = 487.12 + 28.85 \cdot \ln\left(\frac{1-PD}{PD}\right)$. Figure 9 illustrates the scaling obtained for the score when the GBM ensemble model is used for the training sample.
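The scaling parameters quoted above can be reproduced numerically. The short Python sketch below is only an illustration (the function names are hypothetical); it uses the assumptions stated in the text, i.e. a score of 600 at odds of 50:1 and pdo = 20.

```python
import math

def scaling_parameters(score0=600.0, odds0=50.0, pdo=20.0):
    """Coefficients of Score = a0 + a1 * ln(Odds)."""
    a1 = pdo / math.log(2.0)                 # doubling the odds adds pdo points
    a0 = score0 - a1 * math.log(odds0)       # anchor: score0 at odds0
    return a0, a1

def pd_to_score(pd, a0, a1):
    """Map a probability of default to a point score."""
    return a0 + a1 * math.log((1.0 - pd) / pd)

a0, a1 = scaling_parameters()
score_at_anchor = pd_to_score(1.0 / 51.0, a0, a1)    # odds of 50:1
score_at_double = pd_to_score(1.0 / 101.0, a0, a1)   # odds of 100:1
score_at_even = pd_to_score(0.5, a0, a1)             # odds of 1:1
```

Rounded to two decimal places, the coefficients are 487.12 and 28.85; odds of 50:1 map to 600 points, odds of 100:1 to 620 points, and PD = 0.5 maps to a score equal to the intercept.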

Model Validation (Validation Stage)
Figure 10 presents ROC curves for the five classification models assessed. It is clear that the GBM model perfectly (in 100% of cases) predicted membership of businesses in either class (bankrupt and healthy) (AUC = 1). The worst of the models compared, Naive Bayes (NB), also had high prediction accuracy (AUC = 0.92), although it was still significantly inferior to the other models.

Optimal Cut-Off Point Determination Stage
The next step for the ensemble GBM classifier-based forecasting model, which had the best classification properties as expressed by the values of validation measures, involved determining the optimum cut-off point below which the businesses analyzed were regarded as being at risk of default (bankrupt). In the calculations, it was assumed that the ratio of the above costs is $k_{NB} / k_B = 1/2$ (double cost for the incorrect classification of bankrupts, as this event appears to be more detrimental for the practical application of the model), and a probability of $p_B = 0.486$ was determined from the training sample. The optimum cut-off point, calculated by means of formula (8), was $score_{cut\text{-}off} = 386$. Therefore, all businesses for which the point value of the score is score ≤ 386 must be forecast as members of the bankruptcy (B) class, and the remaining ones as members of the non-bankruptcy (NB) class. Still, for the estimated optimum ensemble GBM model, in the score value interval [387, 486] there is a very high potential risk of default (PD > 0.5), determined on the basis of the training sample (PD values in the range 0.51-0.96). Consequently, if we rely on the classical procedure allowing us to consider a business for which PD > 0.5 bankrupt (at risk of default), then the score interval (387 <= score <= 486) should be defined as a "gray zone", where it is difficult to clearly determine the membership of a given business in either the bankruptcy class or the non-bankruptcy class. Businesses of this type were assessed as uncertain, leaning towards potential bankruptcy (contingent on unfavorable circumstances affecting their financial health). Figure 13 presents an interpretation of the optimum cut-off point for the score, determined in the above manner.

Figure 13. Optimal score cut-off point for the GBM model. Source: own elaboration using Excel.

Classification of Enterprises from the Podkarpacie Region (Prediction Stage) Depending on the Risk of Their Bankruptcy
Applying the classification rule:

IF (score ≤ 386) THEN bankrupt within h ≤ 2 years;
IF (score > 486) THEN healthy;
IF (score > 386 AND score ≤ 486) THEN uncertain (gray zone); (9)

a forecast of bankruptcy (membership in one of the risk classes) was determined over a time horizon of maximum 2 years (up to 2020) for businesses operating in the Podkarpackie Voivodeship in various sectors of economic activity and depending on the enterprise size. Table 4 is a contingency table presenting the forecast number of businesses classified as members of each of the 3 bankruptcy risk classes by different economic activity sectors.

Table 4. Predicted number of businesses at risk of bankruptcy in time horizon h = 2 (until 2020) and predicted number of businesses in an uncertain condition in the Podkarpackie Voivodeship for various sectors.
Number of Businesses Forecast by the Ensemble Scoring Model in a Given Bankruptcy Risk Class (h = 2 years, until 2020)
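Classification rule (9) translates directly into code. The sketch below is a minimal Python illustration; the thresholds 386 and 486 come from the text, while the function name, the class labels and the example scores are hypothetical.

```python
from collections import Counter

def risk_class(score, cutoff=386, gray_upper=486):
    """Assign one of the three bankruptcy risk classes of rule (9)."""
    if score <= cutoff:
        return "bankrupt"      # forecast bankruptcy within 2 years
    if score <= gray_upper:
        return "uncertain"     # gray zone: cutoff < score <= gray_upper
    return "healthy"

# Hypothetical scores for six businesses.
example_scores = [350, 386, 387, 486, 487, 620]
classes = [risk_class(s) for s in example_scores]
class_counts = Counter(classes)   # number of businesses per risk class
```

Counting the classes per sector in this way gives a contingency table with the same layout as Table 4.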

Discussion
The comparative analysis of the classification effectiveness of ensemble models in juxtaposition with several classical bankruptcy forecasting methods indicates that ensemble classifiers are characterized by considerably better values of validation measures, both for the training sample and the test sample, surpassing all of the analyzed base classifiers in terms of accuracy. The best ensemble classifier, GBM (decision trees supported by a stochastic gradient boosting algorithm), offered full accuracy of correctly classified bankrupt and healthy businesses ($AC$ = 100%, $AC_B$ = 100%, $AC_{NB}$ = 100%) for the training sample and over 99% for the test sample (Tables A2 and A3).

Based on the analysis of the probability of bankruptcy (Figure 14) of the enterprises surveyed in the Podkarpackie Voivodeship in individual sectors of their business activity (estimated on the basis of the best ensemble classifier model, GBM, which has the best forecasting and classification capabilities) and on the basis of an analysis of their predicted membership in the three bankruptcy risk classes (Table 4), the following comparative analysis can be carried out, assessing the exposure to bankruptcy risk of enterprises operating in the region in 2018 in view of their potential bankruptcy by 2020.
In sector A (farming, forestry and fishing), with a total of 50 enterprises surveyed, the developed scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 4% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the so-called "gray zone"), i.e., those with a significant probability of bankruptcy ($PD_{t=2}$ > 50%), the percentage of potentially bankrupt enterprises (over a 2-year horizon) increases to 10%. The average probability of bankruptcy for enterprises in this sector is 11% (min = 0%, max = 99.9%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 43%-99.9%. The sector is therefore quite heavily exposed to the risk of bankruptcy.
In sector B (mining and extraction), with a total of 12 enterprises, the scoring model qualified all enterprises as not being threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 1% (min = 0%, max = 6.8%). Every tenth enterprise in this sector had a probability of bankruptcy in the 2-year horizon (up to 2020) in the range of 2.3%-6.8%. Therefore, it was the first of the three least risky sectors of the region's economy.
In sector C (industrial processing), with a total of 581 enterprises, the scoring model predicted bankruptcy for 2% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (from the "gray zone"), the percentage of potentially bankrupt enterprises increased to 7%. The average probability of bankruptcy for enterprises in this sector is 7.4% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.7%.
Sector D (energy, water, gas and other energy sources), with a total of 25 enterprises, was the second of the three least risky sectors in the region's economy. The scoring model qualified all enterprises as not being threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 2.2% (min = 0%, max = 43.3%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 1.7%.
In sector E (waste, wastewater and sewage management), with a total of 67 enterprises, the scoring model qualified 97% of enterprises as not being threatened with bankruptcy and 3% as uncertain. The average probability of bankruptcy for enterprises in this sector is 3.8% (min = 0%, max = 83.7%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 9.2%.
In sector F (construction), with a total of 220 enterprises, the scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 3% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 8%. The average probability of bankruptcy for enterprises in this sector is 8.6% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 23.5%.
In sector G (wholesale and retail), with a total of 734 enterprises, the scoring model predicted bankruptcy for 2% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 5%. The average probability of bankruptcy for enterprises in this sector is 5.7% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 9.8%.
In sector H (transport and storage management), with a total of 75 enterprises, the scoring model predicted bankruptcy for 3% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 7%. The average probability of bankruptcy for enterprises in this sector is 8.2% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 23.2%.
Sector I (accommodation and gastronomy), with a total of 56 enterprises, was the sector most exposed to the risk of bankruptcy. The scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for as many as 13% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 20%. The average probability of bankruptcy for enterprises in this sector is 22.2% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 98.6%.
In sector J (information and communication), with a total of 55 enterprises, the scoring model qualified 95% of enterprises as not being threatened with bankruptcy and 5% as uncertain. The average probability of bankruptcy for enterprises in this sector is 6.1% (min = 0%, max = 89.4%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 11.6%.
In sector K (finance and insurance), with a total of 12 enterprises, the scoring model qualified 67% of enterprises as not being threatened with bankruptcy and 33% as uncertain. The average probability of bankruptcy for enterprises in this sector is 28.2% (min = 0%, max = 96%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 95.2%-96%. This is a very specific sector (the financial sector), hence the ambiguity in assigning the model results to risk classes.
In sector L (services for the property market), with a total of 73 enterprises, the scoring model predicted bankruptcy for 3% of all enterprises in this sector within a 2-year horizon (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 17%. The average probability of bankruptcy for enterprises in this sector is 15.8% (min = 0%, max = 99.7%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 72.1%-99.7%. It is therefore also one of the sectors with high exposure to the risk of bankruptcy.
In sector M (scientific, specialist and technological activity), with a total of 61 enterprises, the scoring model predicted bankruptcy for 2% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 5%. The average probability of bankruptcy for enterprises in this sector is 6.9% (min = 0%, max = 98.1%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.4%.
Sector N (administration and support), with a total of 43 enterprises, was also one of the sectors with high exposure to the risk of bankruptcy. The scoring model predicted bankruptcy within a 2-year horizon (up to 2020) for 5% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 12%. The average probability of bankruptcy for enterprises in this sector is 10.3% (min = 0%, max = 99.8%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 51%.
Sector P (education), with a total of only nine enterprises, was the third of the three least risky sectors in the region's economy. The scoring model qualified all enterprises as not being threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 3% (min = 0%, max = 19.1%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 19%.
In sector Q (health and social care), with a total of 38 enterprises, the scoring model predicted bankruptcy within a 2-year horizon (up to 2020) for 3% of all enterprises in this sector. After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 8%. The average probability of bankruptcy for enterprises in this sector is 10.3% (min = 0%, max = 98.8%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 39%.
In sector R (entertainment and leisure), with a total of 11 enterprises, the scoring model qualified 82% of enterprises as not being threatened with bankruptcy and as many as 18% as uncertain. The average probability of bankruptcy for enterprises in this sector is 11.3% (min = 0%, max = 61%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 54.9%-61%. It is therefore a sector in which ambiguity in assigning the model results to risk classes can also be observed.
In the last sector, S (other services), with a total of 11 enterprises, the scoring model qualified 91% of enterprises as not being threatened with bankruptcy and 9% as uncertain. The average probability of bankruptcy for enterprises in this sector is 12% (min = 0%, max = 91.7%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.7%. It is also a sector in which ambiguity in assigning the model results to risk classes can be observed.
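The per-sector figures quoted above (average, minimum and maximum probability of bankruptcy, plus the range of probabilities among the riskiest tenth of enterprises) can be reproduced from a list of model outputs with a few lines of Python. The following is only a minimal sketch: the function name `sector_summary`, the dictionary keys and the sample values are hypothetical, and it assumes that the last figure reported for each sector describes the riskiest tenth of enterprises, with the group size rounded up.

```python
from math import ceil
from statistics import mean


def sector_summary(pd_values):
    """Summarize predicted probabilities of bankruptcy (in %) for one sector."""
    ordered = sorted(pd_values)
    # Assumption: the "riskiest tenth" holds ceil(n / 10) enterprises, at least one.
    k = max(1, ceil(len(ordered) / 10))
    riskiest = ordered[-k:]
    return {
        "mean": mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "top_tenth_from": riskiest[0],   # lowest PD within the riskiest tenth
        "top_tenth_to": riskiest[-1],    # highest PD within the riskiest tenth
    }


# Hypothetical PD values (in %) for a sector with 12 enterprises; not data from the study.
sector_pd = [0.0, 0.5, 1.0, 1.2, 2.0, 2.5, 3.0, 4.0, 10.0, 43.0, 60.0, 99.9]
summary = sector_summary(sector_pd)
print(summary)
```

With 12 enterprises the riskiest tenth contains two of them, so the sketch reports a range rather than a single threshold, in the same way that a range is quoted for the smaller sectors.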
Based on the results from Table 4 and on the analysis of the probability of bankruptcy (Figure 15) for the surveyed enterprises depending on their size, the following relationships illustrating the degree of their exposure to the risk of bankruptcy can be seen. Among very small (micro) enterprises (535 of which were included in the study), the developed scoring model qualified 89% as not threatened with bankruptcy, 4% as bankrupt and a further 7% as uncertain (from the "gray zone"), but with a potentially significant risk of bankruptcy above 50%. Among small enterprises (356 included in the study), the scoring model qualified 91% as not threatened with bankruptcy, 3% as bankrupt and another 6% as uncertain (from the "gray zone"). Among medium enterprises (606 included in the study), the scoring model qualified 96% as not threatened with bankruptcy, 2% as bankrupt and another 2% as uncertain (from the "gray zone"). Similarly, among large enterprises (636 included in the study), the scoring model classified 94% as not at risk of bankruptcy, 2% as bankrupt and another 4% as uncertain (from the "gray zone").
One should also note the limitations of the analyses presented. One limitation is that the developed and implemented scoring model was estimated on statistical data for enterprises from various sectors of activity. It is very difficult to develop a model with good accuracy (a sufficiently high classification efficiency) in such a situation, since the various sectors are often very specific and not comparable. On the other hand, the results obtained (Table 5) for 39 actual bankruptcies of enterprises in the Podkarpackie Voivodeship observed and confirmed in 2018 show that the scoring model correctly recognizes about 79% of truly bankrupt enterprises, while for non-bankrupt enterprises the equivalent figure is 95%. An effectiveness of 79% for the bankrupt class alone is sufficient and acceptable, although it can of course be discussed further. It should be noted that the designed model includes three classes of bankruptcy risk (bankrupt, non-bankrupt and the "gray zone": difficult to say, but potentially also bankrupt). In the classic approach with only two classes (bankrupt, non-bankrupt), one should add another 8% to the model effectiveness (by including the class of uncertain enterprises, the "gray zone", for which the probability of bankruptcy is high and greater than 0.5). The rate of correct classifications of the estimated model then increases to 87%, which seems to be a good result. Overall accuracy for the model (without division into classes) is 94%.
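The step from 79% to 87% described above amounts to counting "gray zone" predictions for truly bankrupt enterprises as correct detections. The sketch below illustrates that arithmetic only. The function name `detection_rates` is hypothetical, and the counts 31 and 3 are assumptions chosen solely because they are consistent with the rounded percentages reported for the 39 actual bankruptcies; they are not taken from Table 5.

```python
def detection_rates(n_actual_bankrupt, n_flagged_bankrupt, n_flagged_gray):
    """Return detection rates for truly bankrupt enterprises.

    strict: only the 'bankrupt' class counts as a correct detection.
    merged: the 'gray zone' class is also counted (two-class view).
    """
    strict = n_flagged_bankrupt / n_actual_bankrupt
    merged = (n_flagged_bankrupt + n_flagged_gray) / n_actual_bankrupt
    return strict, merged


# Assumed counts (hypothetical): 31 of 39 classified as bankrupt, 3 of 39 as gray zone.
strict, merged = detection_rates(39, 31, 3)
print(f"three-class view: {strict:.1%}, two-class view: {merged:.1%}")
```

Under these assumed counts the two rates round to 79% and 87%, matching the figures in the text; other counts could produce the same rounded values.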
In addition, the selection of as many as 19 indicators as determinants of the financial condition of enterprises raises the question of whether the set should be limited to only a few of the most important indicators. Such a large set may raise suspicions that many of the variables are strongly correlated with each other, which may affect the quality of the models, especially classic ones such as LDA. In the study, the size of the set resulted from selection by the wrapper method with a genetic algorithm and from the final use of ensemble classifiers, which are less sensitive to the interdependence of variables. For the sake of accuracy, it is worth emphasizing that the correlation between any two variables never exceeded 0.87. Nevertheless, in future research it is worth considering a reduction in the number of bankruptcy predictors.
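A check of the kind mentioned above, finding the strongest pairwise correlation among the predictors, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the Pearson coefficient is assumed, the helper names `pearson` and `max_abs_correlation` are hypothetical, and the three short indicator series are made-up values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def max_abs_correlation(columns):
    """Return the pair of indicators with the largest |r| and that r."""
    pair = max(
        combinations(columns, 2),
        key=lambda p: abs(pearson(columns[p[0]], columns[p[1]])),
    )
    return pair, pearson(columns[pair[0]], columns[pair[1]])


# Hypothetical indicator values for five enterprises; not data from the study.
indicators = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 11],
    "X3": [5, 3, 4, 1, 2],
}
pair, r = max_abs_correlation(indicators)
print(pair, round(r, 3))
```

Applied to the full set of 19 indicators, a result below a chosen limit would support keeping all predictors, while a value close to 1 would point to candidates for removal.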

Conclusions
The results of the analyses presented in the paper lead to several general conclusions that summarize the research:

•
The scoring model designed for the early prediction of bankruptcy risk for Polish businesses from the Podkarpackie Voivodeship using ensemble classifiers was highly effective in forecasting and accurately evaluating the risk of default of the analyzed businesses.

•
An analysis of the forecasts obtained suggests that small enterprises are more exposed to the risk of default than medium or large enterprises.

•
The sector of business activity and the unique characteristics of the economic activity influence the risk of business bankruptcy. A higher number of potential bankruptcies is reported in some sectors of economic activity than in others.

•
A higher risk of business bankruptcy for some particular industry branches may be caused by the sensitivity of bankruptcy models to the industry sector to which an enterprise belongs. This can be considered one of the limitations of the study presented in the paper: a potentially higher risk of business bankruptcy for some particular industry branches can be influenced by the model design. Further research would have to examine whether separate models estimated for each sector would indicate lower values of PD and therefore lower exposure of companies to the risk of bankruptcy.

•
Another limitation of the study is that bankruptcy models are sensitive to the phase of the economic cycle, which the presented model does not account for. The influence of economic cycles on bankruptcy risk can be considered in further extensions of the research.

•
The approach presented in the paper can be used not only to assess the risk of bankruptcy of enterprises by market analysts and regional analysts, but also in banking activities to assess credit risk for corporate loans, where similar models are of course successfully implemented.

•
The study may be extended in the future with an analysis and an assessment of the risk of bankruptcy for enterprises from other regions of Poland, with the development of separate ensemble models for enterprises from key sectors of the country's economy. It can also be extended to a comparative analysis of the risk of bankruptcy in given sectors of the economy for a group of countries, e.g., the EU, the Visegrad Group or the Three Seas Initiative countries.
Funding: This research received no external funding.

Conflicts of Interest: The author declares no conflicts of interest.
Appendix A
Table A1. Optimum configuration and set of parameters for bankruptcy models applied.