Can the SOM Analysis Predict Business Failure Using Capital Structure Theory? Evidence from the Subprime Crisis in Spain

This paper aims to identify which variables related to capital structure theory predict business failure in the Spanish construction sector during the subprime crisis. An artificial neural network (ANN) approach based on Self-Organizing Maps (SOM) is proposed, which clusters firms into default and active groups. The similarities and differences between the main features of each group determine the variables that explain the likelihood of failure of the analyzed firms. The network tests whether the factors that explain leverage, such as profitability, growth opportunities, company size, risk, asset structure, and firm age, are suitable for predicting business failure. The sample comprises 152 construction firms (76 default and 76 active) in the Spanish market. The results show that the SOM correctly classifies 97.4% of firms in the construction sector and sorts them into five groups with clear similarities within each cluster. The study supports the suitability of the SOM for predicting business bankruptcy using variables related to capital structure theory and financial crises.


Introduction
The prediction of business failure receives special attention and growing interest in periods of instability and crisis, as a means of preventing bankruptcies. Business failure has been an important research area since the 1960s, with Beaver's univariate model [1] and Altman's multivariate model [2]. These failure models compare and classify firms according to quantitative indicators to predict or distinguish between healthy and unhealthy businesses. A wide variety of models and techniques have been applied to predict distress situations and to improve the estimation results, such as logit, probit, artificial neural networks, expert systems, rough sets, hybrid systems, case-based reasoning, decision trees, genetic algorithms, clusters, survival models, data envelopment analysis, support vector machines, and fuzzy logic [3][4][5][6][7][8][9][10][11][12][13]. Additionally, some studies detect the factors or causes that generate a firm's problems [14,15]. Moreover, because the problem is affected by expert analysts' opinions and involves a large number of variables, it is well suited to tools taken from artificial intelligence and big data analysis. Several authors [8,16] found that artificial neural networks are the most common technique for bankruptcy prediction because the architecture easily produces good results. Many studies employed the back propagation algorithm for bankruptcy prediction, despite the fact that it

Theoretical Framework
In this study, the ratios were selected with reference to a review of the capital structure literature [19,20]. The vast majority of the studies are empirical applications and use similar econometric techniques, including ordinary least squares, the difference-in-differences model, the generalized method of moments, logit, probit, tobit, etc. These works compare results and test the main traditional theories of capital structure, focusing on the characteristics of the firms in each sector. There are only a few applications of non-traditional techniques to the study of capital structure [19][20][21]. Modigliani and Miller [18] proposed that a company's market value is independent of its capital structure under certain assumptions. Later, in 1963, the same authors [22] considered dropping the original hypothesis of perfect financial markets and acknowledged the tax advantage of debt.
Other studies, such as [23,24], analyzed how the extensive use of debt can lead companies to default. As a result, the trade-off theory arose, which considers the effects of several variables (taxes, bankruptcy costs, and agency problems) and predicts an optimal structure as a result of balancing the costs and the benefits of issuing debt and capital. According to this theory, a company will increase its debt up to the level at which the tax benefits of debt are offset by the increase in bankruptcy costs.
Then, papers [25,26] proposed the pecking order hypothesis, which recognizes a hierarchical order in financing choices to minimize adverse selection costs. According to this theory, companies prefer, in first place, their internal resources; in second place, external debt; and in third place, they expand their capital through new financial alternatives. This order reflects the preference of entrepreneurs to retain control of the company and to avoid the costs of external financing that arise in a context of asymmetric information.
Moreover, the business cycle theory of [27] has been used to explain firms' capital structure. This theory considers that firms can substitute and complement different forms of financing, which vary over their life cycle and depend on the age, size, and information availability of the companies. Other authors [28] considered that a manager's individual objectives, such as attitude towards debt and experience with external financing (according to risk aversion), level of education, and the socioeconomic and emotional costs of bankruptcy, affect the demand for funds. According to the business life cycle and to this approach, the demand for external funds and its success could be affected by some of the characteristics of the firms and owners. Following these theories, many studies have attempted, without consensus, to determine which one better describes companies' financing decisions. For example, the effects of financial crises on financing decisions in Spain and Europe are analyzed in papers such as [21,29,30,31,32], among others.
In periods of financial crisis, financing alternatives are reduced and the cost of access increases; therefore, firms probably change their ways of financing and their capital structure. According to [33], companies in financial crises are affected by both the supply of and the demand for funds. After the subprime crisis, the restrictions came from the supply side.
In summary, the literature review proposed indicators to analyze profitability, growth opportunities, company size, risk, asset structure, and age to evaluate whether these factors that explain leverage can be useful for predicting business failure in the context of a crisis. The selected variables are the following:
• Profitability: According to [25,26], firms prefer to use internal resources to finance operations; in other words, more profitability is linked with higher cash flows, so firms do not need external debt to finance their investments. Therefore, there is a negative relation between profitability and leverage. However, [22] argued the opposite: a positive relation exists because the most profitable firms can take on more debt, given the tax advantages. In this study, profitability is measured through EBIT (earnings before interest and taxes) over assets, following [34][35][36][37], among others. Another measure of profitability is EBITDA (earnings before interest, taxes, depreciation, and amortization) over assets. Thus, the following hypothesis was formulated:
H1. Higher Profits (EBIT over Assets) Reduce Default Probability.
• Growth opportunities: On the one hand, the operating earnings' variability and the variation of total assets can be used as measures of growth potential. In the case of earnings, some authors suggest that firms with more variations in sales have more possibilities of growth [21,30,38] and firms with more investments have more potential for growth [21,37,39]. According to the pecking order theory, a positive relationship is expected between leverage and growth opportunities.
One study [40] suggested a positive relationship between the growth and the firm's age because companies need more external financing when they are young. On the other hand, there are other indicators used in the literature to measure growth opportunities, such as market to book ratio and Tobin's Q. Thus, the following statement was formulated:

H2. A Higher Variation in Total Assets Decreases the Possibility of Bankruptcy.
• Size: According to the trade-off theory, firm size is positively related to external financing, but some empirical evidence shows a negative relationship with leverage. The positive relation is associated with reputation and with information asymmetries, which are smaller in large firms [40,41]. Size can be measured through the logarithm of total assets [30,34,37,39,41,42,43] or the logarithm of earnings ([29,30,35,38], among others). Thus, the following statement was formulated:
H3. The Firm's Size Decreases the Probability of Defaulting.
• Risk/Volatility: The pecking order theory and the trade-off theory argue that more risk is associated with less leverage. Risk can be measured through the standard deviation of ROA, of EBIT, and/or of earnings ([36,39,42], among others). Thus, the following statement was formulated:
H4. More Volatility Increases the Probability of Defaulting.
• Asset structure: The literature review showed that companies with high proportions of fixed assets over total assets present more leverage because fixed assets can be used as guarantees. In addition, guarantees reduce agency problems and the cost of default [35]. Thus, the following statement was formulated:
H5. Higher Tangibility (Fixed Assets over Total Assets) Decreases Default Probability.
• Age: A firm's age can be measured as the logarithm of the number of years of business operation, following [35,44]. It can also be measured by the number of years since the firm's founding [29,38]. Alternatively, authors such as [40] use a size-age index, a linear combination of firm size and age. The pecking order theory predicts a negative relationship between age and debt, arguing that years of operation help the firm to accumulate retained earnings and generate internal resources over time, thereby minimizing the need for external finance. Thus, the following statement was formulated:
H6. The Probability of Defaulting is Reduced with the Age of the Firms.
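The indicators behind H1 to H6 can be illustrated with a minimal sketch. The function name, the input field names, and the sample figures are hypothetical, and the exact definitions used in the study (given in its Table 1) may differ; in particular, the use of the population standard deviation for volatility is an assumption.

```python
import math
from statistics import pstdev

def capital_structure_ratios(ebit, total_assets, prev_total_assets,
                             fixed_assets, years_operating, roa_history):
    """Illustrative indicators for one firm (all inputs are hypothetical)."""
    return {
        # H1: EBIT over total assets
        "profitability": ebit / total_assets,
        # H2: relative variation in total assets
        "asset_growth": (total_assets - prev_total_assets) / prev_total_assets,
        # H3: logarithm of total assets
        "size": math.log(total_assets),
        # H4: standard deviation of ROA (population form, an assumption)
        "roa_volatility": pstdev(roa_history),
        # H5: fixed assets over total assets
        "tangibility": fixed_assets / total_assets,
        # H6: logarithm of the years of business operation
        "age": math.log(years_operating),
    }

firm = capital_structure_ratios(
    ebit=120.0, total_assets=2000.0, prev_total_assets=1600.0,
    fixed_assets=500.0, years_operating=10,
    roa_history=[0.04, 0.06, 0.05],
)
for name, value in firm.items():
    print(f"{name}: {value:.4f}")
```

Each firm then becomes one input vector for the network, with one component per indicator.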

Methodology: Self-Organizing Maps
SOM is a type of ANN that clusters patterns (in this paper, firms) based on different variables [45]. A key advantage of this kind of ANN is its unsupervised learning process: it is not necessary to define groups beforehand; rather, they emerge from the similarities and differences between the values of all variables included in the study. In this paper, the main purpose is to study how certain variables influence the business failures of Spanish firms without any prior assumptions. For this reason, other networks, such as back-propagation networks, were rejected, because they require the definition of one or more outputs in conjunction with one or more inputs to find patterns. SOM was therefore considered the most suitable network for this study.
Regarding the SOM architecture, the neurons of the input layer are connected to all the neurons of the output layer through synaptic weights. Thus, it is possible to establish a two-dimensional map with, for example, failed and non-failed regions (containing default and active firms) and to trace insolvency trajectories [46][47][48]. The SOM learning algorithm can be summarized in the following steps:
• Each neuron of the input layer is connected to each neuron of the output layer through synaptic weights ($w_{ij}$), where $w_{ij}$ is the weight that connects neuron i in the input layer with neuron j in the output layer. The output layer is a two-dimensional map, on which the input data are located according to the similarities between all the variables used to define the input information. The maximum number of iterations T in the training phase of the network is defined.
• Initially, at t = 0, the weights take small random values.
• Given an input vector $x^k$ to the network (where k indexes each company), the Euclidean distance (ED), used as the measure of distance or similarity, is calculated to find the best-matching unit for each input vector:
$$ED_j = \sqrt{\sum_i \left(x_i^k - w_{ij}(t)\right)^2} \tag{1}$$
• To reduce the distance between the input vector and the weight vector associated with the winning neuron, the synaptic weights are updated according to the following rule:
$$w_{ij}(t+1) = w_{ij}(t) + \alpha \, h(t) \left[x_i^k - w_{ij}(t)\right] \tag{2}$$
where α is the learning rate and h(t) is the neighborhood function, which updates the weights of the winning neuron and of its neighbors (the neurons within a radius of the winning neuron that decreases with the number of iterations), so that similar patterns are located close together. The iterative process continues until t reaches the previously defined maximum number of iterations T.
• The preceding step is repeated a sufficient number of times for the association between the different patterns and the units of the output layer to become stable. Once the learning process ends, all the patterns are distributed over the two-dimensional map.
• Finally, it is also possible to gather the input patterns situated on the map into a particular number of groups. A pattern belongs to the closest group, according to the minimum distance between its characteristics and the mean of the characteristics in the region.
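The training loop described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the implementation used in the study: the linear decay of the learning rate and radius, the "bubble" neighborhood (h = 1 inside the radius, 0 outside), the initial weight range, and all function and parameter names are hypothetical choices.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x (Euclidean)."""
    return min(weights, key=lambda j: math.dist(weights[j], x))

def train_som(data, rows, cols, max_iter=500, alpha0=0.5, seed=0):
    """Train a small SOM; returns a dict {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # t = 0: weights start as small random values
    weights = {(r, c): [rng.uniform(0.0, 0.1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for t in range(max_iter):
        decay = 1.0 - t / max_iter
        alpha = alpha0 * decay        # learning rate shrinks over time (assumption)
        radius = radius0 * decay      # neighborhood radius shrinks over time
        x = rng.choice(data)          # present one input pattern (one firm)
        win = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            # bubble neighborhood: update the winner and units within the radius
            if math.hypot(r - win[0], c - win[1]) <= radius:
                for i in range(dim):
                    w[i] += alpha * (x[i] - w[i])
    return weights

# Toy data: two loose groups of "firms" described by two scaled indicators
toy = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
som = train_som(toy, rows=4, cols=3)
for firm in toy:
    print(firm, "->", best_matching_unit(som, firm))
```

Because every update is a convex combination of the old weight and the input, the weights stay within the range spanned by the initial values and the data, which is why inputs are usually scaled to a common range before training.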
Once the network is implemented, it generates an output map where the firms are located. The ratio of default firms for an output neuron j, $Q_j$, is calculated using Equation (3):
$$Q_j = \frac{A_j}{E_j} \tag{3}$$
where $E_j$ is the total number of firms located in neuron j and $A_j$ is the number of default firms in neuron j. If $Q_j$ ≥ 0.5, the cell is considered a cell of default firms; if $Q_j$ < 0.5, it is considered representative of active firms. When the network clusters the firms, two types of error can occur. Type I error: the model classifies a default firm as an active one. Type II error: the model classifies an active firm as an insolvent (default) company.
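The cell labelling and the two error types can be made concrete with a short sketch. The data layout (a list of neuron and status pairs), the function names, and the toy assignments are hypothetical; only the 0.5 threshold and the error definitions come from the text.

```python
from collections import Counter

def neuron_default_ratio(assignments):
    """assignments: list of (neuron, status), status 'D' (default) or 'A' (active).
    Returns {neuron: Q_j}, the share of default firms among firms in neuron j."""
    total = Counter(n for n, _ in assignments)
    defaults = Counter(n for n, s in assignments if s == "D")
    return {n: defaults[n] / total[n] for n in total}

def error_rates(assignments):
    """Return (type I rate, type II rate) using the Q_j >= 0.5 rule."""
    q = neuron_default_ratio(assignments)
    n_default = sum(1 for _, s in assignments if s == "D")
    n_active = len(assignments) - n_default
    # Type I: default firm located in a cell labelled active (Q_j < 0.5)
    type1 = sum(1 for n, s in assignments if s == "D" and q[n] < 0.5)
    # Type II: active firm located in a cell labelled default (Q_j >= 0.5)
    type2 = sum(1 for n, s in assignments if s == "A" and q[n] >= 0.5)
    return type1 / n_default, type2 / n_active

toy = [((0, 0), "D"), ((0, 0), "D"), ((0, 0), "A"),
       ((1, 0), "A"), ((1, 0), "A"), ((1, 0), "D"),
       ((1, 1), "A")]
q = neuron_default_ratio(toy)
t1, t2 = error_rates(toy)
print(q)
print(f"type I: {t1:.3f}, type II: {t2:.3f}")
```

In the toy data, cell (0, 0) is labelled default and cells (1, 0) and (1, 1) are labelled active, so one of three default firms and one of four active firms are misclassified.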

Data and Variables Selection
The data on construction firms were extracted from the SABI database. The examined sector was the construction industry in Spain during 2008. The search was carried out separately for active companies and insolvent companies (bankruptcy, preventive arrangement, and suspension of payments). The selection criteria were met by 117,131 active companies and by 1,467 insolvent companies. We then explored the database to identify firms that changed their status between 2008 and 2011, in order to capture the incidence of the financial crisis. The SABI database includes companies that did not publish complete financial statements, so the selection of observations was further limited, and the study focused on companies whose financial statements contained at least the data necessary for the calculation of the selected ratios. This condition was met by only 76 default companies; therefore, we randomly extracted 76 active firms (see Figure 1). The variables were selected considering the hypotheses described in Section 2, based on the theoretical framework of a financial crisis and capital structure. Table 1 shows the variables selected for the analysis and their descriptions, Table 2 presents the descriptive statistics of the sample, and Table 3 shows the correlation matrix between the analyzed variables.

Figure 1. Summary of samples (insolvent firms: 1,467; default between 2008 and 2011: 228; firms with complete statements: 76).
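The matched-sample step (drawing as many active firms at random as there are usable default firms) can be sketched as follows. The function name, the firm identifiers, the size of the active pool, and the seed are hypothetical; the study does not state how the random draw was made.

```python
import random

def balanced_sample(default_ids, active_ids, seed=2008):
    """Draw, without replacement, as many active firms as there are default firms."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    matched_active = rng.sample(active_ids, k=len(default_ids))
    return list(default_ids), matched_active

# Hypothetical identifiers: 76 default firms and a larger pool of active firms
default_pool = [f"D{i}" for i in range(1, 77)]
active_pool = [f"A{i}" for i in range(1, 1001)]
defaults, actives = balanced_sample(default_pool, active_pool)
print(len(defaults), len(actives))
```

A balanced design of this kind makes the 0.5 threshold on the share of default firms per cell a natural cut-off, since both classes have equal prior weight in the sample.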
According to the correlation matrix, variables 5 and 6 are highly correlated, so variable 6 (earnings) was eliminated as a measure of firm size. The other eight variables were selected for the empirical analysis.
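The screening of highly correlated variables can be illustrated with a small sketch. The 0.9 cut-off, the variable names, and the toy values are assumptions made for illustration; the text does not report the threshold that was applied.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep variables in order, skipping any whose |r| with a kept one exceeds threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

toy = {
    "log_assets":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "log_earnings": [1.1, 2.0, 3.2, 3.9, 5.1],   # nearly collinear with log_assets
    "tangibility":  [0.5, 0.1, 0.4, 0.2, 0.3],
}
selected = drop_correlated(toy)
print(selected)
```

Listing the variables in priority order means that, of two redundant size measures, the one listed first is retained.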

Application
When the network is implemented, it generates an output map of dimensions 11 × 6 (11 rows and six columns), where the corresponding patterns (firms) have been numbered, and "D" corresponds to a default company and "A" identifies an active company, both at the end of 2011.
The firms that were most similar were placed in the same area of the map. As can be seen in Figure 2, the groups situated at the top of the map contain the default firms, whereas the groups located at the bottom mainly represent clusters of active firms.
Figure 3 shows the value of $Q_j$ for each output-layer neuron, as defined in Equation (3). Red identifies the neurons with $Q_j$ ≥ 0.5, that is, the cells of default firms. Green identifies the cells considered representative of active firms ($Q_j$ < 0.5). White cells represent units of the map in which no company has been located.
Figure 3. Value of $Q_j$ of each neuron in SOM.
Table 4 shows the accuracy percentage of the model, the error percentages of each type, and the resulting percentage error. As can be seen, the SOM correctly predicts 97.4% of firms in the construction sector in Spain. Moreover, the percentage of type I error (1.4%) is lower than that of type II error (3.8%). These results support the suitability of SOM for classifying business bankruptcy, since it is generally accepted that the cost of a type I error is greater than that of a type II error, owing to the losses that may result from investing in a company with a high risk of bankruptcy that has been classified as active.
The second step is to analyze which variables characterize bankruptcy in the Spanish construction sector. To do this, it is necessary to evaluate the value of each variable across the map. These values are presented with a color scale in Figure 4. Dark blue represents the minimum values, while red shows the highest values. For variables associated with risk, low values (blue) are expected, whereas for others, such as profitability or size, high values (red) are expected.
Through SOM (Figure 2) and the characteristics map (Figure 4), similarities were detected between the firms included in the five clusters.
Group 1 is formed only by default firms (30 companies), which show high ROA volatility and medium earnings volatility as measures of risk. These firms also have low profitability and low growth opportunities, reflected in low variability in earnings and assets. Moreover, they are small and young, and have high proportions of long-term fixed assets.
Group 2 is formed by 48 firms: 44 default firms and only four active firms. This group shows low to medium profitability and low to medium growth opportunities. Earnings volatility is high, whereas ROA volatility, as a measure of risk, is medium to low. Additionally, these firms are young and small, but, in this case, most of their assets are not fixed.
Group 3 comprises 32 active firms and only one default firm. This group presents low volatility in ROA and earnings, medium to high profitability, and high opportunities for growth. Additionally, this cluster is characterized by middle-aged, large firms with small fixed-asset ratios.
Group 4 contains only active firms (21 companies). These firms are the oldest in the sample, with high profitability. Additionally, they show high growth opportunities and low risk measured by both variables (earnings and ROA), and they have medium to high tangibility.
Group 5 contains 20 firms: 19 active and one default. This is the most distinctive cluster, characterized by large, middle-aged firms with high profitability, high growth opportunities, and low risk, measured as the volatility of earnings and ROA. Additionally, most of their assets are fixed.
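The composition of the five groups can be tallied directly from the counts reported above. The sketch below only restates those counts; labelling a group as "default" when at least half of its firms are in default mirrors the cell-level rule and is an assumption when applied at group level.

```python
# (default firms, active firms) per group, as reported in the group descriptions
GROUPS = {1: (30, 0), 2: (44, 4), 3: (1, 32), 4: (0, 21), 5: (1, 19)}

def summarize(groups):
    """Return size, share of default firms, and majority label for each group."""
    rows = {}
    for g, (n_default, n_active) in groups.items():
        size = n_default + n_active
        share = n_default / size
        rows[g] = {
            "size": size,
            "default_share": share,
            "label": "default" if share >= 0.5 else "active",
        }
    return rows

summary = summarize(GROUPS)
total_default = sum(d for d, _ in GROUPS.values())
total_active = sum(a for _, a in GROUPS.values())
for g, row in summary.items():
    print(g, row["size"], round(row["default_share"], 3), row["label"])
print("default:", total_default, "active:", total_active)
```

The totals recover the 76 default and 76 active firms of the sample, with Groups 1 and 2 dominated by default firms and Groups 3 to 5 by active firms.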

Discussion and Conclusions
This paper proposes an enrichment of the theory of capital structure and of the prediction of business failure by showing the capacity of the SOM neural network to classify firms as healthy or unhealthy. In this case, we showed the impact of the financial crisis on firms' financial decisions that could have induced failure. In the network, we considered variables that measure age, profitability, growth opportunities, size, volatility, and asset tangibility. These factors, taken from capital structure and financial theory, were employed to predict business failure and the impact of the financial crisis. The SOM correctly predicts 97.4% of firms (98.6% of the active firms and 96.2% of the default firms) in the analyzed sector, with a type I error (1.4%) smaller than the type II error (3.8%). Additionally, the network classifies the firms into five groups with clear similarities within the clusters, which supports the hypotheses. That is, firms with high profits, more assets, and greater age have a lower probability of defaulting. Additionally, the network shows that more risk or volatility increases the probability of defaulting. These characteristics were verified in the five clusters created through the SOM: groups 1 and 2 are linked to the default firms, which show high risk, low to medium profitability, and low growth opportunities. Moreover, these firms are small and young, but show high (group 1) or low (group 2) fixed assets. Groups 3, 4, and 5 are associated with active or solvent businesses, which show the opposite characteristics, as expected.
Given the results presented above, we can confirm hypotheses 1, 2, 3, 4, and 6. Hypothesis 5 is not clearly supported, because the bankruptcy clusters (groups 1 and 2) show opposite values.
According to hypothesis 1, higher profits reduce default probability. This result is in accordance with the capital structure literature, and specifically with the pecking order theory [25,26], which shows that companies prefer to use internal sources to finance their operations, so a negative relationship between leverage and profitability is expected ([32,33,34,39,43,44], among others). Hence, higher profitability implies a greater internally generated cash flow to finance the planned investment and, consequently, a lower need for debt. These funds could also be used to withstand situations of external financial distress, reducing the probability of defaulting. Hypothesis 2 states that a higher variation in total assets, used as a measure of growth potential, decreases the possibility of bankruptcy. We therefore confirm that, during the subprime crisis, companies with higher growth opportunities could withstand financial problems better than companies with lower opportunities. In the literature, however, there are arguments both for and against a positive relationship between leverage and growth [31,32,39,43], and some studies find a negative relationship [35,49]. The test of hypothesis 3 shows that firm size decreases the probability of defaulting, because larger companies tend to be more diversified, which reduces their exposure to internal and external shocks and their likelihood of bankruptcy. In addition, large companies generally have better access to external sources of financing because they face fewer problems of asymmetric information, and they tend to be more mature, with greater availability of tangible assets and higher profitability, so they face lower leverage adjustment costs [40]. Furthermore, hypothesis 4 is confirmed: more volatility or risk is linked to a greater possibility of default. Companies with volatile income face greater credit restrictions because of the risk that they will not generate sufficient profits to meet their debt commitments. Finally, hypothesis 6 is verified: the probability of defaulting is reduced with the age of a firm. Older companies tend to be more diversified, with better financial records and better relationships with suppliers, among other factors. The network does not clearly identify the impact of tangibility (hypothesis 5) on the prediction of business failure.
The analysis indicates that most of the factors that explain financial decisions in companies are also useful for predicting bankruptcy during crisis situations. This relationship between the two bodies of theory could be useful for setting public and banking policies to improve access to financial resources and for monitoring the firms with higher chances of defaulting.
As future work, it would be interesting to apply another clustering method, such as K-means, and compare its results with those of the SOM. Regarding the application, the study could be extended to other countries besides Spain, for example, other European countries. Moreover, the analysis could be tested on other years or other crises, such as the Euro crisis. Finally, other sectors could be studied, both for the subprime crisis in Spain and for other crises and other countries, to test whether the SOM confirms the same results.
Author Contributions: This article is a joint work of the four authors. J.P.L., L.F.-A., V.S., and H.V. contributed to the research ideas, literature review and analysis and to writing the paper. All authors read and approved the final manuscript.
Funding: Part of this research was funded by the project "Análisis exploratorio y detección de las causas de fracaso empresarial en emprendedores, micro y pequeñas empresas del sudoeste de la Provincia de Buenos Aires" [PGI UNS 24/E162].

Conflicts of Interest: The authors declare no conflict of interest.