Markov Chain K-Means Cluster Models and Their Use for Companies’ Credit Quality and Default Probability Estimation

Abstract: This research aims to determine the existence of inflection points at which companies' credit risk goes from being minimal (hedge) to being high (Ponzi). We propose an analysis methodology that determines the probability that hedge credits migrate to speculative and then to Ponzi, using simulations with homogeneous Markov chains and the k-means clustering method to determine thresholds and migration among clusters. To prove this, we used quarterly financial data from a sample of 35 public enterprises over the period between 1 July 2006 and 28 March 2020 (companies listed on the USA, Mexico, Brazil, and Chile stock markets). For simplicity, we assume that the companies have no revolving credits and that they face their next payment only with their operating cash flow. We found that Ponzi companies (1) have an average default probability of 0.79, speculative ones (0) have 0.28, and hedge companies (−1) have 0.009; these are the inflection points we were looking for. Our work's main limitation lies in not considering the entities' behavior when granting credits in altered states (credit relaxation due to credit supply excess).


Introduction
The credit position of the institutions that request credit can migrate through periods of stability and instability. In the first case, loans represent low-risk, profitable funds (hedge), which, over time and due to variations in financial risk factors, market conditions, and socio-economic issues, may finally become speculative and then high risk (Ponzi).
It is fundamental to highlight that traditional credit for companies is not safe from the impact produced during crises, and also depends on intrinsic factors such as their level of capitalization, assets, liquidity, and solvency, among others.
In an economy where banks have a close financial relationship with each other, past, present, and future are linked not only to capital assets and workforce but also to financial relationships. This fact was indicated by Minsky [1] in his second theorem of financial instability.
Minsky determines three types of income-debt relations for the firms: hedge, speculative, and Ponzi financing. This research finds transition thresholds between each state proposed by Minsky's hypothesis and classifies the firm's actual state. This research goes far beyond calculating bankruptcy probabilities; it gives actual thresholds to the Minsky hypothesis and may be used to check the industry and financial firm's health.
In this paper, we use the quarterly operating cash flow as the underlying and, as the strike price, the theoretical payment obtained under the assumption that the company pays its long-run debt over ten years at its effective interest rate (financial cost divided by long-run debt); together they define a theoretical call option for each company.
The idea behind this theoretical call is to assess the firm's bankruptcy probability in the next quarter in the absence of other credit sources, a stand-alone assumption. The methodology comprises simulations with homogeneous Markov chains and a clustering method (k-means) in order to determine the thresholds at which a company's credit position changes; these thresholds, if they exist, yield a functional proposal for classifying other companies and determine the feasible regions for each financing type.
We believe that the paper's main contribution is the novel application of Markov chains to quantify Minsky's hypothesis in any firm. We used a classical mathematical tool to give a quantitative solution to a classic economic problem. We also give a reference for an industry or economy's health to decision-makers.
This methodology allows the analyst to know a company's solvency and compare it with the industry's risk exposure. Regulators or economic decision-makers could measure and compare a big company's financial health and use it as an indicator of possible third-generation crises, i.e., situations in which companies could all default systematically. It is important to stress that we consider the cash flows to be an ergodic stochastic process.
A remarkable fact about credit models is their dependence on normal or logistic distribution assumptions. Models such as the Altman Z-score must be specified according to the industry and the company's size. Following Minsky's theory of financial market instability, our application avoids distributional assumptions and is a pioneer in applying Markov chains to the quantitative classification and determination of credit migration probabilities. With that, we present a mathematical application which enriches economic theory.
We also want to point out that the Merton model only provides default probabilities under the assumption of full payment of debt upon an option's expiration date (an unlikely assumption due to the need for financial structure requirements), and Altman's Z-score classifies firms into arbitrary groups using a methodology that must be adapted to each industry and firm size. The need for adaptation in Altman's methodology creates non-comparable groups. Alongside this, econometric methods rely on distribution (normal, logistic, or Poisson) and ergodicity assumptions to provide only default probabilities, while the firm's classification remains arbitrary. None of them provides an internally coherent classification methodology or supports further analysis.
As possible future investigation lines, we consider models that could modify companies and the industry's default probabilities through Hidden Markov chains where the external factor could be the regulator's decisions or market's volatility.
In Section 2, we review the literature on the methodologies implemented for corporate credit analysis. In Section 3, we introduce the financial sector characteristics. Section 4 presents the methodology, while Section 5 presents the results. Furthermore, in Section 6, we show the research conclusions, recommendations, and limitations.

Literature Review
A significant amount of research exists on transition matrix application to estimate the probability of default due to credit quality impairment on commercial, microcredit, and mortgage portfolios in Latin America; such as that of Aparicio et al. [5] who analyze the credit portfolio of the Peruvian financial system considering the credit transition matrix usage subject to the economic cycle.
For the Colombian case, Támara-Ayús, Aristizábal, and Velásquez [6] compare discrete and continuous transition matrices. There are also studies with GNP (Gross National Product), such as that carried out by McCulloch and Tsay [7], who observe that uncertainty about the situation of a given period will depend on the model specifications. For his part, Peña [8] concludes that economic cycles and macroeconomic variables influence credit quality impairment.
Hamilton [9] has proposed default models with Markov chains by using a chain comparable to logarithmic levels and trend constructed from the same time series model. In the case of Nicaragua, Gaitán and Flores [10] analyze banking institutions' credit portfolio through transition models with Markov chains.
Regarding the Venezuelan case, Porras, Anchundia, and Vieira [11] analyze it through transition matrices for the ex-ante and ex-post periods of the foreign capital influx to determine if the rivalry among financial institutions increases as shown in the Yu et al. [12], Pfeuffer and Reis [13], Leung and Kwok [14] papers.
In the literature review, we found models that join the classification of companies and risk structure and delve into the need for updating measurement models. Córdova, Molina, and Navarrete [15] and García, Bolívar, and Vázquez [16] provide examples of this.
There are applications to assess the risk of individuals such as that of Kavitha [17] who proposes a model based on k-means, which has higher efficiency in accuracy and time compared to the traditional methods. For counterparty credit risk, Zhu, Chan, and Bright [18] apply machine learning techniques to determine and solve problems in XVA and credit profiles, through Monte Carlo simulation and K-means clustering method.
In the credit risk analysis of Small and Medium Enterprises (SMEs), applied to Bahasa Indonesia, Wahyudin, Djatna, and Kusuma [19] model risk clusters by using K-means and measure risk by calculating qualitative importance scores combined with sentiment scores. The result shows that the model is adequate for clustering and measuring the risk level.
In the case of Mexico, Ayús, Peña, and Álvarez [20] present an analysis of different studies on the commercial portfolio credit risk. They develop a model to predict the probability of a debtor's default through factorial and discriminant analysis.
For their part, Lagunas and Ramírez [21] estimate scenarios to determine the number of operations likely to receive illegal financial transactions through stochastic transition matrices.
Our main work contribution is to propose an algorithm capable of classifying a company's credit health according to its operating flow as a proportion of its financial requirements. To simplify the calculations, we assume that the firm will not invest nor have access to other financing sources; with the correct internal information, the analyst can relax this assumption. In addition, we present a method based on Markov chains to determine the thresholds at which companies move from one credit state to another.

The Financial Sector
The financial sector can not only generate a boom (in terms of cycles) by providing a larger credit volume but can also induce a depression (Ferreira [22]). If the stock system collapsed, there would be a credit reduction, and this would lead to an economic debacle, according to Angeles and Ortiz [23].
It is even possible to see that, in 2008, the debt equity ratio (company's indebtedness capacity against equity) for the banks was not primarily altered, coinciding with Rodríguez and Venegas [24]. We show this in Figure 1.
There are studies such as Berger and Udell [25] which show that companies are always in a financial cycle that commences with financing. There is also evidence of intermediary companies that help in the economic cycle, as in the case of Bencivenga and Smith [26]. Papers such as that of Adrian and Shin [27] show that financial institutions function pro-cyclically. Furthermore, some authors argue that while recessions accompanied by financial problems are more prolonged and profound, recoveries are slightly shorter and stronger (Claessens, Kose, and Terrones [28]).
Terreno, Sattler, and Pérez [29] observed that a typical company takes on debt to continue operating or to carry out projects; this is part of a company's life cycle.
In this paper, we propose a variation of Merton's methodology in order to calculate companies' short-term default probability. Throughout this work, we analyze whether the company is capable of covering its financial requirements under two assumptions: (a) the company does not have access to other financing sources and (b) the company will not carry out any investment operation in the period. Unlike Merton's method, this article focuses on the company's short-term capacity to cover its financial requirements by itself in the following period only, under the assumption of normality in the innovations.

Methodology
Merton's model for corporate default uses the vanilla options rationale to calculate the default probability. Papers from Tudela and Young [30]; Dar, Anuradha, and Qadir [31]; Afik, Arad and Galil [32] are examples of the use of this method.
The dynamics of V, the company's operating cash flow over time, are described by a stochastic diffusion process in which dz is a Brownian motion. The cash flow must be enough each period to cover at least the payment of the debt, C; this implies that we are interested in the section of the distribution where $V_t > C_t$.
Rephrasing this idea, under our assumptions, the shareholders will receive a net cash flow, $S_T = \max[V_T - C, 0]$, from the firm at the end of the trimester, T.
Following Merton's model, at the beginning of the trimester, t, the present value of the expected net cash flow to the shareholders is the value of a European call option on V with strike C, priced with the operational cash flow volatility, σ. Therefore, the probability that the company can make the debt payment in the trimester, given the information contained in the sigma field, F, is $P(V_T > C \mid \mathcal{F}_t)$; the probability of default is its complement, $1 - P(V_T > C \mid \mathcal{F}_t)$.
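For reference, in the textbook Black-Scholes-Merton setting the quantities described in this section take the form sketched below. This is stated under assumptions: the drift $\mu$, the discount rate $r$, the auxiliary terms $d_1$ and $d_2$, and the standard normal distribution function $N(\cdot)$ follow the usual option-pricing notation and are not taken from the original text, so they may differ from the notation the authors intended.

```latex
\begin{align}
  dV_t &= \mu V_t\,dt + \sigma V_t\,dz_t, \\
  S_T &= \max[V_T - C,\, 0], \\
  S_t &= V_t\,N(d_1) - C\,e^{-r(T-t)}\,N(d_2), \\
  d_1 &= \frac{\ln(V_t/C) + \left(r + \tfrac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}},
  \qquad d_2 = d_1 - \sigma\sqrt{T-t}, \\
  P(V_T > C \mid \mathcal{F}_t) &= N(d_2), \\
  P(V_T \le C \mid \mathcal{F}_t) &= 1 - N(d_2) = N(-d_2).
\end{align}
```

In this form $N(d_2)$ is the risk-neutral probability of covering the payment; replacing $r$ with $\mu$ inside $d_2$ would give the corresponding probability under the real-world measure.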

Company Grouping or Clustering
Classification algorithms have become increasingly used because of their potential. They are applied in areas such as marketing, medical diagnosis, event detection, categorization, and filtering. For more details, see Aggarwal [33].
Due to its widely known classification properties, we use a dendrogram classification in this paper; for details on its use, see Berkhin [34]. In particular, we use the AGNES algorithm for R ("Agglomerative Nesting", [35][36][37]). We want to emphasize that the algorithm uses the Euclidean distance to classify the objects; for two objects x and y described by m variables, this is
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}.$$
We also use a k-means algorithm to check the dendrogram grouping. For details on its use, see Hartigan and Wong [37] or Ledolter [38]. For now, it suffices to say that a k-means algorithm divides the n units into k ≤ n different clusters $S = \{S_1, S_2, \ldots, S_k\}$, minimizing the within-cluster sum of squares until there is no gain in dividing the groups.
Using these methodologies on our data, it is possible to obtain the state transition matrix for the Markov chain; we show it in the following section.
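As an illustration of the k-means step, the following minimal sketch clusters one-dimensional default probabilities into three groups. It is written in Python with the standard library only, whereas the authors worked in R; the function names, the evenly spread initial centres, the plain Lloyd-style iteration (not the Hartigan and Wong variant), and the sample probabilities are all assumptions made for this example.

```python
def nearest_centre(value, centres):
    """Index of the centre closest to value (squared Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: (value - centres[j]) ** 2)


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd-style k-means for one-dimensional data (illustrative only)."""
    data = sorted(values)
    n = len(data)
    if k < 2 or k > n:
        raise ValueError("need 2 <= k <= number of observations")
    # Assumption: initial centres are spread evenly over the sorted data.
    centres = [data[(i * (n - 1)) // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            groups[nearest_centre(v, centres)].append(v)
        # Each centre moves to the mean of its group; empty groups keep theirs.
        updated = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    labels = [nearest_centre(v, centres) for v in values]
    return centres, labels


# Hypothetical default probabilities for nine firms (not taken from the paper).
probabilities = [0.01, 0.02, 0.005, 0.25, 0.30, 0.33, 0.75, 0.80, 0.85]
centres, labels = kmeans_1d(probabilities, k=3)
# Centres stay in ascending order here, so label 0, 1, 2 can be read as
# hedge (-1), speculative (0), and Ponzi (1).
states = [label - 1 for label in labels]
print(centres, states)
```

Because the data are one-dimensional, the squared difference used here coincides with the squared Euclidean distance mentioned above.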

Markov Chains
The literature defines a Markov chain as a system that presents changes between states according to a fixed probabilistic rule (transition probabilities), so the process can be in any state with any past combinations and can take a fixed number of future states.
According to Caballero et al. [39]: given a probability space $(\Omega, \mathcal{F}, P)$ and a nonempty, finite or countable set $E$, a sequence of random variables $\{X_n : \Omega \to E,\ n = 0, 1, \ldots\}$ is called a Markov chain with state space $E$ if it satisfies the Markov condition, that is, if for all $n \ge 0$ and any collection $x_0, x_1, \ldots, x_{n-1}, x, y \in E$,
$$P(X_{n+1} = y \mid X_n = x, X_{n-1} = x_{n-1}, \ldots, X_0 = x_0) = P(X_{n+1} = y \mid X_n = x).$$
In the context of credit risk, the Markov chain is used to model a debtor's credit rating migration. For more details, see Norris [40], Venegas [41], Bolívar-Cimé, Notario, and Pérez [42], and Caballero et al. [39]. In order to compare our results, Table 1 shows Fitch's transition matrix for Mexican corporate bonds over the period from 2002 to 2018. In this paper, we find the inflection point between hedge and Ponzi companies with regard to the companies' operating flow and financial indicators; hence, it is more appropriate to use the matrices generated by Markov chains, similarly to Bolívar-Cimé, Notario, and Pérez [42].
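To make the link between observed classifications and the transition matrix concrete, the sketch below estimates the transition probabilities of a homogeneous chain by counting observed moves between states and normalizing each row. The function name, the state ordering (−1, 0, 1), and the three sample paths are hypothetical and serve only as an illustration.

```python
STATES = (-1, 0, 1)  # hedge, speculative, Ponzi


def estimate_transition_matrix(paths, states=STATES):
    """Row-normalized transition counts pooled over all observed paths."""
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for path in paths:
        # Each consecutive pair of periods contributes one observed transition.
        for current, following in zip(path, path[1:]):
            counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left in the sample carries no information;
        # its row is left as zeros rather than guessed.
        matrix.append([c / total for c in row] if total else [0.0] * len(row))
    return matrix


# Hypothetical state histories for three firms (not taken from the paper).
paths = [
    [-1, -1, -1, 0],
    [0, -1, -1, -1],
    [1, 1, 0, 1],
]
for state, row in zip(STATES, estimate_transition_matrix(paths)):
    print(state, row)
```

Pooling the counts over firms and periods reflects the homogeneity assumption: the same transition probabilities are taken to apply in every period.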

Results
This research uses quarterly operating flows as well as financial requirements (interest plus estimated amortization) of a sample of 35 companies listed on the stock markets of the USA, Mexico, Chile, and Brazil, obtained from "Economatica" over the period from 1 July 2006 to 30 March 2020, representing twelve quarters.
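The financial requirement can be illustrated with a small sketch. It assumes one possible reading of the description given in the Introduction: the effective rate is the quarterly financial cost divided by long-run debt, and the principal is amortized in equal instalments over ten years (forty quarters). The function names and the sample figures are hypothetical, and the authors' exact amortization schedule may differ.

```python
def quarterly_financial_requirement(long_run_debt, quarterly_financial_cost, years=10):
    """Interest plus straight-line amortization for one quarter (illustrative)."""
    if long_run_debt <= 0:
        raise ValueError("long_run_debt must be positive")
    quarters = years * 4
    # Effective rate as described in the text: financial cost over long-run debt.
    effective_rate = quarterly_financial_cost / long_run_debt
    interest = effective_rate * long_run_debt
    # Assumption: the principal is repaid in equal quarterly instalments.
    amortization = long_run_debt / quarters
    return interest + amortization


def coverage_ratio(operating_cash_flow, requirement):
    """Operating cash flow as a proportion of the financial requirement."""
    return operating_cash_flow / requirement


# Hypothetical figures in the same currency unit (not taken from the paper).
requirement = quarterly_financial_requirement(long_run_debt=1000.0,
                                              quarterly_financial_cost=20.0)
print(requirement, coverage_ratio(operating_cash_flow=90.0, requirement=requirement))
```

Working with the ratio of cash flow to requirement, rather than with absolute amounts, is what makes firms of different sizes comparable.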
We applied the k-means methodology to each of the four quarters of financial information. In our analysis, we assume that the company has no revolving credits and cannot use cash reserves, so it faces the next payment only with its operating flow.
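Under these assumptions, a one-quarter default probability in the spirit of Merton's model can be sketched as follows. The sketch assumes the textbook lognormal form, in which the probability that the cash flow ends below the payment is N(−d2); the function names, the `rate` parameter, the annualized volatility convention, and the sample inputs are assumptions for illustration and not the authors' implementation.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def default_probability(cash_flow, payment, sigma, rate=0.0, horizon=0.25):
    """P(V_T <= C) = N(-d2) under lognormal cash-flow dynamics (illustrative).

    sigma and rate are annualized; horizon is in years (0.25 = one quarter).
    """
    if cash_flow <= 0 or payment <= 0 or sigma <= 0 or horizon <= 0:
        raise ValueError("cash_flow, payment, sigma and horizon must be positive")
    d2 = (math.log(cash_flow / payment) + (rate - 0.5 * sigma ** 2) * horizon) \
        / (sigma * math.sqrt(horizon))
    return normal_cdf(-d2)


# Hypothetical firms: cash flow well above, equal to, and below the payment.
for cash_flow in (150.0, 100.0, 60.0):
    print(cash_flow, default_probability(cash_flow, payment=100.0, sigma=0.4))
```

The lognormal form requires a positive operating cash flow, so a firm with a negative flow would need separate treatment in any practical use of this sketch.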
Hence, there are differences in the default probability calculation in the proposed methodology compared to those made by the rating agencies. We show the results in Table 2.
Source: Own elaboration with data obtained from Economática.
In order to classify the companies, we use only their default probability measured with our variation of Merton's method. Then, we use the AGNES method (Agglomerative Nesting, implemented in R-project) to group them. The AGNES methodology results in the dendrogram shown in Figure 2. Although it is not possible to appreciate the cluster to which each company belongs (we show it in Table 3), it is possible to observe the hierarchy.
As a result of the Agglomerative Nesting procedure (AGNES), we find that companies are grouped, according to their default probabilities, into 3 clusters. Clustering helps us determine the possible states of the stochastic process (Minsky's classification). The Markov chain determines the transition probabilities and the long-run membership rate for each state in the transition matrix. Table 3 shows the company grouping results as well as their transition over the periods. We use values from the set {1 (Ponzi), 0 (Speculative), −1 (Hedge)} to show their group, according to their default probability level.
The results show that companies that start in Ponzi (1) generally remain in the same group throughout the different periods; they may be speculative in some periods, but they fail to become hedge (−1) firms. In the case of ELEKTRA GPO, migration from Ponzi to hedge occurred by "jumping" through intermediate speculative and hedge periods.
While most companies that started in the speculative (0) group recovered in the second period, we observed that in some cases after several periods in the hedge classification they either showed impairment or went directly to a Ponzi stage, as in the CEMEXCPO case. Regarding companies that started in a hedge position, the majority stayed there and did show an inflection point when they became speculative before being classified as Ponzi.
We want to emphasize that companies classified as Ponzi belong to the aerospace and beer industries. Speculative companies are concentrated in commerce, food, and iron production. Hedge companies are spread across several areas.
It is imperative to clarify that the financial requirement coverage analysis eliminates any possible size bias within the sample as it is based on proportion.
Source: Own elaboration with R-project and "cluster" package.
Since the Markov chain is regular, as n → ∞ the powers $P^{(n)}$ of the transition matrix $P$ converge to a matrix $W$ whose rows are all equal to the same probability vector $u = (u_x)_{x \in E}$; for the case studied, we obtain u = {0.114285, 0.0571428, 0.828571}. We show the transition matrix in Table 4 and Figure 3.
By carrying out 5000 simulations with the transition matrix, we found that a company could go from a hedge classification, −1, to the Ponzi group, 1, in an average of 34 periods, as shown in Figure 4.
A significant result obtained from k-means grouping over the default probabilities is that firms in the Ponzi state (1) have an edge probability of default of 0.79, while the speculative (0) ones have 0.28; finally, we found that hedge firms (−1) have an edge default probability of 0.009. Therefore, we can confirm that the inflection point exists.
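The two computations described above, the limiting probability vector and the simulated time to move from hedge to Ponzi, can be sketched as follows. The transition matrix used here is hypothetical (it is not the matrix of Table 4), the state order is assumed to be hedge, speculative, Ponzi, and the function names and random seed are choices made for this example, so the numbers it prints are not the paper's results.

```python
import random

# Hypothetical transition matrix; rows and columns ordered hedge, speculative, Ponzi.
P = [
    [0.95, 0.04, 0.01],
    [0.30, 0.50, 0.20],
    [0.10, 0.20, 0.70],
]


def stationary_distribution(matrix, tol=1e-12, max_iter=10_000):
    """Iterate u <- uP from a uniform start until u stops changing."""
    n = len(matrix)
    u = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(u[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, u)) < tol:
            return nxt
        u = nxt
    raise RuntimeError("power iteration did not converge")


def mean_first_passage(matrix, start, target, runs=5000, seed=0, max_steps=100_000):
    """Average number of periods needed to reach target from start, by simulation."""
    rng = random.Random(seed)
    states = range(len(matrix))
    total = 0
    for _ in range(runs):
        state, steps = start, 0
        while state != target:
            # Draw the next state using the current row as weights.
            state = rng.choices(states, weights=matrix[state])[0]
            steps += 1
            if steps >= max_steps:
                raise RuntimeError("target not reached within max_steps")
        total += steps
    return total / runs


print(stationary_distribution(P))
print(mean_first_passage(P, start=0, target=2))
```

Power iteration is used here because it mirrors the statement that the powers of a regular transition matrix converge to a matrix with identical rows; solving the linear system uP = u directly would be an equivalent alternative.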
Thus, the edge values above define the classification function generated by our model. We want to point out that the transition probabilities and asymptotic states are stable both within the sample and in out-of-sample trials. This stability makes the asymptotic states a useful tool for assessing the industry's health. It also allows us to give cut-off points to the theoretical Minsky groups, one of our main contributions.
Finally, in Table 5 we compare Altman's Z-score with our application. As we said before, the two are not fully comparable, but both contain three stages. Altman's Z-score places several companies in the early bankruptcy stage, whereas our application classifies them differently. With our application, one company (Boeing) is classified as Ponzi in the last period. Recall that, according to Minsky, a Ponzi enterprise does not have enough operating cash flow to cover interest or debts and must acquire more debt to cover those obligations; in our case, we assume no revolving credits, so the company faces its next payment with its cash flow alone. A major difference between the two is that Altman's Z-score uses ratio analysis and multiple discriminant analysis, which require a multivariate normal distribution in continuous variables, whereas our application uses probability theory (Markov chains), financial mathematics (Merton's model), and data mining (clustering).
Source: Own elaboration with R-project and Yahoo Finance.

Discussion & Conclusions
The paper's main objective is to determine the existence of an inflection point at which the credit risk of a company listed on American stock exchanges goes from being minimal (hedge) to very high (Ponzi), by using an adaptation of Merton's default probability methodology, data mining classification tools, and Markov chains. We also found transition probabilities for Minsky's classifications and used them to obtain stationary states that give us an insight into the long-run composition of the economic apparatus.
When assessing default probabilities, a determinant feature of this methodology compared to existing methods is that it can be used on any company, whether public or private; previous works focus on public companies. It is also worth mentioning that this methodology may be adapted to any probability distribution as long as the researcher can use a European call option; this opens a broad scope of analysis that goes from closed-form valuation to adaptable numerical methods.
The study demonstrated the existence of this inflection point; it also determined migration thresholds among clusters and produced a functional proposal to classify firms based on their credit risk.
The Markov chains showed that the number of states determined by the clustering methodology is stable across the sample because the transition probabilities remain similar in the long run, giving confidence in the states' long-run distribution and the validity of the inferences derived from it.
The paper shows a very flexible methodology that does not rely on any distribution assumption or require market data to perform the analysis; financial statement information (operating cash flow, long-run debt, and financial expenses) suffices.
The methodology also allows us to group the firms naturally, following a sound economic theory, and gives cut-off points to each group. The paper is the first one making such classifications. This unique feature also permits analysis of the behavior of a firm's financial health compared to its industry or over time. It also permits decision-makers to analyze the general health of an industry or country (this is a line for future work).
It is essential to mention that the methodology relies on the ergodic assumption to provide reliable long-run states. In its current form, the paper does not consider the entities' behavior when granting credits in altered states (credit supply excess or other economic turmoil). This problem may be solved by including Poisson jumps (positive or negative) in the Markov chain; we also consider this improvement a future research line.