Failure Prediction in the Condition of Information Asymmetry: Tax Arrears as a Substitute When Financial Ratios Are Outdated

Abstract: This paper aims to study the usefulness of applying tax arrears in failure prediction, when annual reports to calculate financial ratios are outdated. Three known classification methods from the failure prediction literature are applied to the whole population dataset from Estonia, incorporating various tax arrears variables and financial ratios. The results indicate that accuracies remarkably exceeding those of models based on financial ratios can be obtained with variables portraying the average, maximum, and duration contexts of tax arrears. The main contribution of the current study is that it provides a proof of concept that accounting for the dynamics of payment defaults can lead to useful prediction models in cases in which up-to-date financial reports are not available.


Introduction
Since the first multivariate study by Altman (1968), the failure prediction literature has flourished and bibliometric databases indicate the growing popularity of the topic. The extant studies are almost exclusively characterized by the use of financial ratios as predictors (Balcaen and Ooghe 2006; Veganzones and Severin 2020), but evidence about the usefulness of financial ratios for prediction remains contradictory (Jayasekera 2018). Although numerous sample-based studies have obtained high accuracies, several large population studies have indicated significant classification errors based on the information from the last available annual report (e.g., Altman et al. 2017). These errors can be attributed to a substantial issue that is largely unaddressed in relevant papers, namely, information asymmetry, in which firms' insiders (e.g., managers and owners) are more knowledgeable about the financial situation than firms' outsiders (e.g., suppliers, customers, and banks). Information asymmetry is especially characteristic of the non-listed micro-, small-, and medium-sized firm (i.e., SME) segment, because their reporting frequency, content of reports, external monitoring by auditors, and market pressures are different from those of large listed firms. Thus, there have been numerous recent attempts in the literature to identify valuable substitutes for financial ratios in failure prediction (Ciampi et al. 2021).
Research about potential substitutes for financial ratios in SME failure prediction is scant. Two domains appear to be more used in relevant studies, namely, payment defaults and corporate governance variables; these are also used in combination (Iwanicz-Drozdowska et al. 2016). Corporate governance variables have mostly provided an increment to the classification accuracies of financial ratios, although their individual value may be low (Ciampi 2015; Süsi and Lukason 2019). In turn, different types of payment default variables have yielded accuracies that often exceed those of financial ratios (Back 2005; Lukason and Andresson 2019; Ciampi et al. 2020). Thus, information about payment defaults can serve as a useful substitute in cases in which the annual reports needed to calculate financial ratios are unavailable.
Therefore, this study aimed to determine the value of using tax arrears as a type of payment default, as a substitute for financial ratios, in failure prediction. To achieve this aim, failure prediction models were constructed using different methods to determine: (a) the comparative accuracies of models based on financial ratios and tax arrears for periods when annual reports are available; and (b) the accuracies of models based on tax arrears for periods when annual reports are not available.
This study provides an important contribution to the extant literature by indicating that tax arrears information leads to high classification accuracies when timely financial reports are unavailable, or the available reports are significantly outdated. In addition, the accuracy of a comparable model based on tax arrears exceeds the precision of a model based on financial ratios calculated from the last available annual report. Thus, when information asymmetry exists because of the unavailability of firms' financial reports, models based on payment defaults can serve as useful substitutes for classic financial ratio-based models.
The next section provides the study's motivation based on the extant literature, with the main focus on information asymmetry and past payment defaults as predictors of permanent insolvency. The latter is accompanied with several hypotheses developed based on the findings from the available literature. These hypotheses focus on the tradeoff between the accuracies of models based on either tax arrears or financial ratios from different pre-failure periods. Then, based on a whole population dataset applied to the relevant Estonian context, the calculated variables and applied methods are described. The subsequent results and discussion section describes the findings and outlines the contribution of the study to the extant literature. The paper ends with a conclusion that, in addition to summarizing the study, also outlines practical implications, future research avenues, and main limitations.

Failure Prediction and Information Asymmetry
It has been established in different countries that non-listed firms (especially micro-, small-, and medium-sized firms, i.e., SMEs) can delay the submission of their annual report beyond the legal deadline (Lukason and Camacho-Miñano 2019). When this delay is added to the statutory interval between the end of the fiscal year and the submission deadline, the financial information available to external decision makers can be substantially outdated. In Estonia, the submission deadline is six months following the end of the fiscal year; when, for example, a delay of six months occurs beyond that deadline, the most recent financial information available just before the delayed report is filed is two years old. Studies about the SME failure process (see e.g., Argenti 1976; Laitinen 1991; Lukason and Hoffman 2014; Lukason and Laitinen 2019) have indicated that serious financial decline for the majority of such firms occurs in less than two years, and for a large proportion, in less than one year. Thus, the available financial reports often fail to signal problems to external parties, leading to information asymmetry in which insiders of a firm are in a more favorable position than its outsiders.
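The two-year figure in the Estonian example follows from simple arithmetic, which can be sketched as below. This is a minimal illustration under one stated assumption: the age of information is measured from the balance sheet date of the previous fiscal year's report (the newest one available) to the moment just before the delayed report is filed. The function name and parameters are hypothetical and are not taken from the study.

```python
def max_information_age_months(deadline_months: int,
                               delay_months: int,
                               fiscal_year_months: int = 12) -> int:
    """Age in months of the newest available year-end figures,
    measured just before the next (delayed) annual report is filed.

    Assumption: until the delayed report arrives, outsiders only see
    the report of the preceding fiscal year, which ended one full
    fiscal year earlier.
    """
    return fiscal_year_months + deadline_months + delay_months


# Estonian example from the text: six-month deadline, six-month delay.
age = max_information_age_months(deadline_months=6, delay_months=6)
print(age)  # 24 months, i.e., two years
```

Without any delay, the same calculation gives 18 months, so even fully compliant reporting leaves outsiders with figures that can be well over a year old.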
The literature has established that firms in a poor financial situation may delay reporting more than their healthy counterparts. This is evident from two streams of literature: one focusing on failure prediction (e.g., Altman et al. 2010; Iwanicz-Drozdowska et al. 2016) and the other on reporting delays (e.g., Luypaert et al. 2016; Lukason and Camacho-Miñano 2019). Insolvent firms may not submit annual reports at all (Balcaen and Ooghe 2006; Lukason 2013). In addition, the reliability of annual reports in the usually non-audited SME segment can be questionable (Balcaen and Ooghe 2006; Jayasekera 2018), especially in the case of financial distress (Campa and Camacho-Miñano 2015; Serrano-Cinca et al. 2019). Furthermore, in the European Union, the accounting rules were eased for micro-firms with the Directive 2013/34/EU (n.d.); thus, information is available about a very limited number of accounts. This precludes the possibility of calculating most detailed financial ratios, leaving multiple essential facts undisclosed, e.g., the liquidity of current assets or the term structure of current liabilities. Thus, the assumption usually made in failure prediction studies, that timely financial statements providing a true and fair view are available, may often not hold in the case of failing SMEs.
At present, the area of failure prediction remains dominated by the use of financial ratios (Veganzones and Severin 2020). Despite the significant limitations of this approach, financial ratios are still the most popular predictors in the SME segment (Ciampi et al. 2021). The accuracies of financial ratio-based models in the SME segment remain modest. For instance, the revalidation of a historic model by Altman et al. (2017) based on all European countries resulted in the general area under the curve (AUC) remaining less than 0.8, with country-specific results showing greater variance. In the Estonian context, similar accuracies for financial ratios have been obtained (Lukason and Andresson 2019; Grünberg and Lukason 2014). The Estonian evidence also indicates that, for SMEs going through bankruptcy proceedings and having submitted their annual reports correctly, poor financial performance may be observable only in the last available annual report (Laitinen et al. 2014; Lukason 2018). Nonetheless, even if such a report signals poor financial performance, it usually becomes available six months following the end of the fiscal year, by which time the majority of creditors are likely to have made their financing decisions.

Payment Defaults in Failure Prediction
Payment defaults have been infrequently used in failure prediction studies (Lukason and Andresson 2019). This may be due to different causes. First, there is an organic interconnection between payment defaults and the definition of failure. For instance, when failure is defined as a payment default, then either earlier payment defaults of the same kind or different types of payment defaults of any kind can be used as predictors. Thus, the dependent and independent variables can have a degree of overlap. Second, in the case in which failure is defined as permanent insolvency (bankruptcy), the first-time default may quickly develop into permanent insolvency, and thus, may not have significant predictive value. Third, information about payment defaults must usually be obtained from specific databases, which are often not publicly available.
Table 1 provides an overview of the available studies using payment defaults as predictors of failure. The extant literature is mainly European-based. Most studies listed in Table 1 apply either large samples or whole populations, therefore enhancing the reliability of the findings. Studies presented in Table 1 are based on various types of payment defaults as predictors, such as defaults on trade credit (e.g., Ciampi et al. 2020), bank loans (e.g., Ciampi et al. 2020), or taxes (e.g., Lukason and Andresson 2019); on multiple occasions, the payment default type was not precisely disclosed. Although the number of variables used varies from one to ten, it can be synthesized based on Table 1 that one or several of the following are considered: the presence of payment defaults in a binary form, the number of payment defaults, the duration of payment defaults (often already incorporated into the definition of the independent variable, rather than providing the number of days), and the size of payment defaults. In several studies (e.g., Ciampi et al. 2020; Altman et al. 2020), the ratio of default value and some financial variable has been applied.
The general finding from the studies listed in Table 1 is that payment default variables are highly useful in failure prediction. The prediction accuracy from their usage often exceeds that of financial ratios or other non-financial variables. At a minimum, they provide an increment to the accuracy of financial ratio-based prediction models. In the short-term horizon, prediction accuracies of over 90% have been documented with models including payment defaults, and some studies have also confirmed their usefulness in the long-run.

Hypotheses with Rationale
Based on the literature review, we formulate three hypotheses listed as follows:
Hypothesis 1. Payment defaults from a comparable period lead to a higher accuracy in failure prediction than financial ratios from the last available annual report.
Hypothesis 2. Payment defaults from a comparable period do not lead to a higher accuracy in failure prediction than financial ratios from the one but last available annual report.
Hypothesis 3. Payment defaults from periods not covered by annual report information show remarkably higher prediction accuracies than financial ratios from the last available annual report.
Hypothesis 1 relies on the idea that, in the SME segment, the failure processes of firms are very quick and the last available annual report therefore does not signal any financial problems. Therefore, the accuracy of a financial ratio-based model is modest. Although we do not expect a high accuracy from a comparable payment default-based model, we nonetheless postulate that for numerous firms signals of financial problems start to emerge, and therefore, the accuracy of the defaults model exceeds that of the comparable financial ratio-based model. Hypothesis 2 is related to the former by assuming that, although the accuracy of a financial ratio-based model is modest, because payment defaults of firms are rare, the accuracy of a payment default-based model should therefore be even lower. Hypothesis 3 extends into the period when comparative financial information from annual reports is not available, and assumes that payment problems emerging from the last comparative period quickly escalate into serious payment difficulties. The period covered by Hypothesis 3 has been referred to as a "death struggle" phase in Hambrick and D'Aveni (1988), and other failure process theories have noted that this phase is often characterized as one in which firms linger and are not liquidated early enough (D'Aveni 1989; Lukason and Laitinen 2019).

Population of Firms
This study applied the whole population data of Estonia. The insolvent firms' population consisted of 358 tax-liable firms that were dissolved in the period 2015-2017 as a result of not submitting compulsory annual reports to the business register. None of the insolvent firms went through official insolvency proceedings. Nevertheless, the presence of unpaid debt was confirmed at the date of dissolution for all of these firms, therefore making them de facto insolvent. Official insolvency proceedings were not started for these firms mainly because of the lack of interest by creditors, i.e., they considered that the satisfaction of their claims would be highly unlikely or they had already tried to satisfy them by other means. Because all insolvent firms were deleted by the business register for not submitting annual reports, at some point in time the latest financial information available became obsolete and did not portray the financial reality. For comparison, all 45,041 live tax-liable firms (i.e., those having "active" status in the business registry) were used. All firms in the analysis are liable to pay value added tax, because otherwise they usually cannot have tax arrears. The average firm in the analysis is a micro-firm (i.e., having turnover below EUR 2 million), which corresponds to the distribution of firms in most countries.

Data Sources
In the case of insolvent firms, the last (noted as period T) and one but last (noted as period T − 1) available annual reports were used. As noted previously, however, these annual reports are significantly outdated. The actual years corresponding to T in the analysis were 2012 and 2013. Tax arrears information (i.e., all unpaid taxes due by firms) is available for all month-end periods throughout the firm's history. Because the exact date of each firm's annual report submission is known, the period T comprises the twelve month-end periods up to the annual report's submission month. Accordingly, T − 1 comprises the period 13-24 months before the last report's submission month. In the case of tax arrears, information is also available after the last annual report's submission date. Therefore, two additional periods, T + 1 and T + 2, were used, which respectively portray 1-12 months and 13-24 months after the last annual report's submission month. Usually, the deletion of a firm due to not submitting annual reports occurs between 25 and 48 months after that submission month. This approach overcomes the information asymmetry of not knowing the up-to-date financial information from the annual report needed to calculate the financial ratios used as an input for failure prediction models. Information for non-insolvent firms was obtained from the same years as for the insolvent firms. In addition, to avoid year-specific biases, for each non-insolvent firm, both years (i.e., 2012 and 2013) were used as T. As a result, the number of observations in the group of non-insolvent firms was roughly doubled (N = 80,471).
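The mapping of month-end observations to the four periods can be sketched as follows. This is an illustrative reading of the description above, not the study's own code. It assumes that offsets are counted in whole months relative to the submission month, that period T covers offsets -12 to -1, and that the submission month itself (offset 0) is left unassigned; the text does not state how that boundary month is treated. All function names are hypothetical.

```python
from typing import Optional


def month_offset(year: int, month: int, sub_year: int, sub_month: int) -> int:
    """Whole months between a month end and the report's submission month
    (negative = before submission, positive = after)."""
    return (year - sub_year) * 12 + (month - sub_month)


def period_label(offset: int) -> Optional[str]:
    """Assign a month offset to T-1, T, T+1 or T+2.

    Assumption: offset 0 (the submission month) and anything outside
    the 24-month windows belongs to no period.
    """
    if -24 <= offset <= -13:
        return "T-1"
    if -12 <= offset <= -1:
        return "T"
    if 1 <= offset <= 12:
        return "T+1"
    if 13 <= offset <= 24:
        return "T+2"
    return None


# Report submitted in June 2013; classify a few month ends.
for y, m in [(2012, 5), (2013, 1), (2014, 6), (2015, 6)]:
    print(y, m, period_label(month_offset(y, m, 2013, 6)))
```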

Variables
The variables applied in this study are documented in Table 2. Four classical financial ratios inherent to the Altman (1968) model were calculated. These ratios portray profitability (ROA), solvency (TETA), liquidity (WCTA), and efficiency (TRTA) domains. Because this study mainly relates to micro-firms, the use of highly detailed ratios was not possible due to simplified accounting standards. Moreover, the applied ratios have been proven to be valuable predictors in previous studies, including in Estonia (see e.g., Lukason and Andresson 2019).

Code        Formula
Financial ratios
ROA         net income/total assets
TETA        total equity/total assets
WCTA        (current assets - current liabilities)/total assets
TRTA        turnover/total assets
Tax arrears variables
MAXT        maximum value of tax arrears in the sequence of twelve month ends
MEDT        median value of tax arrears in the sequence of twelve month ends
DURT        number of month ends with tax arrears in the sequence of twelve month ends
MAXTadj     maximum value of the ratio of tax arrears to total assets in the sequence of twelve month ends
MEDTadj     median value of the ratio of tax arrears to total assets in the sequence of twelve month ends

Unlike in the case of financial ratios, there are no rules available about how to calculate tax arrears variables. As noted in the Introduction, few existing studies have used other types of payment defaults, and only a single study specifically applied tax arrears for bankruptcy prediction. This study relied on the variables applied in the study by Lukason and Andresson (2019), and three variables were calculated to examine the evolution of tax arrears. These variables portray the average (MEDT), maximum (MAXT), and duration (DURT) contexts of the tax arrears. Because previous studies do not provide any guidelines, to determine the minimum value of tax arrears in the month-end period that provides the highest prediction accuracy, we applied three different thresholds in the analysis, i.e., >0, >100, >1000 euros. Thus, in the case of >100 and >1000 euro thresholds, the lower values were replaced with zeros in the respective time series. This robustness check was necessary because firms may intentionally withhold or forget to pay an outstanding debt of a very small size. Because firms belonging to different size categories may be affected differently by tax arrears of the same size, we also introduced a variable in which the ratio of tax arrears to total assets was used, rather than applying the original value of tax arrears. This ratio can be calculated for the maximum and median tax arrears, but is not applicable in the case of the duration variable. The relevant ratios calculated do not account for the thresholds outlined earlier. The use of three types of variables, i.e., financial ratios, tax variables, and tax variables combined with balance sheet information, provides a clear understanding about whether tax information is valuable separately or it should be combined with available accounting information.
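The calculation of the tax arrears variables for one firm and one twelve-month period can be sketched as follows. This is a minimal illustration of the definitions above, not the authors' implementation. The function name, the argument names, and the example series are hypothetical, and the treatment of a value exactly equal to the threshold (set to zero) is an assumption based on the strict ">" in the text.

```python
from statistics import median
from typing import Dict, Optional, Sequence


def tax_arrears_variables(arrears: Sequence[float],
                          threshold: float = 0.0,
                          total_assets: Optional[float] = None) -> Dict[str, float]:
    """Compute MAXT, MEDT, DURT (and MAXTadj, MEDTadj if assets are given)
    from twelve month-end tax arrears values in euros."""
    if len(arrears) != 12:
        raise ValueError("expected twelve month-end values")
    # Values not exceeding the threshold are replaced with zeros.
    series = [a if a > threshold else 0 for a in arrears]
    result = {
        "MAXT": max(series),
        "MEDT": median(series),
        "DURT": sum(1 for a in series if a > 0),
    }
    if total_assets is not None and total_assets > 0:
        # The adjusted ratios use the original values, without thresholds.
        ratios = [a / total_assets for a in arrears]
        result["MAXTadj"] = max(ratios)
        result["MEDTadj"] = median(ratios)
    return result


# Hypothetical firm whose arrears escalate over the year.
example = [0, 0, 0, 50, 150, 0, 0, 1200, 2000, 2500, 3000, 3000]
for limit in (0, 100, 1000):
    print(limit, tax_arrears_variables(example, threshold=limit, total_assets=10_000))
```

In this made-up series, raising the threshold leaves MAXT unchanged but lowers both MEDT and DURT, because the small early arrears are zeroed out.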

Methods
We applied three methods, namely logistic regression (LR), neural networks (NNs), and the decision tree (DT), to construct the failure prediction models, because these are widely used in failure prediction research (see e.g., Shi and Li 2019). The first is a classical statistical analysis tool used for the current purpose, whereas the latter two are machine learning tools that usually yield high prediction accuracies. As noted in Prusak (2018), in the few available failure prediction studies conducted on the example of Estonia, state-of-the-art prediction tools were used. Thus, the current paper follows the existing research direction. In addition, the application of three methods helped to avoid the bias that may arise from the use of a single method. Because the dataset is severely imbalanced, we used the synthetic minority oversampling technique (SMOTE) to equalize the samples. This is achieved by repeating the minority group's observations until their number equals that of the majority group. This approach has usually been applied to prevent these methods from favoring the majority group over the minority group in predictions (Altman et al. 2017; Lukason and Andresson 2019). Prediction models were constructed for T − 1 and T periods using all of these methods for financial ratios and tax arrears variables separately, whereas for periods T + 1 and T + 2 only tax arrears variables were used (because information about financial ratios was unavailable). In the case of tax arrears models, the robustness was checked for the three thresholds described in the Variables section. Because the whole population of firms was used, we did not apply a separate hold-out sample in this study; instead, the population was equally divided between training and testing sets for NN and DT to avoid overestimation with the training set. In the results section, the classification outcomes from the test set are presented.
The neural network was applied with standardized variables, two hidden layers, and the sum of squares as the error function. The decision tree was grown with the CRT algorithm, with a maximum tree depth of 5, a minimum of 100 cases in a parent node, and a minimum of 50 cases in a child node. Financial ratios or tax arrears variables can be highly correlated (see Appendix A Table A1; the highest variance inflation factor value obtained was 26). Thus, as in most failure prediction studies (see e.g., du Jardin 2017; Altman et al. 2020), only the accuracies of models are presented.
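The class balancing and the equal training/testing division described above can be sketched with the standard library alone. Note that the procedure as described (repeating minority observations until the groups are equal) corresponds to oversampling by replication, whereas SMOTE in its usual form creates synthetic observations by interpolating between neighbouring minority cases; the sketch follows the description in the text. The function names, the random seed, and the use of random draws for the replication are assumptions, and the text does not state whether balancing preceded or followed the split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")


def oversample_by_replication(minority: Sequence[Obs],
                              target_size: int,
                              seed: int = 0) -> List[Obs]:
    """Repeat minority observations (drawn at random) until the group
    reaches target_size; the originals are always kept."""
    if not minority:
        raise ValueError("minority group is empty")
    rng = random.Random(seed)
    balanced = list(minority)
    while len(balanced) < target_size:
        balanced.append(rng.choice(minority))
    return balanced


def split_half(items: Sequence[Obs], seed: int = 0) -> Tuple[List[Obs], List[Obs]]:
    """Shuffle and divide observations equally into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Group sizes from the study: 358 insolvent firms, 80,471 non-insolvent observations.
insolvent_ids = list(range(358))
balanced_insolvent = oversample_by_replication(insolvent_ids, target_size=80_471)
train, test = split_half(balanced_insolvent)
print(len(balanced_insolvent), len(train), len(test))
```

If replication is performed before the split, copies of the same firm can appear in both the training and the testing set, which is worth keeping in mind when interpreting test-set accuracies for the minority group.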

Results
The descriptive statistics outlined in Table 3 indicate that, due to the substantial differences in their median and mean values, tax arrears variables have the potential to discriminate between insolvent and non-insolvent firms. This is confirmed with the constructed LR, NN, and DT models. Moreover, the conducted statistical tests also indicate significant differences in the values of financial ratios for insolvent and non-insolvent firms (see Table 3 notes). The accuracy based on financial ratios from the last annual report is surpassed by tax arrears information from the same period T, for all methods applied (see Table 4). However, neither information source yields high classification accuracies, with the maximum being 74.5% from DT. This is expected because the period T portrays the financial situation roughly 3-4 years before the dissolution of an insolvent firm. In period T − 1, the accuracy of financial ratios is (slightly) higher than for tax arrears in the case of all methods; thus, tax arrears information does not have strong potential for long-horizon insolvency prediction. However, the period T accuracies correspond well to those obtained in other long-horizon bankruptcy prediction studies in which financial ratios were implemented (see e.g., du Jardin 2017; Altman et al. 2020). Thus, the population applied in this study should also be comparable to other countries.
Note (Table 4): NI = non-insolvent, I = insolvent, T = total accuracy.
For the periods in which financial information is outdated (i.e., T + 1 and T + 2), the accuracies of tax arrears-based models are very high, exceeding even 90%. This result surpasses the accuracies obtained in most European countries in the SME segment with financial ratio-based models (see e.g., Laitinen and Suvas 2013;Altman et al. 2017). Thus, the context of temporary payment difficulties represented by tax arrears is a valuable replacement in failure forecasting in the case in which annual reports are not available for the calculation of financial ratios. Different thresholds used to calculate the tax arrears variables indicate that the best performance is obtained with the base approach (i.e., when tax arrears larger than zero are accounted for), whereas using larger thresholds resulted in lower accuracies. This finding may be due to the mostly micro-firm population applied, i.e., these firms cannot have very large tax arrears and, with the increase in the size of unpaid debt, the likelihood increases that an insolvent firm will be subject to (quick) legal insolvency proceedings, rather than being simply deleted from the business registry.
Concerning different prediction methods, Table 4 indicates that DT leads to the most accurate predictions, followed by NN and LR. However, the differences in accuracies for the three applied methods are not substantial. This can be explained by looking at the accuracies by group (non-insolvent and insolvent firms) in Table 4. Across the different methods, the accuracy for the non-insolvent firms undergoes a slow but steady rise from T − 1 to T + 2. In turn, the accuracy for insolvent firms undergoes a jump from T to T + 1. This means that, among non-insolvent firms, there is a constant share of firms with small and infrequent tax arrears, whereas for a large proportion these tax arrears disappear. At time T, a small share of future insolvent firms has tax arrears, whereas, as noted, the picture dramatically changes at period T + 1. Because the latter is accompanied by a rise in the median and mean value of tax arrears for insolvent firms, it is expected that more non-insolvent firms with tax arrears are classified correctly.
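The group-wise and total accuracies discussed above can be computed from actual and predicted classes as sketched below. This is a generic illustration rather than the study's code. The coding of classes (1 = insolvent, 0 = non-insolvent), the function name, and the toy data are assumptions; the labels NI, I, and T stand for non-insolvent, insolvent, and total accuracy.

```python
from typing import Dict, Sequence


def group_accuracies(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Return the percentage of correct classifications for non-insolvent
    firms (NI), insolvent firms (I), and all firms together (T).

    Assumed coding: 1 = insolvent, 0 = non-insolvent.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have equal length")
    pairs = list(zip(actual, predicted))
    non_insolvent = [a == p for a, p in pairs if a == 0]
    insolvent = [a == p for a, p in pairs if a == 1]
    if not non_insolvent or not insolvent:
        raise ValueError("both groups must be present")
    return {
        "NI": 100.0 * sum(non_insolvent) / len(non_insolvent),
        "I": 100.0 * sum(insolvent) / len(insolvent),
        "T": 100.0 * sum(a == p for a, p in pairs) / len(pairs),
    }


# Toy data: four non-insolvent and two insolvent observations.
acc = group_accuracies(actual=[0, 0, 0, 0, 1, 1], predicted=[0, 0, 0, 1, 1, 0])
print(acc)
```

On a test set where the two groups have equal size, the total accuracy is simply the mean of the two group accuracies, so a jump in the insolvent group's accuracy feeds directly into the total.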
We also checked whether the usage of MAXT and MEDT in the form of ratios to firm's total assets (available from the last annual report) increases prediction accuracies (i.e., variables MAXTadj and MEDTadj). In the case of two methods (LR, NN), a slight improvement was obtained, whereas with DT the opposite occurred. Because DT is, by a small margin, the most precise method, it can be concluded that tax arrears are suitable for insolvency prediction purposes in their original form, i.e., they do not need to be applied as ratios. This finding also implies that the presence alone of temporary payment defaults, rather than their magnitude expressed as a ratio to the firm's assets, is relevant. Thus, we can hypothesize that the size of temporary payment defaults may be irrelevant, i.e., their presence alone may be equally troubling for smaller and larger firms.

Discussion
Evidence was found for all three hypotheses in the empirical analysis. Concerning Hypothesis 1, it was confirmed that prediction models based on tax arrears are more accurate than those based on financial ratios calculated from the last available report, although the differences in accuracy were not substantial and the accuracies for both types of variables were modest. Similarly modest accuracies were witnessed for both types of variables in the case of the validation of Hypothesis 2, which stated that financial ratios calculated from the one but last available annual report are more accurate than tax arrears. Finally, Hypothesis 3 was strongly validated, because prediction models based on tax arrears from periods for which the annual reports were missing led to high accuracies, even exceeding 90%.
This study leads to several valuable scientific implications. The delay or non-submission of annual reports is typical in the SME segment, especially for those SMEs that are (temporarily) insolvent (Altman et al. 2010;Luypaert et al. 2016;Lukason and Camacho-Miñano 2019). The latter results in the availability of outdated information for the calculation of financial ratios. This study proved that the dynamic application of payment defaults by means of tax arrears time series can be used to overcome the noted information asymmetry problem, resulting in acceptable prediction accuracies. In addition, the scant available literature about payment defaults as predictors of insolvency has mainly taken a stationary view (e.g., Back 2005;Iwanicz-Drozdowska et al. 2016;Ciampi et al. 2020), namely, by only checking for the presence of defaults rather than their evolution over a longer time horizon. This study clearly indicates the value of dynamically using payment defaults (also by accounting for their frequency and severity). Thus, future research can be directed to more systematically studying the phenomenon, e.g., using similar variables in prediction models focusing on the firm's financial evolution (see e.g., du Jardin 2017; Korol 2020).
This study shows that after the emergence of serious financial difficulties, leading to permanent insolvency, a large proportion of firms engage in hiding financial information. The results of this study are consistent with several prominent approaches in the accounting literature, e.g., the obfuscation and selective disclosure theories noted in Lukason and Camacho-Miñano (2019). The results also confirm the prevalence of a quick failure process in the case of SMEs (see Lukason and Laitinen 2019); that is, future insolvent firms cannot be successfully distinguished from their non-insolvent counterparts using the last annual report. The latter argument is challenged by the question of what the financial situation would be if the reporting followed the legal deadlines. When accounting for the rapid growth of tax arrears, it can be assumed that a notable reduction in financial performance should have occurred, at least for a proportion of the analyzed firms.

Conclusions
Based on the whole population dataset from Estonia, this study showed that variables based on tax arrears can lead to high accuracies in the prediction of insolvency, in the circumstances in which annual reports needed to calculate financial ratios are missing or obsolete. Such circumstances occur when firms in financial difficulties violate their disclosure obligations, leading to the unavailability of up-to-date financial information, and poor prediction accuracies based on the last available annual report. The substitute proposed in the current study to overcome this information asymmetry, namely, the usage of variables based on tax arrears information, can enhance the prediction accuracy by up to 25 percentage points. This provides a possible solution to a call in the extant literature (see e.g., Ciampi et al. 2021) to find better predictors than financial ratios to foresee firm failure.
The current study provides multiple practical implications. First, when annual reports are delayed beyond the legal deadline, risk assessment systems can be developed to automatically switch to prediction models based on tax arrears rather than using financial ratios. Second, because insolvency prediction models applying tax arrears indicate reasonably high prediction accuracies, other types of payment defaults (e.g., to banks and suppliers) may also be valuable predictors. Combined models may be particularly applicable because tax regulations differ among countries, leading to potentially varying utility of variables based on tax arrears. Third, this study indicates that tax arrears include valuable predictive information presented in a simple format, i.e., high accuracies can be obtained with unsophisticated approaches (e.g., by accounting for the size and duration of tax arrears), and tax arrears in the form of a ratio to the firm's assets have similarly high relevance. Finally, regulatory bodies may account for the fact that non-submission of reports and payment difficulties are related. Thus, the economic environment would benefit from somewhat stricter rules, in cases in which these have been too liberal thus far.
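The switching rule suggested in the first implication above could look roughly like the sketch below. It is a hypothetical illustration and not a system described in the study. The function names, the returned labels, and the default six-month deadline (taken from the Estonian rule mentioned earlier) are assumptions, and a real system would also need to define how long a submitted report stays usable.

```python
import calendar
from datetime import date


def add_months(day: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the month."""
    index = day.year * 12 + (day.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def choose_model(today: date,
                 fiscal_year_end: date,
                 report_submitted: bool,
                 deadline_months: int = 6) -> str:
    """Pick the predictor set for a firm.

    Financial ratios are used while the latest report is submitted or
    not yet overdue; otherwise the tax arrears model takes over.
    """
    deadline = add_months(fiscal_year_end, deadline_months)
    if report_submitted or today <= deadline:
        return "financial_ratios"
    return "tax_arrears"


# Fiscal year ended 31 December 2013; report still missing in mid-July 2014.
print(choose_model(date(2014, 7, 15), date(2013, 12, 31), report_submitted=False))
```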
Future studies can be developed in multiple different directions. First, the current study may be limited by not accounting for potentially different financial profiles of companies. Thus, similar predictive studies in the future would benefit from a more elaborate methodology, e.g., by accounting for firm profiles in respect to tax payment dynamics, which may enable more precise targeting of the time of failure. Second, more theory-driven studies focusing on the firm failure process, particularly during the later stages, can benefit from tax arrears or other payment default information. This would enable the decline to be captured more vividly than via the use of financial reports composed with a one year step in the SME segment. Third, an important limitation of the current study to be addressed in future research is the single country context. Tax regulations and their implementation can vary among countries (Hanlon and Heitzman 2010), and the effect of tax avoidance on the quality of relevant time series may also vary in different environments (Wang et al. 2019;Batrancea and Nichita 2015). Finally, a valuable addition to the available dataset would be the actions of creditors to satisfy their claims, because this would enable a better understanding of when a firm is left dormant.