Financial Distress Prediction and Feature Selection in Multiple Periods by Lassoing Unconstrained Distributed Lag Non-linear Models

Abstract: In this paper, we propose a new framework for a financial early warning system that combines the unconstrained distributed lag model (DLM) with widely used financial distress prediction models, such as the logistic model and the support vector machine (SVM), in order to improve the performance of early warning systems for listed companies in China. We simultaneously introduce the 3~5-period-lagged financial ratios and macroeconomic factors in the consecutive time windows t − 3, t − 4 and t − 5 into the prediction models; thus, the influence of early, continued changes within and outside the company on its financial condition is detected. Further, by introducing a lasso penalty into the logistic-distributed lag and SVM-distributed lag frameworks, we implement feature selection and exclude potentially redundant factors, considering that a long original list of accounting ratios is used in the financial distress prediction context. We conduct a series of comparison analyses to test the predictive performance of the proposed models. The results show that our models outperform logistic, SVM, decision tree and neural network (NN) models in a single time window, which implies that models incorporating indicator data in multiple time windows convey more information for financial distress prediction than the existing single time window models.


Introduction
Over the last four decades, models and methods for the prediction of corporate financial distress have attracted considerable interest among academics as well as practitioners. Financial distress prediction models can be used for many purposes including: monitoring of the solvency of regulated companies, assessment of loan default risk and the pricing of bonds, credit derivatives, and other securities exposed to credit risk (see [1][2][3][4]).
Different countries have different accounting procedures and rules; thus, the definition of financial distress put forward by different scholars is not always the same (see [4][5][6][7]). Bankruptcy is one of the most commonly used outcomes of financial distress of a company [5]. The nature of a bankrupt firm is that the owners can abandon the firm and transfer ownership to the debt holders, and bankruptcy occurs whenever the realized cash flow is less than the debt obligations [8]. It is generally agreed on that financial failure leads to substantive weakening of profitability of the company over time, but it is also feasible that a financially distressed firm may not change its formal status to bankrupt [9]. Therefore, in this paper, as done by [3] and [4], we identify a financially distressed company as the one at risk of failing, but which remains a viable entity at the present time. More specifically, "special treatment" (ST) is used to measure the financial distress status of a listed company. Further, in this paper, we provide a group of financial distress prediction models that incorporate the panel data of financial and macroeconomic indicators to implement early financial distress prediction.
A few extant studies have predicted financial distress by using accounting ratios from one or more years prior to its observation, given that early changes in financial indicators may provide a warning sign of deteriorating financial conditions. The authors of [1] provide evidence that bankruptcy can be predicted two years prior to the event, while those of [4] construct two groups of financial distress prediction models for two periods: one year and two years before the observation of the financial distress event. They found that the models in both time windows have good predictive performance. Other similar examples can be found in [7,20,22].
In the relevant literature, financial indicators in different time windows have been shown to contribute to the performance of distress prediction models, although the degree of their impact tends to change over time. However, the procedures are performed for each lag period separately, i.e., using information from only one specific year prior to the date of the distress event.
To the best of the authors' knowledge, no previous study of financial distress prediction for listed companies takes into account the impacts of relevant indicators in different and consecutive lag periods.
In this study, we take the form of the distributed lag model (DLM), in addition to classical classification techniques, into the financial distress prediction problem and propose a group of distress prediction models including the logistic-distributed lag model and the SVM-distributed lag model that can be treated as generalized distributed lag model. We construct the linkage between multiple lagged values of financial ratios and macroeconomic indicators and current financial status in order to capture the dynamic natures of the relevant data. Further, we propose to implement the penalized logistic-distributed lag financial distress model with the least absolute shrinkage and selection operator (lasso) penalty via the algorithm framework of the alternating direction method of multipliers (ADMM) that yields the global optimum for convex and the non-smooth optimization problem. Lasso-type penalty was applied for three purposes: to avoid the collinearity problem in applying the distributed lag models directly, to simultaneously select significant variables and estimate parameters, and to address the problem of over-fitting. We conduct a series of empirical studies to illustrate the application of our distributed lag financial distress models, including a comparison of predictive performances of the two distributed lag financial distress models proposed in this paper as well as the comparisons of predictive performances of our models with a group of widely used classification models in the different time windows. The results show that all distributed lag financial distress models aggregating the data in three consecutive time windows outperform the ones that incorporate the data in any single-period: 3 years, 4 years or 5 years before the observation year of financial distress, when the financial and macroeconomic factors are included. 
This paper may provide a means of improving the predictive performance of financial distress model by incorporating data of financial and macroeconomic indicators in consecutive and multiple periods before the observation of financial distress.
The rest of this paper is organized as follows. Section 2 briefly reviews the previous financial distress prediction literature and distributed lag modeling. Section 3 constructs a group of generalized distributed lag models composed of lagged explanatory variables and $\ell_1$ regularization, namely the logistic regression-distributed lag model and the lasso-SVM model with lags, and proposes the ADMM algorithm framework for simultaneous coefficient estimation and variable selection. Section 4 provides a description of the data. Section 5 presents the empirical results and compares the predictive performance of our models, which reflect the extended lag effects of the indicators used, with the existing financial distress prediction models. Section 6 concludes the paper.

Financial Factors and Variable Selection
There is a large amount of theoretical and empirical microeconomic literature pointing to the importance of financial indicators on financial distress forecasting. The authors of [1] selected five financial indicators of strong predictive ability from the initial set of 22 financial indicators using stepwise discriminant analysis: earnings before interest and taxes/total assets, retained earnings/total assets, working capital/total assets, market value equity/book value of total debt and sales/total assets, which measures the productivity of assets of a firm. The other studies that concern the similar accounting ratios in financial distress prediction can also be found in [23,24]. Furthermore, the current-liabilities-to-current-assets ratio used to measure liquidity (see [22,25]) and the total-liabilities-to-total-assets ratio used to measure the degree of indebtedness of a firm (see [22,25,26]) and cash flow (see [18]) have been incorporated into the distress prediction models because of their predictive performance. The authors of [20] considered nine financial indicators and use them to predict the regulatory financial distress in Brazilian electricity distributors. The authors of [7] introduced 31 financial indicators and found that the most important financial indicators may be related to net profit, earnings before income tax, cash flow and net assets. Along this line, with machine learning methods developing, more diversified financial indicators have been considered in very recent studies of Hosaka [27], Korol et al. [28], and Gregova et al. [29].
Another very important recent research line in the area of financial distress prediction is the suitability problem of these financial ratios used as explanatory variables. For example, Kovacova et al. [30] discussed the dependence between explanatory variables and the Visegrad Group (V4) and found that enterprises of each country in V4 prefer different explanatory variables. Kliestik [31] chose eleven explanatory financial variables and proposed a bankruptcy prediction model based on local law in Slovakia and business aspects.
In this paper, we construct an original financial dataset including 43 financial ratios. The ratios are selected on the basis of their popularity in the previous literature (see [1,7,27]) and potential relevancy to this study. Then, like many relevant studies [32][33][34], we use the lasso method and conduct feature selection in order to exclude the potentially redundant factors.

Macroeconomic Conditions
Macroeconomic conditions are relevant for the business environment in which firms are operating; thus, the deterioration of macroeconomic conditions may induce the occurrence of the financial distress. Macroeconomic variables have been found to impact corporate default and bankruptcy risk significantly, and good examples can be found in [35][36][37][38][39]. In the aspect of financial distress of listed companies, [4] consider the macroeconomic indicators of the retail price index and the short-term bill rate adjusted for inflation in addition to the accounting variables. The results in their studies suggest that all macroeconomic indicators have significant impact on the likelihood of a firm's financial distress. In this paper, we control for macroeconomic conditions, GDP growth, inflation, unemployment rate in the urban area and consumption level growth over the sample period. GDP growth is widely understood to be an important variable to measure economic strength and prosperity, and the increase in GDP growth may decrease the likelihood of distress. The authors of [22,40] have pointed out that the decline in GDP is significantly linked to the tightening of a firm's financial conditions, especially during the financial crisis period. The unemployment rate and inflation are two broadly used measures of overall health of the economy. High unemployment and high inflation that reflect a weaker economy may increase the likelihood of financial distress. Their impacts on financial distress have been examined in [16,37]. Different from the existing relevant studies that consider the lagged effect of macroeconomic variables only in a fixed window, such as 3 years prior to financial distress [16], this paper imposes a distributed lag structure of macroeconomic data, in addition to financial ratio data, and considers the lagged effects of the factors in the multi-periods. 
Particular attention is devoted to the lag structure and whether the predicting performance can be improved after introducing a series of lagged macroeconomic variables. The theoretical and empirical investigations in this study may complement the literature on financial distress prediction concerned with applying dynamic macro and financial data.

Related Literature on Chinese Listed Companies
The Chinese stock market has grown to over 55 trillion in market capitalization as of February 2020, and the number of listed firms has surpassed 3200, becoming the world's second largest market. Its ongoing development and the parallel evolution of regulations have made China's stock market an important subject for mainstream research in financial economics [41]. In April 1998, the Shanghai and Shenzhen stock exchanges implemented a special treatment (ST) system for stock transactions of listed companies with abnormal financial conditions or other abnormal conditions. According to the regulations, there are three main reasons for designation of a ST company: (1) a listed company has negative net profits for two consecutive years; (2) the shareholders' equity of the company is lower than the registered capital; (3) a firm's operations have stopped and there is no hope of restoring operations in the next 3 months due to natural disasters, serious accidents, or lawsuits and arbitration [7]. ST status is then usually applied as a proxy of financial distress (e.g., [7,16,[42][43][44][45]).
Researchers regard the topic of financial distress prediction of Chinese listed companies as data-mining tasks, and use data mining, machine learning or statistical methods to construct a series of prediction models incorporating financial data (see [7,42,43]) or financial plus macroeconomic data [16] in one-time-period, but not in multiple periods of time. In the very recent study of [45], the authors proposed a financial distress forecast model combined with multi-period forecast results. First, with the commonly used classifiers such as the support vector machine (SVM), decision tree (DT) etc., the two to five-year-ahead financial distress forecast models are established one by one and denoted as T-2 to T-5 models, respectively. Then, through combining the forecast results of these one-time-period models, the multi-period forecast results, as a weighted average over a fixed window, with exponentially declining weights, are provided. This is obviously different from our model, as we introduce the multi-period lagged explanatory variables and detect simultaneously the effects of the variables in different prior periods on financial distress in the process of modeling.

Distributed Lag Models
Sometimes the effect of an explanatory variable on a specific outcome, such as the change in mortality risk, is not limited to the period in which it is observed but is delayed in time [46,47]. This raises the problem of modeling the relationship between a future outcome and a sequence of lags of explanatory variables, specifying the distribution of the effects at different times before the outcome. Among the various methods proposed to deal with delayed effects, distributed lag models (DLMs), a major econometric approach, have been applied in diverse research fields, including assessing the distributed lag effects of air pollutants on children's health [48], hospital admission scheduling [49], and economic and financial time series analysis [50,51].
DLMs model the response $Y_t$, observed at time t, in terms of past values of the independent variable X, and have a general representation given by

$$ g(Y_t) = \alpha_0 + \sum_{l=0}^{L} s_l(\mathbf{x}_{t-l}; \boldsymbol{\alpha}_l) + \varepsilon_t, \tag{1a} $$

where $Y_t$ is the response at time t and g is a monotonic link function; the functions $s_l$ denote a smoothed relationship between the explanatory vector $\mathbf{x}_{t-l}$ and the parameter vector $\boldsymbol{\alpha}_l$; $\alpha_0$ and $\varepsilon_t$ denote the intercept term and the error term with zero mean and constant variance $\sigma^2$; l and L are the lag number and the maximum lag. The form of the link function g determines whether the model is a distributed lag linear model or a non-linear model. For example, a linear g with a continuous variable $Y_t$ gives a distributed lag linear model, while a logit function g with a binary variable $Y_t$ gives a distributed lag non-linear model. In model (1a), the parametric function $s_l$ is applied to model the shape of the lag structure, usually with polynomials (see [49,52]) or, less often, regression splines [53] or the more complicated smoothing techniques of penalized splines within generalized additive models [47,48]. In fact, $s_l$ was originally introduced to address the problem that successive past observations may be collinear. If L, the number of relevant values of X, is small, as may well be the case for some problems involving annual data, then model (1a) reduces to an unconstrained distributed lag model given by the following general representation:

$$ g(Y_t) = \alpha_0 + \sum_{l=0}^{L} \boldsymbol{\alpha}_l^{T} \mathbf{x}_{t-l} + \varepsilon_t. \tag{1b} $$

In (1b), the definitions of the variables are the same as those in (1a). Correspondingly, the coefficients in model (1b) can be estimated directly by pooled least squares for the linear case or by pooled maximum likelihood for the non-linear case, e.g., a logit link function g, under the assumption that $\mathbf{x}_{t-l}$ is strictly exogenous [54].
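As an illustration of the unconstrained form (1b) in the linear case, the following minimal sketch builds the lagged design matrix for a single series and estimates the lag coefficients by least squares. The helper name `lag_matrix`, the synthetic data and the chosen coefficients are assumptions made for this example only and are not taken from the study.

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Stack x_t, x_{t-1}, ..., x_{t-L} column-wise for t = L, ..., T-1."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    T = x.shape[0]
    # Column block l holds the series shifted back by l periods.
    return np.hstack([x[max_lag - l : T - l] for l in range(max_lag + 1)])

# Synthetic illustration (hypothetical values): one regressor, maximum lag L = 2.
rng = np.random.default_rng(0)
T, L = 200, 2
x = rng.normal(size=T)
intercept = 1.0
lag_coefs = np.array([0.5, 0.3, -0.2])      # effects of x_t, x_{t-1}, x_{t-2}

X = lag_matrix(x, L)                        # shape (T - L, L + 1)
y = intercept + X @ lag_coefs + 0.01 * rng.normal(size=T - L)

# Unconstrained DLM with identity link: ordinary least squares on all lags at once.
design = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
```

With a small maximum lag, no smoothing function is needed and each lag simply receives its own coefficient, which is the setting used in the rest of the paper.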
In this paper, logistic regression with an unconstrained distributed lag structure is used to identify the relationship between financial and macroeconomic indicators and future outcome of financial distress. The logistic regression may be the most frequently used technique in the financial distress prediction field ( [4,20]) because logistic regression relies on fewer assumptions due to the absence of the need for multivariate normality and homogeneity in the variance-covariance matrices of the explanatory variables [23]. Further, a lasso penalty is introduced to conduct simultaneous parameter estimation and variable selection, considering that the lasso penalty method has good performance for solving the overfitting problem caused by the introduction of factors in adjacent windows and selecting the features and the corresponding exposures with relatively significant influence on the response. In fact, the lasso method has been applied for linear-distributed lag modelling (e.g., [55]).

Methodology
In this section, by combining the logistic regression method and the unconstrained distributed lag model, we seek to estimate which indicators, and in which period prior to the distress event, best predict financial distress. First, we construct Model 1, which represents the "accounting-only" model and incorporates the financial ratios. We introduce the 3-period-lagged financial ratios as independent variables into Model 1 and use the model to predict the financial distress event in year t by using the data of the relevant indicators in the consecutive years t − 3, t − 4 and t − 5 simultaneously. Note that t refers to the current year in this paper. Then, we construct Model 2, which represents the "accounting plus macroeconomic indicators" model and includes, in addition to the accounting variables, 3-period-lagged macroeconomic indicators. Next, we introduce a lasso penalty into the models and implement coefficient estimation and feature selection. Further, we provide the algorithm framework of the alternating direction method of multipliers (ADMM), which yields the global optimum for convex and non-smooth optimization problems, to obtain the optimal estimates of the coefficients. Finally, we propose a support vector machine model that includes the lagged variables of the accounting ratios and macroeconomic factors. This model is used for comparison with the predictive performance of the logistic model with distributed lags.

The Logistic Regression-Distributed Lag Model with Accounting Ratios Only
The logistic regression may be the most frequently used technique in the financial distress prediction field and has been widely recognized ([11,20]). We propose a logistic model composed of lagged explanatory variables. Similar to the distributed lag linear model, the model has the following general form:

$$ P(Y_{i,t} = 1) = \frac{1}{1 + \exp\left\{-\left(\alpha_0 + \sum_{l=0}^{L} \alpha_{t-l}^{T} X_{i,t-l}\right)\right\}}. \tag{2} $$

In (2), $Y_{i,t}$ is a binary variable: $Y_{i,t} = 1$ means that firm i (i = 1, 2, ..., n) is a financially distressed company at time t, and $Y_{i,t} = 0$ means that it is a financially healthy company; $\alpha_0$ is the intercept, and $\alpha_{t-l} = (\alpha_{t-l,1}, \alpha_{t-l,2}, \ldots, \alpha_{t-l,p})^{T}$ is the coefficient vector for the explanatory variable vector $X_{i,t-l}$ at time t − l; $X_{i,t-l}$ is the p-dimensional accounting ratio vector for firm i at time t − l, l = 0, 1, 2, ..., L; l and L are the lag number and the maximum lag; $t_0$ is the beginning of the observation period and d is the duration of observation, which together delimit the range of t. The idea in (2) is that the likelihood of financial distress at time t for a listed company may depend on X measured not only at the current time t, but also in the previous time windows t − 1 through t − L.
In Formula (2), we assume a five-year effect and set the maximum lag L = 5, given that (1) the effect of the explanatory variable on the response variable may decline to zero in the time series data scenario; (2) the considered lag length is no more than 5 years in most previous studies of financial distress prediction (see [1,7,37]). In addition, we directly set the coefficients $\alpha_{t-0}$, $\alpha_{t-1}$, $\alpha_{t-2}$ for the variables in the current year and the two years before ST to be zero vectors, because (1) the financial statement of the current year (year t) is not available for companies labeled as financially distressed in year t, since the financial statement is published at the end of the year but special treatment probably occurs before the publication; (2) the designation of an ST company depends on the financial and operating situation of the year before the ST label. Put simply, it is not meaningful to forecast ST risk 0, 1 or 2 years ahead (see [7,45]). Therefore, the logistic model containing the 3~5-period-lagged financial indicators, defined as Model 1, is presented as follows:

$$ P(Y_{i,t} = 1) = \frac{1}{1 + \exp\left\{-\left(\alpha_0 + \alpha_{t-3}^{T} X_{i,t-3} + \alpha_{t-4}^{T} X_{i,t-4} + \alpha_{t-5}^{T} X_{i,t-5}\right)\right\}}. \tag{3} $$

In Equation (3), $Y_{i,t}$ is the binary response defined as in (2); $X_{i,t-3}$, $X_{i,t-4}$ and $X_{i,t-5}$ are the p-dimensional financial indicator vectors of firm i observed in years t − 3, t − 4 and t − 5; $\alpha_0$ is the intercept term, and $\alpha_{t-3}$, $\alpha_{t-4}$, $\alpha_{t-5}$ are the coefficient vectors for the explanatory vectors $X_{i,t-3}$, $X_{i,t-4}$ and $X_{i,t-5}$, respectively; $\alpha_{t-l}$ (l = 3, 4, 5) stands for the average effect of a one-unit increase in $X_{i,t-l}$ on the log odds of the financial distress event, holding the other variables constant. Thus, Model 1 considers the effect of changes in financial ratios on the financial distress probability over three consecutive years (t − 3, t − 4, t − 5).
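To make the structure of Model 1 concrete, the sketch below assembles the 3~5-period-lagged feature vector for each firm and evaluates the logistic distress probability. The function names, the toy panel and the coefficient values are hypothetical and serve only to illustrate how the three time windows enter the model side by side.

```python
import numpy as np

def lagged_features(panel, year, lags=(3, 4, 5)):
    """Concatenate the ratio vectors of years t-3, t-4, t-5 for every firm.

    panel: dict mapping year -> array of shape (n_firms, p).
    Returns an array of shape (n_firms, len(lags) * p).
    """
    return np.hstack([panel[year - l] for l in lags])

def distress_probability(X_lagged, alpha0, alpha):
    """Logistic probability of distress given stacked lagged features."""
    z = alpha0 + X_lagged @ alpha
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy panel: 4 firms, p = 2 ratios, years 2007..2011.
rng = np.random.default_rng(1)
panel = {yr: rng.normal(size=(4, 2)) for yr in range(2007, 2012)}

X_2012 = lagged_features(panel, 2012)       # uses 2009, 2008 and 2007
alpha0 = -0.5                                # assumed intercept
alpha = np.array([0.8, -0.4, 0.3, -0.2, 0.1, 0.0])   # assumed (alpha_{t-3}; alpha_{t-4}; alpha_{t-5})
prob = distress_probability(X_2012, alpha0, alpha)
```

Because data from years t, t − 1 and t − 2 never enter `lagged_features`, the corresponding coefficient vectors are implicitly fixed at zero, as assumed in the text.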

The Logistic Regression-Distributed Lag Model with Accounting Plus Macroeconomic Variables
We further add the macroeconomic factors to Equation (3) to detect the influence of macroeconomic conditions, in addition to financial indicators. Model 2, including both accounting variables and macroeconomic variables, takes the following form:

$$ P(Y_{i,t} = 1) = \frac{1}{1 + \exp\left\{-\left(\alpha_0 + \sum_{l=3}^{5} \alpha_{t-l}^{T} X_{i,t-l} + \sum_{l=3}^{5} \eta_{t-l}^{T} Z_{t-l}\right)\right\}}. \tag{4} $$

In Equation (4), $Z_{t-l}$ (l = 3, 4, 5) represents the m-dimensional macroeconomic factor vector of year t − l; $\eta_{t-l}$ (l = 3, 4, 5) is the coefficient vector for $Z_{t-l}$; the others are defined as in Equation (3). Similarly, $\eta_{t-3,j} + \eta_{t-4,j} + \eta_{t-5,j}$ represents the cumulative effect of the j-th (j = 1, 2, ..., m) macroeconomic factor on the log odds of the distress event. Models 1 and 2, given by Equations (3) and (4), can reflect the continued influence of the financial statements and macroeconomic conditions over multiple periods on the response; however, a considerable number of potentially helpful financial ratios, macroeconomic factors and their lags may bring redundant information, thus decreasing the models' forecast performance. In the following section, we implement feature selection by introducing a lasso penalty into the logistic financial distress forecast models. Further, we provide an ADMM algorithm framework to obtain the optimal estimates of the coefficients.

The Lasso-Logistic Regression-Distributed Lag Model
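There is currently much discussion about the lasso method. Lasso, as an $\ell_1$-norm penalization approach, has been actively studied. In particular, lasso has been used on the distributed lag linear model, where lasso estimators for the coefficients are obtained by minimizing the residual sum of squares and the $\ell_1$-norm of the coefficients simultaneously (e.g., [55]). The logistic model with lagged financial variables (3) can be extended to the lasso-logistic model as follows:

$$ (\hat{\alpha}_0, \hat{\alpha}) = \arg\min_{\alpha_0, \alpha} \; f(\alpha_0, \alpha) + \lambda \|\alpha\|_1, \tag{5} $$

with

$$ f(\alpha_0, \alpha) = -\sum_{i=1}^{n} \sum_{t} \left[ Y_{i,t} \log P_{i,t} + (1 - Y_{i,t}) \log (1 - P_{i,t}) \right], $$

where $P_{i,t} = P(Y_{i,t} = 1)$ under Model 1 (Equation (3)); $\hat{\alpha}_0$ and $\hat{\alpha}$ denote the penalized maximum likelihood estimates of the intercept $\alpha_0$ and the coefficient vector α; f denotes the negative log-likelihood function of Model 1 and can be regarded as the loss function of the observations; $\alpha = (\alpha_{t-3}^{T}, \alpha_{t-4}^{T}, \alpha_{t-5}^{T})^{T}$ are the unknown coefficients of the explanatory variables; $(X_{i,t-3}, X_{i,t-4}, X_{i,t-5}, Y_{i,t})$ are the known training observations defined as above; λ is the tuning parameter; $\|\cdot\|_1$ denotes the $\ell_1$-norm of a vector, i.e., the sum of the absolute values of its elements; $t_0$ and d, which delimit the range of t, are defined as before; n is the number of observed company samples. Introducing the auxiliary variable $\beta \in R^{3p}$, the lasso-logistic model (5) can be explicitly rewritten as follows:

$$ \min_{\alpha_0, \alpha, \beta} \; f(\alpha_0, \alpha) + \lambda \|\beta\|_1 \quad \text{s.t.} \quad \alpha = \beta. \tag{6} $$

In this paper, we solve the optimization problem (6) by using the alternating direction method of multipliers (ADMM) algorithm, which was first introduced by [56]. ADMM is a simple but powerful algorithm and can be viewed as an attempt to blend the benefits of dual decomposition [57] and augmented Lagrangian methods for constrained optimization [58]. The ADMM algorithm has become a benchmark first-order solver, especially for convex and non-smooth minimization models with separable objective functions (see [59,60]); thus, it is applicable to problem (6).
The augmented Lagrangian function of the optimization problem (6) can be defined as

$$ L_{\rho}(\alpha_0, \alpha, \beta, \theta) = f(\alpha_0, \alpha) + \lambda \|\beta\|_1 + \theta^{T}(\alpha - \beta) + \frac{\rho}{2} \|\alpha - \beta\|_2^2, \tag{7} $$

where $L_{\rho}$ is the augmented Lagrangian function; θ is a Lagrange multiplier vector and ρ (> 0) is the augmented Lagrange multiplier (penalty) parameter. In this paper, ρ is predetermined to be 1 for simplicity. Then, the iterative scheme of ADMM for the optimization problem (6) reads as

$$ (\alpha_0^{k+1}, \alpha^{k+1}) = \arg\min_{\alpha_0, \alpha} L_{\rho}(\alpha_0, \alpha, \beta^{k}, \theta^{k}), \tag{8a} $$
$$ \beta^{k+1} = \arg\min_{\beta} L_{\rho}(\alpha_0^{k+1}, \alpha^{k+1}, \beta, \theta^{k}), \tag{8b} $$
$$ \theta^{k+1} = \theta^{k} + \rho\,(\alpha^{k+1} - \beta^{k+1}). \tag{8c} $$

In (8a)-(8c), the superscripts k and k + 1 denote the values of $\alpha_0$, α, β and θ at the k-th and (k + 1)-th iterative steps of the ADMM algorithm, respectively. Further, the ADMM scheme (8a)-(8c) can be specified as

$$ (\alpha_0^{k+1}, \alpha^{k+1}) = \arg\min_{\alpha_0, \alpha} \; f(\alpha_0, \alpha) + (\theta^{k})^{T}(\alpha - \beta^{k}) + \frac{\rho}{2} \|\alpha - \beta^{k}\|_2^2, \tag{9a} $$
$$ \beta^{k+1} = \arg\min_{\beta} \; \lambda \|\beta\|_1 + (\theta^{k})^{T}(\alpha^{k+1} - \beta) + \frac{\rho}{2} \|\alpha^{k+1} - \beta\|_2^2. \tag{9b} $$

The sub-problem in (9a), a convex and smooth optimization problem, can be solved quickly by the Newton method [61], after setting the initial θ and β to arbitrary constants. More specifically, let $\alpha^{*} = (\alpha_0; \alpha)$ and let $\alpha^{*,k+1} = (\alpha_0^{k+1}; \alpha^{k+1})$ be calculated by repeating the Newton update

$$ \alpha^{*} \leftarrow \alpha^{*} - \left[\nabla^2 l(\alpha^{*})\right]^{-1} \nabla l(\alpha^{*}) \tag{10} $$

until convergence, where l is the objective function of (9a), and $\nabla^2 l \in R^{(3p+1) \times (3p+1)}$ and $\nabla l \in R^{3p+1}$ are the Hessian matrix and the gradient of the differentiable function l with respect to $\alpha^{*}$, respectively. For sub-problem (9b), the solution is given analytically by soft-thresholding:

$$ \beta_r^{k+1} = \begin{cases} \alpha_r^{k+1} + \theta_r^{k}/\rho - \lambda/\rho, & \alpha_r^{k+1} + \theta_r^{k}/\rho > \lambda/\rho, \\ 0, & |\alpha_r^{k+1} + \theta_r^{k}/\rho| \le \lambda/\rho, \\ \alpha_r^{k+1} + \theta_r^{k}/\rho + \lambda/\rho, & \alpha_r^{k+1} + \theta_r^{k}/\rho < -\lambda/\rho, \end{cases} \tag{11} $$

where $\beta_r^{k+1}$, $\alpha_r^{k+1}$ and $\theta_r^{k}$ are the r-th components of $\beta^{k+1}$, $\alpha^{k+1}$ and $\theta^{k}$, respectively, and r = 1, 2, ..., 3p. The choice of the tuning parameter is important. In this study, we find an optimal tuning parameter λ by 10-fold cross-validation. We then compare the forecast accuracy of each method based on the mean area under the curve (MAUC), given as follows:

$$ \mathrm{MAUC}(\lambda) = \frac{1}{10} \sum_{j=1}^{10} \mathrm{AUC}_j(\lambda), \tag{12} $$

where $\mathrm{AUC}_j(\lambda)$ denotes the area under the receiver operating characteristic (ROC) curve on the j-th validation set for each tuning parameter λ. So far, the lasso estimators for the logistic model (5) including 3~5-period-lagged financial ratios have been obtained by following the above procedures. For the convenience of readers, we summarize the whole optimization procedure for training the lasso-logistic model with lagged variables in Algorithm 1.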

Algorithm 1. An ADMM algorithm framework for the lasso-logistic model with lagged variables (5).
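The ADMM procedure described above (a Newton solve for the smooth sub-problem, soft-thresholding for the auxiliary variable, then a multiplier update) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the stopping rule, the sign convention of the multiplier term (taken here as $+\theta^{T}(\alpha - \beta)$) and the synthetic data are all assumptions of this sketch.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: closed-form solution of the beta-step."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso_logistic(X, y, lam, rho=1.0, n_iter=200, newton_steps=25, tol=1e-10):
    """X: (N, q) stacked lagged features; y: (N,) labels in {0, 1}."""
    N, q = X.shape
    Xb = np.column_stack([np.ones(N), X])    # first column carries the intercept
    a = np.zeros(q + 1)                      # a = (alpha_0; alpha)
    beta = np.zeros(q)                       # auxiliary copy of alpha
    theta = np.zeros(q)                      # Lagrange multipliers
    for _outer in range(n_iter):
        # alpha-step: Newton iterations on the smooth, strongly convex sub-problem.
        for _newton in range(newton_steps):
            p = 1.0 / (1.0 + np.exp(-(Xb @ a)))
            grad = Xb.T @ (p - y)
            grad[1:] += theta + rho * (a[1:] - beta)
            H = Xb.T @ (Xb * (p * (1.0 - p))[:, None])
            H[1:, 1:] += rho * np.eye(q)     # the intercept is not penalized
            step = np.linalg.solve(H, grad)
            a = a - step
            if np.linalg.norm(step) < tol:
                break
        # beta-step: soft-thresholding with threshold lam / rho.
        beta = soft_threshold(a[1:] + theta / rho, lam / rho)
        # multiplier update
        theta = theta + rho * (a[1:] - beta)
    return a[0], beta

# Synthetic illustration (hypothetical data): only features 0 and 3 matter.
rng = np.random.default_rng(0)
N, q = 500, 6
X = rng.normal(size=(N, q))
true_alpha = np.array([2.0, 0.0, 0.0, -2.0, 0.0, 0.0])
prob = 1.0 / (1.0 + np.exp(-(0.3 + X @ true_alpha)))
y = (rng.uniform(size=N) < prob).astype(float)

alpha0_hat, beta_hat = admm_lasso_logistic(X, y, lam=5.0)
```

Returning `beta` rather than `alpha` is a design choice of the sketch: the soft-thresholded copy contains exact zeros, so the selected features can be read off directly.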
For the logistic model (4) with lagged variables of the financial ratios and macroeconomic indicators, the lasso extension is given in (13):

$$ (\hat{\alpha}_0, \hat{\gamma}) = \arg\min_{\alpha_0, \gamma} \; f(\alpha_0, \gamma) + \lambda \|\gamma\|_1, \tag{13} $$

where $\hat{\gamma} = (\hat{\alpha}, \hat{\eta})$ is the lasso estimator vector for the coefficients of the lagged financial ratios and macroeconomic indicators; $\gamma = (\alpha^{T}, \eta^{T})^{T}$ represents the unknown coefficients of the explanatory variables; f is the corresponding negative log-likelihood of Model 2; α, $\eta = (\eta_{t-3}^{T}, \eta_{t-4}^{T}, \eta_{t-5}^{T})^{T}$ and the others are defined as in Equations (4) and (5). The lasso estimator for model (13) can also be found by using the ADMM algorithm presented above.

The Lasso-SVM Model with Lags for Comparison
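The support vector machine (SVM) is a widely used linear classifier with high interpretability. In this sub-section, we construct a lasso-SVM model that includes the 3-period-lagged financial indicators for comparison with the lasso-logistic-distributed lag model. The SVM formulation combining the original soft-margin SVM model [62] and a 3~5-period-lagged financial ratio variable vector is as follows:

$$ \min_{\alpha_0, \alpha, \xi} \; \frac{1}{2} \|\alpha\|_2^2 + C \sum_{i=1}^{n} \sum_{t} \xi_{i,t} \quad \text{s.t.} \quad Y_{i,t}\, f(X_{it}) \ge 1 - \xi_{i,t}, \;\; \xi_{i,t} \ge 0. \tag{14} $$

In (14), $\alpha_0$ (intercept) and $\alpha = (\alpha_{t-3}; \alpha_{t-4}; \alpha_{t-5})$ (normal vector) are the unknown coefficients of the hyperplane $f(X_{it}) = (\alpha_0; \alpha)^{T} X_{it}$; $\|\cdot\|_2$ denotes the $\ell_2$-norm of a vector; C is the penalty parameter, a predetermined positive value; $\xi_{i,t}$ is the unknown slack variable; $Y_{i,t}$ is a binary variable with $Y_{i,t} = 1$ when firm i is a financially distressed company in year t and $Y_{i,t} = -1$ otherwise; $X_{it} = (1; X_{i,t-3}; X_{i,t-4}; X_{i,t-5})$ denotes the observation vector of 3~5-period-lagged financial indicators for firm i; n represents the number of observations; $t_0$ and d denote the beginning and length of the observation period, respectively.
By introducing the hinge loss function, the optimization problem (14) has the following equivalent form [63]:

$$ \min_{\alpha^{*}} \; \sum_{i=1}^{n} \sum_{t} \left[ 1 - Y_{i,t}\, \alpha^{*T} X_{it} \right]_{+} + \lambda \|\alpha\|_2^2, \tag{15} $$

where $\alpha^{*} = (\alpha_0; \alpha)$, $[\cdot]_{+}$ indicates the positive part, i.e., $[x]_{+} = \max\{x, 0\}$, and the tuning parameter is λ = 1/(2C). Because it is regularized by the $\ell_2$-norm, the SVM yields nonzero estimates for all coefficients and is therefore unable to select significant features. Thus, to prevent the influence of noise features, we replace the $\ell_2$-norm in the optimization problem (15) with the $\ell_1$-norm, which makes it possible to conduct feature selection and classification simultaneously. Furthermore, for computational convenience, we replace the hinge loss function in (15) with its squared form, and present the optimization problem combining the SVM model and the lasso method ($\ell_1$ regularization) as follows:

$$ \hat{\alpha}^{*} = \arg\min_{\alpha^{*}} \; \sum_{i=1}^{n} \sum_{t} \left( \left[ 1 - Y_{i,t}\, \alpha^{*T} X_{it} \right]_{+} \right)^2 + \lambda \|\alpha^{*}\|_1. \tag{16} $$

In (16), $\hat{\alpha}^{*}$ is the optimal estimate of the coefficients of the SVM model, and the others are defined as above. As in the solution of problem (5) presented previously, by first introducing an auxiliary variable $\beta \in R^{3p+1}$, the lasso-SVM model (16) can be explicitly rewritten as follows:

$$ \min_{\alpha^{*}, \beta} \; \sum_{i=1}^{n} \sum_{t} \left( \left[ 1 - Y_{i,t}\, \alpha^{*T} X_{it} \right]_{+} \right)^2 + \lambda \|\beta\|_1 \quad \text{s.t.} \quad \alpha^{*} = \beta. \tag{17} $$

Then, the augmented Lagrangian function of the optimization problem (17) can be accordingly specified as

$$ L_{\rho}(\alpha^{*}, \beta, \theta) = \sum_{i=1}^{n} \sum_{t} \left( \left[ 1 - Y_{i,t}\, \alpha^{*T} X_{it} \right]_{+} \right)^2 + \lambda \|\beta\|_1 + \theta^{T}(\alpha^{*} - \beta) + \frac{\rho}{2} \|\alpha^{*} - \beta\|_2^2, \tag{18} $$

where $\theta \in R^{3p+1}$ and $\rho \in R$ are the Lagrange and the augmented Lagrange multipliers, respectively. The iterative scheme of ADMM for the optimization problem (17) is similar to (8a)-(8c) and can be accordingly specified as

$$ \alpha^{*,k+1} = \arg\min_{\alpha^{*}} \; \sum_{i=1}^{n} \sum_{t} \left( \left[ 1 - Y_{i,t}\, \alpha^{*T} X_{it} \right]_{+} \right)^2 + (\theta^{k})^{T}(\alpha^{*} - \beta^{k}) + \frac{\rho}{2} \|\alpha^{*} - \beta^{k}\|_2^2, \tag{19a} $$
$$ \beta^{k+1} = \arg\min_{\beta} \; \lambda \|\beta\|_1 + (\theta^{k})^{T}(\alpha^{*,k+1} - \beta) + \frac{\rho}{2} \|\alpha^{*,k+1} - \beta\|_2^2, \tag{19b} $$
$$ \theta^{k+1} = \theta^{k} + \rho\,(\alpha^{*,k+1} - \beta^{k+1}). \tag{19c} $$

The finite Armijo-Newton algorithm [61] is applied to solve the α-sub-problem (19a), which is a convex piecewise quadratic optimization problem. Its objective function is first-order differentiable but not twice differentiable with respect to $\alpha^{*}$, which precludes the use of a regular Newton method.
Let $F(\alpha^{*})$ be the objective function of the sub-problem (19a). Its gradient and generalized Hessian matrix are given in Equations (20) and (21):

$$ \nabla F(\alpha^{*}) = -2 \sum_{i=1}^{n} \sum_{t} Y_{i,t} \left[ 1 - Y_{i,t}\, \alpha^{*T} X_{it} \right]_{+} X_{it} + \theta^{k} + \rho\,(\alpha^{*} - \beta^{k}), \tag{20} $$
$$ \partial^2 F(\alpha^{*}) = 2\, \mathbf{X}^{T} \mathrm{diag}\!\left( 1 - Y_{i,t}\, \alpha^{*T} X_{it} \right)_{*} \mathbf{X} + \rho I, \tag{21} $$

where $\mathbf{X}$ is the matrix whose rows are the vectors $X_{it}^{T}$, $I \in R^{(3p+1) \times (3p+1)}$ is the identity matrix and $\mathrm{diag}(1 - Y_{i,t}\, \alpha^{*T} X_{it})_{*}$ is a diagonal matrix whose diagonal entry for observation (i, t) is a sub-gradient of the plus function $(\cdot)_{+}$, given by the step function

$$ (x)_{*} = \begin{cases} 1, & x > 0, \\ \in [0, 1], & x = 0, \\ 0, & x < 0. \end{cases} $$

The whole optimization procedure applied to solve the α-sub-problem (19a) is described in Algorithm 2.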
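The equivalence between the constrained soft-margin problem (14) and its hinge-loss form, together with the relation λ = 1/(2C), can be checked with a short derivation. The following is a standard argument written out as a sketch; it uses only the symbols already defined in the text.

```latex
% For fixed (\alpha_0, \alpha), the smallest slack satisfying both constraints of (14) is
\xi_{i,t} = \max\{0,\; 1 - Y_{i,t}\,\alpha^{*T} X_{it}\}
          = \left[\,1 - Y_{i,t}\,\alpha^{*T} X_{it}\,\right]_{+} .
% Substituting this value into the objective of (14) removes the constraints:
\tfrac{1}{2}\|\alpha\|_2^2
  + C \sum_{i}\sum_{t} \left[\,1 - Y_{i,t}\,\alpha^{*T} X_{it}\,\right]_{+} .
% Dividing by C > 0 leaves the minimizer unchanged and gives the hinge-loss form:
\sum_{i}\sum_{t} \left[\,1 - Y_{i,t}\,\alpha^{*T} X_{it}\,\right]_{+}
  + \tfrac{1}{2C}\,\|\alpha\|_2^2 ,
\qquad \lambda = \tfrac{1}{2C} .
```

A large penalty parameter C therefore corresponds to weak regularization (small λ), and vice versa.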

Algorithm 2. A finite Armijo-Newton algorithm for the sub-problem (19a).
In Algorithm 2, δ is the parameter associated with the finite Armijo-Newton algorithm and lies between 0 and 1; δ = 0.4 is chosen and used to find the step size τ.
The finite Armijo-Newton algorithm guarantees the unique global minimum solution in a finite number of iterations. The details of the proof of the global convergence of the sequence to the unique solution can be found in [61]. For the sub-problem (19b), the solution is also given analytically by (11) presented above, after replacing α, β and θ with α*, β* and θ*.
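A Newton iteration with an Armijo backtracking line search for a squared-hinge sub-problem of the kind in (19a) can be sketched as follows. This is a generic illustration rather than the exact algorithm of [61]: the halving rule for the step size, the stopping tolerance, the sign convention of the multiplier term and the synthetic data are assumptions of this sketch, while δ = 0.4 follows the text.

```python
import numpy as np

def objective(a, X, y, beta, theta, rho):
    """Squared hinge loss plus the multiplier and quadratic penalty terms."""
    r = np.maximum(1.0 - y * (X @ a), 0.0)
    diff = a - beta
    return np.sum(r ** 2) + theta @ diff + 0.5 * rho * (diff @ diff)

def gradient(a, X, y, beta, theta, rho):
    r = np.maximum(1.0 - y * (X @ a), 0.0)
    return -2.0 * X.T @ (y * r) + theta + rho * (a - beta)

def armijo_newton(X, y, beta, theta, rho=1.0, delta=0.4, tol=1e-8, max_iter=100):
    """X: (N, q) rows X_it with a leading 1; y: (N,) labels in {-1, +1}."""
    a = beta.copy()
    for _ in range(max_iter):
        g = gradient(a, X, y, beta, theta, rho)
        if np.linalg.norm(g) < tol:
            break
        # Generalized Hessian: only observations with positive margin violation count.
        active = (1.0 - y * (X @ a) > 0).astype(float)
        H = 2.0 * X.T @ (X * active[:, None]) + rho * np.eye(len(a))
        d = -np.linalg.solve(H, g)           # Newton direction
        # Armijo rule: halve tau until a sufficient decrease is achieved.
        tau, F0, slope = 1.0, objective(a, X, y, beta, theta, rho), g @ d
        while (objective(a + tau * d, X, y, beta, theta, rho) > F0 + delta * tau * slope
               and tau > 1e-12):
            tau *= 0.5
        a = a + tau * d
    return a

# Synthetic illustration (hypothetical data).
rng = np.random.default_rng(0)
N, q = 200, 5
X = np.column_stack([np.ones(N), rng.normal(size=(N, q - 1))])
y = np.where(X[:, 1] - X[:, 2] + 0.5 * rng.normal(size=N) > 0, 1.0, -1.0)
beta0, theta0 = np.zeros(q), np.zeros(q)
a_hat = armijo_newton(X, y, beta0, theta0)
```

The term `rho * np.eye(len(a))` keeps the generalized Hessian positive definite even when few observations are active, so the linear solve is always well defined.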
So far, the lasso estimators for the SVM model (16), including 3~5-period-lagged financial ratios, have been obtained by following the above procedures. For the convenience of readers, we summarize the whole optimization procedure for training the lasso-SVM with lagged variables in Algorithm 3. It is worth noting that the estimators for the lasso-SVM model that contains 3~5-period-lagged financial ratios and macroeconomic indicators can be obtained similarly by the same algorithm.
Algorithm 3. An ADMM algorithm framework for lasso-support vector machine (SVM) with lagged variables (16)
Require:

Sample Description
The data used in the study are limited to manufacturing corporations. The manufacturing sector plays an important role in contributing to the economic growth of a country, especially a developing country [64]. According to the data released by the State Statistical Bureau of China, manufacturing accounts for 30% of the country's GDP. China's manufacturing sector has the largest number of listed companies as well as the largest number of ST companies each year. On the other hand, according to the data disclosed by the China Banking Regulatory Commission, in the Chinese manufacturing sector, the non-performing loan ratio has been increasing. For example, there was a jump in the non-performing loan ratio from 3.81% in December of 2017 to 6.5% in June of 2018. Therefore, it is quite important to establish an effective early warning system aiming to assess financial stress and prevent potential financial fraud of a listed manufacturing company for market participants, including investors, creditors and regulators.
In this paper, we selected 234 listed manufacturing companies from the Wind database. Among these, 117 companies are financially healthy and 117 are financially distressed, i.e., labeled as "special treatment" (ST). The samples were selected from 2007 to 2017, because the Ministry of Finance of the People's Republic of China issued the new "Accounting Standards for Business Enterprises" (new guidelines), which all listed companies were required to implement fully from January 1, 2007. Similar to [7], [16] and [45], all 117 financially distressed companies received the ST label because of negative net profit for two consecutive years. There were 10, 9, 17, 24, 26 and 31 companies labeled as ST or *ST in the years 2012 to 2017, respectively. The same number of financially healthy companies was selected in each year. Given the regulatory requirement and the availability of qualified data for listed companies, 2007 (t0) is the earliest estimation window available for forecasting a listed company's financial distress. Meanwhile, the maximum lag order used in our models is 5 years, so the special-treated (ST) companies were counted from 2012 (t0 + 5) onward. Furthermore, we divided the whole sample into two groups: the training sample and the testing sample. The training sample covers 2012 to 2016, includes the data of 172 companies and is used to construct the models and estimate the coefficients. Correspondingly, the testing sample covers 2017, includes the data of 62 companies and is used to evaluate the predictive performance of the models.
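The sample sizes quoted above follow directly from the yearly ST counts and the 1:1 matching with healthy companies. The short check below reproduces that arithmetic; the function name `split_sizes` is hypothetical, while the counts are those stated in the text.

```python
# Number of companies labeled ST or *ST in each year, as stated in the text.
st_counts = {2012: 10, 2013: 9, 2014: 17, 2015: 24, 2016: 26, 2017: 31}

def split_sizes(st_counts, test_year=2017, healthy_per_st=1):
    """Return (total, training, testing) company counts under 1:1 matching."""
    per_year = {year: n * (1 + healthy_per_st) for year, n in st_counts.items()}
    total = sum(per_year.values())
    testing = per_year[test_year]
    training = total - testing
    return total, training, testing

total, training, testing = split_sizes(st_counts)
print(total, training, testing)  # 234 172 62
```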

Covariate
In this paper, we use the factors measured in consecutive time windows t − 3, t − 4 and t − 5 to predict a listed company's financial status at time t (t = 2012, 2013, . . . , 2017). Therefore, we define response y as whether a Chinese manufacturing listed company was labeled as "special treatment" by China Securities Regulatory

Firm-Idiosyncratic Financial Indicator
Because past studies have found a large number of financial ratios to be significant indicators of corporate problems, an original list of 43 potentially helpful ratios is compiled for prediction and provided in Table 1. These indicators are classified into five categories: solvency, operational capability, profitability, structural soundness, and business development and capital expansion capacity. All variables used for the calculation of financial ratios are obtained from the balance sheets, income statements or cash flow statements of the listed companies. The financial data for financially distressed companies are collected in years 3, 4 and 5 before the companies receive the ST label.
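To make the lag structure concrete, the sketch below assembles one firm's predictor vector from the ratios observed 3, 4 and 5 years before the event year, giving 3p + 1 entries once an intercept term is added (130 for p = 43). The dictionary layout, the function name `lagged_row` and the random values are illustrative assumptions, not the authors' data format.

```python
import numpy as np

LAGS = (3, 4, 5)  # lags used in the paper: t-3, t-4 and t-5

def lagged_row(ratios_by_year, event_year, lags=LAGS):
    """Concatenate one firm's ratio vectors observed at event_year - lag.

    ratios_by_year: dict mapping fiscal year -> 1-D array of p ratios
    (a hypothetical container; the paper does not specify a data layout).
    Returns a vector of length 3p + 1 with a leading 1 for the intercept.
    """
    blocks = [np.asarray(ratios_by_year[event_year - lag], dtype=float) for lag in lags]
    return np.concatenate([[1.0], *blocks])

p = 43  # number of financial ratios in Table 1
rng = np.random.default_rng(0)
firm = {year: rng.normal(size=p) for year in range(2007, 2015)}  # placeholder values

x = lagged_row(firm, event_year=2012)  # uses data from 2009, 2008 and 2007
print(x.shape)  # (130,)
```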

Macroeconomic Indicator
Besides considering three consecutive period-lagged financial ratios for the prediction of financial distress of Chinese listed manufacturing companies, we also investigated the associations between macroeconomic conditions and the possibility of these companies falling into financial distress. The macroeconomic factors include GDP growth, inflation, unemployment rate in urban areas and consumption level growth, as described in Table 2. GDP growth is widely understood to be an important variable for measuring economic strength and prosperity; an increase in GDP growth may decrease the likelihood of distress. High inflation and high unemployment, which reflect a weaker economy, may increase the likelihood of financial distress. Consumption level growth reflects the change in consumption level, and its increase may reduce the likelihood of financial distress.

GDP growth (%)
Growth in the Chinese real gross domestic product (GDP) compared to the corresponding period of the previous year (GDP growth is documented yearly and by province).

Inflation rate (%)
Percentage changes in urban consumer price compared to the corresponding period of the previous year (inflation rate is documented regionally).

Unemployment rate (%)
The data derived from the Labor Force Survey (population between 16 years old and retirement age; unemployment rate is documented yearly and regionally).

In the following empirical part, Model 2 represents the "accounting plus macroeconomic indicators" model and includes, in addition to the accounting variables, 3-period-lagged macroeconomic indicators. We collected the corresponding macroeconomic data in each year from 2007 to 2012 for all 234 sample companies; the raw macroeconomic data are from the database of the Chinese National Bureau of Statistics.

Data Processing
Existing studies suggest that prediction models built on standardized data generally yield better results [65]. Therefore, before the construction of the models, the data are standardized by the following linear transformations: In formula (24), z_ij(t) denotes the standardized value of the j-th macroeconomic factor for the i-th company in year t; v_ij(t) denotes the original value of the j-th indicator of the i-th company in year t, where j = 1, 2, 3, 4, i = 1, 2, ..., 234, and t = 2007, 2008, ..., 2012. It is worth noting that the value assigned to v_ij(t) for each company is based on the macroeconomic data of the province where the company operates (registration location).
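The province-based assignment and a linear standardization step can be illustrated as follows. Formula (24) is not reproduced in this text, so the sketch assumes min-max scaling, one common linear transformation; the transformation actually used may differ. The province names, growth figures and the function name `min_max_scale` are hypothetical.

```python
import numpy as np

def min_max_scale(values):
    """Column-wise min-max scaling to [0, 1]; constant columns map to 0."""
    values = np.asarray(values, dtype=float)
    lo = values.min(axis=0)
    span = values.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (values - lo) / span

# Hypothetical province-level GDP growth (%), looked up by registration location.
gdp_growth_by_province = {"Guangdong": 8.2, "Hubei": 11.3, "Liaoning": 9.5}
company_province = ["Hubei", "Guangdong", "Liaoning", "Guangdong"]

# v[i, 0] is the raw macroeconomic value assigned to company i.
v = np.array([[gdp_growth_by_province[prov]] for prov in company_province])
z = min_max_scale(v)
```

Companies registered in the same province receive the same raw value and hence the same standardized value.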

Empirical Results and Discussion
In this section, we establish a financial early warning system for Chinese listed manufacturing companies by using two groups of lasso-generalized distributed lag models, i.e., a logistic model and an SVM model including 3~5-period-lagged explanatory variables, and implement financial distress prediction and feature selection simultaneously. For the selected sample set, the sample data from 2007 to 2016 were used as the training sample and the sample from 2017 as the test sample. The tuning parameter was chosen by cross-validation on the training set, and the performance of the chosen method was evaluated on the testing set by the area under the receiver operating characteristic curve (AUC), the G-mean and the Kolmogorov-Smirnov (KS) statistic.
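For reference, the three evaluation measures can be computed as in the sketch below, which uses their standard textbook definitions. These are assumed to match the ones used in the study, which does not spell them out in this section. The 0.5 classification threshold used for the G-mean and the toy scores are illustrative assumptions.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC: probability that a distressed firm outscores a healthy one (ties count 1/2)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == 1] == 1)
    specificity = np.mean(y_pred[y_true == 0] == 0)
    return float(np.sqrt(sensitivity * specificity))

def ks_statistic(y_true, scores):
    """Largest gap between the empirical score distributions of the two classes."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    thresholds = np.unique(scores)
    cdf_pos = np.array([(pos <= t).mean() for t in thresholds])
    cdf_neg = np.array([(neg <= t).mean() for t in thresholds])
    return float(np.max(np.abs(cdf_pos - cdf_neg)))

# Toy example: 1 = distressed (ST), 0 = healthy.
y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.5, 0.3, 0.2])
y_pred = (scores >= 0.5).astype(int)  # assumed threshold of 0.5

auc = auc_score(y_true, scores)
gm = g_mean(y_true, y_pred)
ks = ks_statistic(y_true, scores)
```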

Preparatory Work
It is necessary to choose a suitable value for the tuning parameter λ that controls the trade-off of the bias and variance. As mentioned before, 10-fold cross-validation is used on the training dataset in order to obtain the optimal tuning parameter, λ. First, we compare prediction performance of the lasso-logistic-distributed lag model (5)

Generally speaking, the two kinds of models yield the best performance when λ = 1. Therefore, in the following, we fit and evaluate the lasso-logistic-distributed lag models by using the tuning parameter of 1.

Analyses of Results
This study develops a group of ex-ante models for estimating financial distress likelihood in the time window t to test the contribution of financial ratios and macroeconomic indicators in the consecutive time windows t − 3, t − 4 and t − 5. In the following, Table 3 presents the results from lasso-logistic-distributed lag (LLDL) regressions of the financial distress indicator on the predictor variables and Table 4 presents the results from the lasso-SVM-distributed lag model. Furthermore, we compare the predictive performance of our models with that of existing widely used ex-ante models, including neural network (NN), decision tree (DT), SVM and logistic models estimated in a single time period from t − 3 to t − 5. The comparative results are shown in Tables 5 and 6 as well as Figure 2.

Table 3. The indicator selection and the estimates for lasso-logistic-distributed lag models.

Table 5. Prediction results of the neural network (NN), decision tree (DT), lasso-SVM and lasso-logistic in the single year time window versus the lasso-SVM-distributed lag (LSVMDL) and lasso-logistic-distributed lag (LLDL) models (financial ratios only).
Table 6. Prediction results of NN, DT, lasso-SVM and lasso-logistic models in the single year time window versus the lasso-SVM-distributed lag (LSVMDL) model and the lasso-logistic-distributed lag (LLDL) model (financial ratios plus macroeconomic indicators).

The Results of the Accounting-Only Model and Analyses
In Table 3, Model 1 represents the "accounting-only" lasso-logistic-distributed lag (LLDL) regression model including the 43 financial statement ratios in 3 adjacent years; the results of financial indicator selection and the coefficient estimates are listed in the first three columns. By using Algorithm 1, 23 indicators in total are chosen from the original indicator set. More specifically, two indicators (numbers 1 and 2) are selected from the solvency category and five indicators (numbers 3 to 7) from the operational capability category; six indicators (8-13) from operational capability, eight indicators (14-21) from profitability and two indicators (22-23) from structural soundness and business development and capital expansion capacity. It can also be seen that nine financial indicators not used in [7] have a quite significant influence on future financial distress risk, namely, sales revenue/average total assets (1), impairment losses/sales profit (2), sales cost/average net inventory (3), shareholders' equity/net profit (4), net profit/total profit (5), net cash flow from operating activities/total assets (6), main business profit/net income from main business (7), net profit attributable to shareholders of the parent company/net profit (8) and operating capital/total assets (9).
Potentially helpful ratios, such as the leverage ratio (total liabilities/total assets), shareholders' equity/net profit (ROE), net profit/average total assets (ROA) and current liabilities/total liabilities, have significant effects on the occurrence of financial distress of Chinese listed manufacturing companies. For example, as shown in Table 3, the leverage ratio in year t − 3, a very early time period, is selected as a significant predictor, and the estimated coefficient is 3.1671. This implies that an increase in the leverage ratio in that early period increases the financial risk of listed manufacturing companies. The ROA indicator for year t − 4 is also selected, and its estimated coefficient is −1.1919, which implies that the probability of a company falling into financial distress decreases as the company's ROA, i.e., net profit/average total assets, increases.
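In a logistic model, a coefficient is the change in the log-odds of distress per unit increase in the corresponding predictor, holding the other predictors fixed, so exponentiating it gives the multiplicative change in the odds. The sketch below applies this general property to the two coefficients quoted above; the function name `odds_multiplier` is hypothetical, and a "unit" refers to the scale of the predictor after the standardization described in the Data Processing section.

```python
import math

def odds_multiplier(coef, delta=1.0):
    """Factor by which the odds of distress change when a predictor rises by delta."""
    return math.exp(coef * delta)

leverage_effect = odds_multiplier(3.1671)  # leverage ratio coefficient from Table 3
roa_effect = odds_multiplier(-1.1919)      # ROA coefficient from Table 3
print(round(leverage_effect, 2), round(roa_effect, 3))
```

A multiplier above 1 (leverage) raises the odds of distress, while a multiplier below 1 (ROA) lowers them.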
Besides, the results in Table 3 also show that changes in sales revenue/average total assets in all three consecutive time periods have significant effects on future financial distress risk. Different weights are assigned to sales revenue/average total assets at different time lags: the coefficient estimates for the indicator in the time windows t − 3, t − 4 and t − 5 are −0.4367, −5.7393 and −1.8312, respectively. This implies that increases in sales revenue in different time windows have positive and significant (but different) effects on the future financial status of a listed company. The results for "net cash flow from operating activities/total assets", presented in row 13 and the first three columns of Table 3, illustrate that changes in this indicator in different time windows affect the future occurrence of financial distress differently in both significance level and magnitude. The estimated coefficients for the variable measured in the time windows t − 3, t − 4 and t − 5 are −4.8561, −2.6798 and −1.0999, at the significance levels of 0.01, 0.05 and (>) 0.1, respectively. This indicates that (1) the higher the ratio of net cash flow from operating activities to total assets for a listed manufacturing company, the lower the likelihood of the firm's financial distress; (2) changes in net cash flow from operating activities/total assets in the time windows t − 3 and t − 4 have a significant influence on the risk of financial distress, and the magnitude of the influence increases as the lag shortens; (3) the influence of this indicator declines over time, and a change in this indicator 5 years before the financial distress event has no significant effect on financial risk when compared with relatively recent changes.

The Results and Analyses of the Model of Accounting Plus Macroeconomic Variables
In Table 3, Model 2 represents the "accounting plus macroeconomic factor" model, including the original 43 financial ratios and 4 macroeconomic indicators in 3 adjacent years; the results of indicator selection and the coefficient estimates are listed in the last three columns. For Model 2, the same group of financial variables is selected and included in the final model. The time lags of the selected financial variables and the signs (but not the magnitudes) of their estimated coefficients are largely consistent between Models 1 and 2.
In addition to the accounting ratios, three macroeconomic factors are selected as significant predictors and included in the final model: GDP growth, consumption level growth and unemployment rate in the time window t − 3. The coefficient estimates for the selected GDP growth and unemployment rate are −2.4867 and 2.7262, respectively, which means that high GDP growth should decrease financial distress risk, whereas high unemployment will worsen the financial condition of a listed manufacturing company. These results are consistent with expectations. The coefficient estimate for consumption level growth is −0.9931, which implies that high consumption level growth should decrease the possibility of financial deterioration of a listed company. Finally, Consumer Price Index (CPI) growth is not found to have a significant influence on financial distress risk.
The 4-year-lagged and 5-year-lagged GDP growth and the 4-year-lagged consumption level growth are also selected and included in the final model, but not as very significant predictors, which implies the following: (1) changes in macroeconomic conditions have a continuous influence on financial distress risk; (2) however, the effect of changes in macroeconomic conditions on financial distress risk declines as the lag window lengthens.

The Results of Lasso-SVM-Distributed Lag (LSVMDL) Models and Analyses
We introduce 3-period lags of the financial indicators presented in Table 1, i.e., TL/TA t−3 , TL/TA t−4 and TL/TA t−5 , CA/CL t−3 , CA/CL t−4 and CA/CL t−5 . . . , NICCE/NOS t−3 , NICCE/NOS t−4 and NICCE/NOS t−5 , into the model (16) and implement the indicator selection and the coefficient estimation by using Algorithm 3. The corresponding results are presented in the first three columns of Table 4. Then, we introduce 3-period lags of the financial and macroeconomic indicators presented in Tables 1 and 2 into the model (16); the coefficient estimates of the selected indicators are presented in the last three columns of Table 4.
Twenty-four financial indicators are selected and included in the final SVM-distributed lag model, denoted as Model 1 in Table 4; 17 indicators among them are also included in the final logistic-distributed lag model. For convenience of comparison, the 17 indicators, such as total liabilities/total assets, current liabilities/total assets and sales revenue/average current assets etc., are italicized and shown in the "selected indicator" column of Table 4.
According to the relation between response variables and predictors in the SVM model, as mentioned before, the increase (decrease) in the factors should increase (decrease) the financial distress risk when the coefficient estimates are positive. Therefore, let us take the estimated results in the first three rows and columns as an example: (1) the increase in the total liabilities to total assets ratio should increase the financial distress risk of a listed manufacturing company; (2) the increase in current liabilities to total assets ratio should decrease the financial distress risk; (3) the changes in the indicators in the period closer to the time of obtaining ST have a more significant effect on the likelihood of financial distress in terms of magnitudes of estimates of the coefficients.
Four macroeconomic factors, in addition to 24 financial indicators, are selected and included in the final SVM-distributed lag model, denoted as Model 2 in Table 4. The results show that (1) the effects of the selected financial ratios on the response, i.e., the financial status of a company, are consistent with the results of the SVM-distributed lag model including only financial ratios, i.e., Model 1, in terms of the time lags of the selected financial variables and the signs of the estimated coefficients; (2) high GDP growth and high consumption level growth should decrease financial distress risk, whereas high unemployment will worsen the financial condition of a listed manufacturing company.
From Table 4, it can be seen that different indicators influence the financial status of a company differently. The effects of some indicators on financial distress risk increase as the time lag decreases, e.g., the total liabilities to total assets ratio, current liabilities/total assets and net cash flow from operating and investing activities/total liabilities, while the effects of other indicators decrease as the time lag decreases, e.g., fixed assets/total assets, GDP growth and consumption level growth. For some indicators, however, the direction of the effect changes across time windows. For example, the coefficients for current assets/current liabilities (current ratio) in Model 1 are 13.7838 for time window t − 4 and −23.2184 for time window t − 5, which implies that a high current ratio in time window t − 5 should decrease financial distress risk, whereas this is not the case in time window t − 4. A similar case can be found for CPI growth in Model 2. Thus, SVM-distributed lag models may be difficult to interpret and are therefore inferior to the logistic-distributed lag models in terms of interpretability.

Comparison with Other Models
For the purpose of comparison, the prediction performances of the ex-ante models for the estimation of financial distress likelihood developed in existing studies are shown in Tables 5 and 6. These widely used ex-ante models include the neural network (NN), decision tree (DT), SVM and logistic models estimated in the single time periods t − 3, t − 4 and t − 5, called t − 3 models, t − 4 models and t − 5 models. The construction of these three groups of models is similar to [7]. Take the construction of the t − 5 model as an example. For the 10 financially distressed companies that received ST in 2012 and the 10 healthy companies selected as a control group, financial and macroeconomic data from 2007 (5 years before 2012) were collected. For the 9 financially distressed companies that received ST in 2013 and the 9 selected healthy companies, financial and macroeconomic data from 2008 (5 years before 2013) were collected. Similarly, for the 17, 24 and 26 financially distressed companies that received the ST label in 2014, 2015 and 2016, respectively, and the financially healthy companies randomly selected at a 1:1 ratio in each year to match the ST companies, data from 2009 (5 years before 2014), 2010 (5 years before 2015) and 2011 (5 years before 2016) were collected. By using the labels of the 172 companies and the data obtained 5 years prior to the year when the companies received the ST label, we construct t − 5 financial distress forecast models based on a neural network (NN), decision tree (DT), SVM and logistic regression. The t − 3 models and t − 4 models are built similarly. The data of the financially distressed companies that received ST in 2017 and of the matched non-financially distressed companies were used to evaluate the predictive performance of these models.
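The year alignment used to build the single-window comparison models can be written out explicitly. The sketch below maps each ST year in the training sample to the fiscal year whose data feed a t − 3, t − 4 or t − 5 model; the function name `data_years` is hypothetical.

```python
ST_YEARS = [2012, 2013, 2014, 2015, 2016]  # ST years in the training sample

def data_years(st_years, lag):
    """Map each ST year to the fiscal year whose data feed a t-lag model."""
    return {year: year - lag for year in st_years}

# One mapping per single-window model: "t-3", "t-4" and "t-5".
windows = {f"t-{lag}": data_years(ST_YEARS, lag) for lag in (3, 4, 5)}
print(windows["t-5"])
```

For the t − 5 model this reproduces the pairs described above, e.g. ST year 2012 with data from 2007 and ST year 2016 with data from 2011.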
As mentioned in the beginning of this section, three measures of prediction performances are reported in these two tables, namely, AUC, G-mean, and Kolmogorov-Smirnov statistics. In the above scenarios based on different time periods as well as division of the whole dataset, we compare respectively the predicting performance of those one-time window models (t − 3 models, t − 4 models and t − 5 models) including financial ratios only and financial ratios plus macroeconomic factors with our lasso-SVM-distributed lag (LSVMDL) model and lasso-logistic-distributed lag (LLDL). The prediction results are presented in Table 5 for the case of "financial ratios only" and Table 6 for the case of "financial ratio plus macroeconomic factors".
In Table 5, panel A presents the predictive performances of NN, DT, lasso-SVM and lasso-logistic models including the original 43 financial ratios shown in Table 1 in the period t − 3 as predictors of financial distress status in period t, while the results in the last two columns are the performances of the two groups of distributed lag financial distress predicting models including the same original 43 financial ratios but in periods t − 3, t − 4 and t − 5, i.e., our models. Panel B and C of Table 5 present the prediction performance of the models used for comparison purposes estimated in t − 4 and t − 5, respectively. The results for our models retain the same values because these models include simultaneously the 3-year-, 4-year-and 5-year-lagged financial ratios.
The only difference between Tables 5 and 6 is that all models in Table 6 incorporate, in addition to the 43 original accounting ratios, 4 macroeconomic indicators in different time windows. For example, for time window t − 3, the NN, DT, lasso-SVM and lasso-logistic models include the 3-year-lagged macroeconomic indicators shown in Table 2 in addition to the financial statement ratios shown in Table 1. The cases of time windows t − 4 and t − 5 are similar for these models. The LSVMDL and LLDL models, i.e., our models, include the macroeconomic indicators in all three time windows t − 3, t − 4 and t − 5 in addition to the accounting ratios.
From Table 5, the prediction accuracy of NN or DT is highest in the time windows t − 3 and t − 4; our models outperform the others in time window t − 5 for predicting accuracy. Generally speaking, the accuracy for time period t − 3 is relatively higher than the other two time periods for the NN, lasso-SVM and lasso-logistic models. Furthermore, the prediction results based on time period t − 3 are the most precise for NN when compared with other models in a single time period and even our models, which implies that the selected financial ratios in the period closer to the time of obtaining ST may contain more useful information for the prediction of financial distress, and may be applicable to NN. The AUC of 91.52% of the lasso-logistic-distributed lag model (LLDL) ranked second, close to the accuracy of 93.56% obtained by using NN. Therefore, the LLDL model should be competitive in terms of interpretability and accuracy in the case of "accounting ratio only".
From Table 6, the prediction accuracy of all the models is higher than in Table 5. For example, the AUC, G-mean and KS of the NN model in time window t − 3 increase from 93.56%, 86.73% and 88.00% in Table 5 to 94.00%, 90.87% and 89.00% in Table 6, respectively. The same tendency holds for the other models when macroeconomic indicators are included in addition to the accounting ratios. All results in Table 6 indicate that introducing the macroeconomic variables improves the predictive performance of all the models used for comparison; changes in macroeconomic conditions do affect the likelihood of financial distress. On the other hand, the LLDL model performs best, with an AUC of over 95%, when compared with the best NN (in time period t − 3, 94%), the best DT (in time period t − 4, 92.24%), the best lasso-SVM (in time period t − 4, 93.64%), the best lasso-logistic (in time period t − 5, 90.68%) and LSVMDL (93.12%). The LSVMDL model is the best performing model in terms of the G-mean and KS statistics. Figure 2 also shows the comparative results of the accuracy of the six models. The predictive performances of all the models including accounting ratios only, indicated by the dotted lines in panels (a), (c) and (e) of Figure 2, are worse than those of the models including macroeconomic indicators as well as accounting ratios, which are illustrated by the solid lines in panels (b), (d) and (f). Panels (a) and (b) present AUC, panels (c) and (d) G-mean, and panels (e) and (f) KS for all of the examined models. The models used for comparison, namely, the NN, DT, lasso-SVM and lasso-logistic models, are those that yielded the highest accuracy across the different time window datasets. For example, based on the results of panel (b), the AUC of the NN (the yellow solid line), DT (the pink solid line) and lasso-logistic (the red one) models is highest in time windows t − 3, t − 4 and t − 5, respectively.
We cannot conclude that the prediction results based on financial and macroeconomic data of one specific time window, e.g., t − 3 (see [7]), are the most accurate. However, from the results in (b), (d) and (f), our models, the LLDL or LSVMDL model incorporating financial and macroeconomic data in three consecutive time-windows, yielded relatively robust and higher prediction performances.
Put simply, the two groups of generalized distributed lag financial distress predicting models proposed by this paper outperform the other models in each time period, especially when the accounting ratios and macroeconomic factors were introduced into the models. We demonstrated that our models provide an effective way to deal with multiple time period information obtained from changes in accounting and macroeconomic conditions.

Discussion
Logistic regression and multivariate discriminant methods are among the most popular statistical techniques used in financial distress risk prediction modelling for enterprises in different countries, e.g., American enterprises [1] and European enterprises [4,30,31], because of their simplicity, good predictive performance and interpretability. The main statistical approach used in this study is logistic regression rather than multivariate discriminant analysis, given that multivariate discriminant analysis relies on strict assumptions regarding the normal distribution of explanatory variables. The results of this study confirm that logistic regression models still perform well for predicting the financial distress risk of Chinese listed enterprises.
The major contribution of this paper to the financial distress prediction literature is that an optimal distributed lag structure of macroeconomic data over multiple periods, in addition to financial ratio data, is imposed on the logistic regression model by minimizing the loss function, and the heterogeneous lagged effects of the factors in the different periods are presented. The results reveal that financial indicators, such as total liabilities/total assets, sales revenue/total assets, and net cash flow from operating activities/total assets, tend to have a significant impact over relatively long periods, e.g., 5 years before the financial crisis of a Chinese listed manufacturing company. This finding is in accordance with the recent research of [30,31], whose authors claim that going bankrupt is not a sudden phenomenon; it may take as long as 5-6 years. In the very recent study of Korol et al. [30], the authors built 10 groups of models covering 10 periods: from 1 year to 10 years prior to bankruptcy. The results in [30] indicate that a bankruptcy prediction model such as the fuzzy set model maintained an effectiveness level above 70% until the eighth year prior to bankruptcy. Therefore, our model can be extended by introducing more lagged explanatory variables, e.g., 6- to 8-year-lagged financial variables, which may yield a better distributed lag structure of explanatory variables and improve the predictive ability of the models.
The findings of this study allow managers and corporate analysts to prevent the financial crisis of a company by monitoring early changes in a few sensitive financial indicators and taking actions such as optimizing the company's asset structure and increasing cash flow and sales revenue. They also help investors make investment decisions by tracking continuous changes in the accounting conditions of a company of interest and predicting its risk of financial distress.
Another major contribution of this study is the confirmation of the importance of macroeconomic variables in predicting the financial distress of a Chinese manufacturing company, although scholars still argue about the significance of macro variables. For example, Kacer et al. [66] did not recommend the use of macro variables in the financial distress prediction for Slovak Enterprises, while Hernandez Tinoco et al. [4] confirmed the utilization of macro variables in the financial distress prediction for listed enterprises of the United Kingdom. The results in Section 5.2.4 of this study show that the prediction performance of all models (including both the models used for comparison and our own models) was increased when the macro variables were included in each model. The findings of this study allow regulators to tighten the supervision of Chinese listed companies when macroeconomic conditions change, especially in an economic downturn.
One of the main limitations of this study is that we restricted the research to listed manufacturing companies. Both Korol et al. [28] and Kovacova [30] emphasized that the type of industry affects the risk of deterioration in the financial situation of companies. More specifically, industries differ in intensity of competition, product life cycle, demand, changes in consumer preferences, technological change, the lowering of entry barriers and susceptibility to business cycles, and are therefore at different levels of risk [28]. The manufacturing sector, which includes the metal, mining, automotive, aerospace and housing industries, is highly susceptible to demand, technological change and macroeconomic conditions, which places it at a high level of risk, while agriculture may be at a relatively low risk level. The risk parameter assigned to the service sector, including restaurants, tourism, transport and entertainment, has changed significantly following the outbreak of the coronavirus. Therefore, the applicability of our models to predicting the financial distress risk of companies operating in other industries, and even in other countries, needs to be examined further.

Conclusions
In this paper, we propose a new framework for a financial early warning system by introducing a distributed lag structure into widely used financial distress prediction models such as the logistic regression and SVM models. In terms of predictive performance, our models are competitive with the conventional financial distress forecast models, which incorporate data from only a single period (t − 3, t − 4 or t − 5). Furthermore, our models remain superior to the conventional single-time-window models when the macroeconomic indicators of GDP growth, consumption level growth and unemployment rate are incorporated in addition to accounting factors. The empirical findings of this study indicate that changes in macroeconomic conditions do have a significant and continuous influence on the financial distress risk of a listed manufacturing company. This paper thus provides an approach for examining the impact of macroeconomic information from multiple periods and for improving the predictive performance of financial distress models.
We implement feature selection to remove redundant factors from the original list of 43 potentially helpful ratios and their lags by introducing a lasso penalty into the logistic and SVM financial distress forecast models with lags. Furthermore, we provide an ADMM algorithm framework that yields the global optimum of the convex, non-smooth optimization problem and thereby the optimal estimates of the coefficients of these models with financial and macroeconomic factors and their lags. Results from the empirical study show that not only widely used financial indicators (calculated from accounting data), such as leverage ratio, ROE, ROA, and current liabilities/total liabilities, have a significant influence on the financial distress risk of a listed manufacturing company, but also that indicators rarely seen in the existing literature, such as net profit attributable to shareholders of the parent company and net cash flow from operating activities/total assets, may play very important roles in financial distress prediction. The closer the observation is to the time of financial crisis, the more strongly net profit attributable to shareholders of the parent company and net cash flow from operating activities may reduce the financial distress risk. These findings may provide further evidence for company managers and investors in terms of corporate governance and risk control.
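As an illustration of how ADMM handles a lasso-penalized logistic loss, the sketch below uses the standard splitting into a smooth coefficient update, a soft-thresholding step and a dual update. It is a generic textbook-style scheme under simplifying assumptions, not a reproduction of the algorithm specified earlier in this paper: there is no intercept, the smooth subproblem is solved only approximately by gradient descent, and the function names, the penalty weight `lam`, the parameter `rho` and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(v: float, kappa: float) -> float:
    """Proximal operator of kappa * |v|; exactly 0.0 inside [-kappa, kappa]."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def admm_lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                        lam: float, rho: float = 1.0,
                        n_outer: int = 200, n_inner: int = 50) -> List[float]:
    """Minimise sum_i logloss(y_i, x_i . beta) + lam * ||beta||_1 (labels in {0, 1})."""
    p = len(X[0])
    beta, z, u = [0.0] * p, [0.0] * p, [0.0] * p
    # Step size from an upper bound on the curvature of the smooth subproblem.
    step = 1.0 / (0.25 * sum(v * v for row in X for v in row) + rho)
    for _outer in range(n_outer):
        # beta-update: approximately minimise loss + (rho / 2) * ||beta - z + u||^2.
        for _inner in range(n_inner):
            grad = [rho * (beta[j] - z[j] + u[j]) for j in range(p)]
            for row, label in zip(X, y):
                r = sigmoid(sum(b * v for b, v in zip(beta, row))) - label
                for j in range(p):
                    grad[j] += r * row[j]
            beta = [beta[j] - step * grad[j] for j in range(p)]
        # z-update: soft-thresholding produces exact zeros (feature selection).
        z = [soft_threshold(beta[j] + u[j], lam / rho) for j in range(p)]
        # u-update: scaled dual variable accumulates the residual beta - z.
        u = [u[j] + beta[j] - z[j] for j in range(p)]
    return z


# Toy data: the first column carries signal; the second is a mirrored
# +1/-1 column that is uninformative by construction.
levels = [(2.0, 1), (1.0, 1), (0.5, 0), (-0.5, 1), (-1.0, 0), (-2.0, 0)]
X = [[a, s] for a, _lab in levels for s in (1.0, -1.0)]
y = [lab for _a, lab in levels for _s in (1.0, -1.0)]
coef = admm_lasso_logistic(X, y, lam=1.0)
```

In this toy example the penalty keeps the informative column and sets the coefficient of the uninformative column to zero, which is the mechanism by which redundant ratios and lags are removed. In practice the penalty weight would be chosen by cross-validation or a similar criterion.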
The main limitation of this research is that it covers only listed manufacturing companies. The sensitivity of the financial distress models, and the suitability of both financial and macroeconomic variables, for enterprises that operate in other industries, e.g., service companies, need to be discussed further. In addition, given that the usefulness of financial and macroeconomic variables in predicting the risk of financial distress of Chinese listed manufacturing companies is confirmed, we intend to extend the research to interaction terms between financial and macroeconomic variables in the multiple-period setting. This would also allow us to explore the heterogeneous effect of changes in macroeconomic conditions on the financial distress risk of companies in different financial conditions.