Abstract
Financial risk early warning systems provide critical corporate financial status information to stakeholders, including corporate managers, investors, regulatory agencies, and other interested parties, enabling informed decision-making. This study proposes a corporate financial risk early warning model based on a bagging–cascading–boosting architecture, which can be used to predict the financial risk of a firm. The model performance is improved by integrating the residual fitting characteristics of LightGBM, the variance suppression mechanism of bagging, and the adaptive expansion ability of the cascade framework. Evaluated on 46 financial indicators from 2826 A-share-listed companies, the model demonstrates superior performance in AUC and F1-score metrics, outperforming traditional statistical methods and standalone machine-learning models. The methodological innovation lies in its tripartite mechanism: LightGBM ensures low-bias prediction, bagging controls variance, and the cascading structure dynamically adapts to data complexity, maintaining 94.09% AUC robustness, even when training data is reduced to 50%. Empirical results confirm this “ensemble-of-ensembles” framework effectively identifies Special Treatment (ST) firms, delivering early risk alerts for management while supporting investment decisions and regulatory risk mitigation.
1. Introduction
Against the backdrop of current domestic and international economic fluctuations, numerous enterprises face financial distress or even imminent bankruptcy due to pressures from global supply chain restructuring, divergent monetary policies, and internal debt structure imbalances. Such risks not only jeopardize corporate interests but also inflict direct or indirect losses on all stakeholders []. Consequently, financial risk early warning has emerged as a critical research focus. Corporate financial conditions are drawing heightened attention from both management and diverse stakeholders, including employees, government agencies, creditors, and investors, who monitor financial health to mitigate systemic repercussions.
The development of high-performance early warning models is imperative, given the critical impact of corporate financial risk alerts on financial institutions’ risk management efficacy and profitability. At present, financial risk early warning models mainly comprise statistical methods and machine-learning-based artificial intelligence techniques. Statistical methods include linear discriminant analysis (LDA) [], logistic regression (LR) [], and factor analysis []. Although widely adopted, statistical methods suffer from restrictive assumptions—including multivariate normality, linearity, and independence of predictors—that limit their validity and applicability []. With the development of information processing technology, single machine-learning methods such as neural networks (NN) [], support vector machines (SVM) [], and decision trees (DT) [] have gradually been applied to financial risk early warning, taking financial risk management research to a new level. However, exponential growth in dataset dimensionality and variable complexity within financial risk data renders standalone machine-learning methods inadequate under the “No Free Lunch” theorem []. A more effective scheme is to construct an ensemble learning framework that improves model performance by fusing multiple base classifiers. Depending on the ensemble strategy, ensemble algorithms in the field of enterprise financial risk early warning fall into three main categories: bagging-type methods reduce prediction variance by training multiple base learners in parallel, mitigating overfitting through bootstrap aggregation and model averaging; boosting-type methods reduce prediction bias through iterative optimization; and stacking methods achieve a bias–variance balance via hierarchical transformation of the feature space. The symmetry structure of financial data introduces additional considerations for ensemble design.
For instance, bagging-type methods may benefit from symmetric sampling to preserve invariant financial patterns, while boosting algorithms could leverage symmetry-aware loss functions to better capture cyclical market behaviors. Stacking’s hierarchical transformations can be further enhanced by incorporating symmetry-preserving feature mappings, ensuring that the fused representations maintain essential geometric properties of the financial state space. These symmetry-aware adaptations provide a principled way to address the No Free Lunch challenge in financial risk prediction. All three methods can effectively improve the performance of enterprise risk early warning.
Capitalizing on the complementary advantages of bagging ensembles in variance optimization and boosting ensembles in bias reduction, this study leverages the cascading framework’s intrinsic capacity for adaptive extensibility in corporate financial risk modeling. The proposed architecture combines these strengths into a bagging–cascading–boosting tree solution specifically designed for financial risk early warning. The framework employs a tripartite optimization mechanism to enhance model performance. The Light Gradient Boosting Machine (LightGBM) serves as the base learner, leveraging its sequential residual fitting properties to achieve low-bias predictions. A bagging ensemble is then applied to bootstrap-aggregate multiple LightGBM outputs, effectively suppressing the prediction variance. Ultimately, a depth-adaptive cascading structure enables architectural expansion, where model complexity dynamically adjusts to match the dimensional characteristics of financial risk datasets. This design synergistically integrates bagging’s variance control, boosting’s bias optimization, and cascading’s structural scalability, thereby establishing a dynamic learning paradigm tailored for corporate financial risk analytics. The innovations of this study are primarily manifested in the following aspects:
- (1)
- Existing ensemble methods typically rely on either bagging (variance reduction) or boosting (bias reduction) in isolation, limiting their ability to address the complex trade-offs in financial risk modeling. This study pioneers a bagging–cascading–boosting tripartite architecture, where LightGBM’s sequential residual fitting minimizes bias, bagging aggregates multiple outputs to suppress variance, and the cascading framework dynamically adjusts model depth. This integration outperforms traditional single-strategy ensembles by simultaneously optimizing prediction performance and stability.
- (2)
- Existing ensemble frameworks often suffer from rigid structures that cannot adapt to evolving data complexity. The proposed cascading strategy overcomes this limitation by dynamically adjusting model depth based on the complexity of datasets. Unlike static architectures, this design enables seamless incorporation of new risk indicators, ensuring scalability without compromising performance.
- (3)
- While prior studies have explored bagging or boosting in isolation, this work pioneers their synergistic integration. The LightGBM–bagging combination not only enhances prediction stability but also mitigates overfitting risks, a common issue in financial risk modeling. The cascading layer further refines this robustness by allowing iterative feature selection. This triple innovation—hybrid architecture, adaptive scaling, and robust optimization—sets this work apart from existing ensemble approaches.
2. Literature Review
Financial risk early warning research is a key focus in modern corporate risk management and relies on scientifically validated predictive models. By spotting potential financial problems early, such models allow executives to track their company’s health in real time, while also guiding investors toward sound decisions and reduced risk.
2.1. Statistical Methods and Individual ML-Based Algorithms
Early warning models constitute the backbone of financial risk management by enabling early detection of potential financial distress. To date, scholars have extensively researched financial early warning models, which broadly fall into statistical and non-statistical categories. Initial models predominantly employed statistical approaches, exemplified by LDA and LR. Beaver [] pioneered the use of univariate models in financial distress prediction; subsequently, Altman [] developed the classic multivariate Z-score model using five financial ratios. Martin [] first applied logistic regression to bank failure forecasting, while Ohlson [] formalized a logit-based financial warning model. Jones and Hensher [] proposed a mixed logit model to overcome the limitations of standard logit, demonstrating superior performance in financial distress prediction.
As corporate operating environments grow increasingly complex, information relevant to financial risk early warning expands exponentially, while data structures become increasingly intricate. Traditional statistical methods, however, require high-quality data, complex underlying assumptions, and strict adherence to data distributions—requirements that constrain their development in financial risk prediction. In contrast, machine-learning algorithms typically operate without such restrictive assumptions and enable continuous model refinement through adaptive recalibration to new data. Commonly employed machine-learning algorithms in corporate financial risk forecasting include NN, DT, k-nearest neighbors (KNN), and SVM. Odom and Sharda [] first utilized artificial neural networks to construct an early warning model, demonstrating its superior performance compared to traditional models based on methods like logistic regression. Wu et al. [] combined a multi-layer perceptron artificial neural network (MLP-ANN) with the traditional Altman Z-Score model for stock market forecasting, achieving 99.4% classification accuracy in predicting financial distress. Chen et al. [] introduced a sparse neural-network-based model (FDP-SNN) that integrates financial and non-financial predictors, demonstrating superior accuracy and interpretability in financial distress prediction. Mu-Yen Chen [] compared decision tree (C5.0, CART, CHAID) and logistic regression models integrated with PCA for financial distress prediction, demonstrating that decision trees achieve superior short-term accuracy. Xie et al. [] constructed support vector machine (SVM) and multivariate discriminant analysis (MDA) models using Chinese-listed companies as research samples, with empirical results demonstrating SVM’s superiority over MDA. This performance advantage was further substantiated by Xu and Xiao et al. [], who found SVM exhibits enhanced accuracy and generalization capabilities compared to logistic regression.
2.2. Ensemble Approaches
As research on corporate financial distress early warning continues to evolve, researchers have observed that individual warning models possess distinctive strengths and limitations. To enhance predictive accuracy, scholars have consequently turned to ensemble learning approaches, which fundamentally operate through three frameworks: boosting, bagging, and stacking. Boosting, a machine-learning meta-algorithm designed to reduce bias, systematically transforms weak learners—classifiers performing marginally better than random guessing—into strong learners capable of achieving high predictive accuracy. As established in foundational literature [], these complementary learner types constitute the core mechanism of boosting frameworks. Prominent implementations include AdaBoost [], XGBoost [], and LightGBM [], each demonstrating this transformative capability through iterative weak classifier optimization. AdaBoost, or Adaptive Boosting, is a machine-learning ensemble technique that combines multiple weak classifiers to create a strong classifier. Heo et al. [] demonstrated that AdaBoost outperformed traditional models in predicting bankruptcy among Korean construction companies, particularly for larger firms with significant capital. The study highlighted the model’s ability to adapt to the unique financial structures of different industries, thereby enhancing predictive accuracy. Yao et al. [] proposed an AdaBoost ensemble model with fast nondominated feature selection (AdaFNDFS). The findings demonstrate that the predictive performance of AdaFNDFS surpasses that of other comparative models. XGBoost, or Extreme Gradient Boosting, is another powerful machine-learning algorithm that has been widely adopted for financial risk prediction. Zhang et al. [] utilized XGBoost to predict bond defaults in China’s emerging market, demonstrating its superiority over traditional algorithms in managing imbalanced data. Wang et al. 
[] developed an adaptive weighted XGBoost-bagging model to predict financial distress in Chinese enterprises. Their research highlighted the model’s effectiveness in addressing data imbalance and information scarcity, significantly enhancing prediction accuracy during periods of economic turmoil. Tan et al. [] leveraged XGBoost to analyze 20 financial indicators, achieving superior risk prediction accuracy and identifying eight key risk factors (real economy, institutions, etc.) with actionable early warning thresholds for policymakers. The application of LightGBM in enterprise financial risk warning has garnered considerable attention in recent research, demonstrating its effectiveness in handling high-dimensional financial data and improving prediction accuracy. Huang et al. [] found that the LightGBM model outperforms other GBDT-based techniques in efficiency and prediction accuracy. The bagging algorithm constructs multiple base learners through bootstrap sampling of training subsets, then integrates their predictions via voting (for classification) or averaging (for regression) to produce final outputs. Representative implementations include Random Forests (RF) and other ensemble methods. Notably, Zhu et al. [] developed a comprehensive corporate financial risk warning system using a DS-RF model, which demonstrates enhanced capability in revealing financial risk characteristics and underlying drivers. Stacking (stacked generalization) is a hierarchical ensemble learning method that enhances overall predictive performance by consolidating the predictions of multiple heterogeneous base models as new input features for a meta-model. Deron Liang et al. [] developed a stacking ensemble model for bankruptcy prediction using six optimized financial ratios and six corporate governance indicators, demonstrating superior performance in high-cost misclassification scenarios. Chen et al. 
[] proposed a stacking ensemble model (DNN + MNLogit + MDA) with stock data integration for multi-class financial distress prediction, achieving an 88.7% F1-score in classifying Chinese firms’ financial health states. Wang and Chi [] proposed a cost-sensitive stacking (CSStacking) model with two-phase feature selection, significantly outperforming benchmarks in predicting Chinese-listed companies’ financial distress 5 years ahead.
2.3. Other Related Research
Unbalanced learning is a significant challenge in financial risk prediction, as the distribution of financial distress cases is often skewed. Traditional models may struggle to accurately predict minority classes, leading to high false-negative rates. To address this issue, researchers have proposed various strategies, including those at the data level and algorithm level. Minh Nguyen, Bang Nguyen, and Minh-Lý Liêu [] applied SMOTE to address data scarcity in Vietnam’s transition economy, testing seven models where neural networks with combined Altman-Ohlson variables perform best. Aykut Ekinci and Safa Sen [] highlighted a cost-sensitive-learning approach as the optimal imbalance treatment, effectively minimizing Type-II errors in bank failure prediction compared to conventional sampling methods.
Interpretability can be defined as the degree to which a model’s predictions can be clearly and intuitively understood and explained. Interpretability is particularly crucial in the field of enterprise financial risk management, as decision-makers need to comprehend the rationale behind a model’s outputs to trust its results. Currently, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) represent two predominant approaches for interpreting model predictions. Min-Jae Lee and Sun-Yong Choi [] employed SHAP analysis on GBR/LightGBM models to quantify 28 financial indicators’ contributions, revealing net interest income as the dominant factor in bank credit ratings. Jianfeng Zhang and Zexin Zhao [] adopted the XGBoost-SHAP framework to predict ESG ratings (91% accuracy) with interpretable feature analysis, revealing that financial metrics dominate over non-financial factors in Chinese A-shares (2013–2022).
Additionally, the application of time-series ensembles in corporate financial risk warning has demonstrated remarkable effectiveness by leveraging the collective strengths of multiple predictive models to enhance accuracy and robustness. Kaijian He et al. [] (2023) proposed a CNN-LSTM-ARMA ensemble model to capture spatiotemporal and autocorrelation features in financial time series, outperforming standalone models in accuracy and robustness through hybrid linear–nonlinear feature integration. Lesia Mochurad and Andrii Dereviannyi [] integrated LSTM and ARIMA models to enhance financial market forecasting accuracy. The hybrid approach demonstrates a 15% RMSE improvement over standalone LSTM models, validated across three real-world datasets. Parallelization potential further boosts efficiency for future research.
3. Methodology
The proposed algorithm fundamentally integrates multiple learners to identify and screen financial distress enterprises, comprising three key phases: generating robust and diverse individual learners, designing ensemble strategies to integrate these base learners, and fusing their predictions. The subsequent sections demonstrate critical technical implementations in detail.
3.1. LightGBM as Base Learner
Gradient Boosting Decision Trees (GBDT) constitute a prominent boosting ensemble methodology that significantly enhances the predictive performance of early warning models by strategically adjusting sample weighting priorities during iterative training cycles. The approach iteratively minimizes errors by directing each subsequent base learner to focus more intensively on samples misclassified in preceding iterations. Unlike AdaBoost, which increases the weights of misclassified samples, GBDT uses negative gradients as a proxy for the errors of prior base learners and corrects these deviations in subsequent iterations by fitting the negative gradients, thereby implementing gradient descent in function space.
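The residual-fitting idea above can be sketched in a few lines of plain Python. The toy sketch below (illustrative, not the paper's implementation) boosts depth-1 "stump" learners, each fitted to the negative gradient of the squared loss, which for this loss is simply the current residual:

```python
# Toy gradient boosting: each stage fits a 1-D regression stump to the
# residuals (the negative gradient of 1/2 * (y - F(x))^2), implementing
# gradient descent in function space.

def fit_stump(xs, residuals):
    """Find the threshold split minimizing squared error on the residuals."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def boost(xs, ys, rounds=10, lr=0.5):
    pred = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]  # negative gradient
        stump = fit_stump(xs, residuals)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return pred

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
pred = boost(xs, ys)
mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
```

Each round shrinks the remaining residual geometrically, which is the mechanism GBDT and LightGBM scale up to full decision trees.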
Based on the ensemble structure of gradient boosting, LightGBM is also an additive model, which can be defined as

$$F_T(x) = \sum_{t=1}^{T} \alpha_t f_t(x; \theta_t) \tag{1}$$

where $\alpha_t$ is the weight coefficient of the base learner and $f_t(x; \theta_t)$ represents the $t$-th base learner. In GBDT, $f_t$ represents a DT and $\theta_t$ is the structural parameter of the DT. Conventional full-parameter optimization of additive ensemble models typically faces computational challenges due to high complexity. To address this, Friedman proposed the Forward Stagewise Algorithm as an iterative approximation solution. The core idea of this algorithm is to incrementally optimize local parameters to approach the global optimum, with its $t$-th iteration formally expressed as

$$F_t(x) = F_{t-1}(x) + \alpha_t f_t(x; \theta_t) \tag{2}$$
The objective of machine-learning-based corporate financial risk early warning systems is to identify the likelihood of companies being designated as Special Treatment (ST). Consequently, the optimization goal for the $t$-th decision tree is to further reduce the value of the loss function $L$ of the ensemble model formed by the previous $t-1$ iterations. Based on this analysis, Equation (2) can be reformulated as

$$(\alpha_t, \theta_t) = \arg\min_{\alpha, \theta} \sum_{i=1}^{N} L\big(y_i, F_{t-1}(x_i) + \alpha f(x_i; \theta)\big) \tag{3}$$

where $N$ denotes the total number of samples in the enterprise financial risk early-warning training set, and $x_i$ represents the features of the $i$-th sample.
LightGBM inherits the training paradigm of XGBoost, which approximates the loss function using first-order and second-order Taylor expansions while incorporating a regularization term to control the complexity of each tree:

$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{N} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) \tag{4}$$

where $g_i$ is the first derivative of the loss function and $h_i$ is its second derivative. The regularization term $\Omega(f_t)$ controls the complexity of each decision tree (DT), where a tree’s complexity is determined by both the number of leaf nodes and the values assigned to those nodes. Consequently, the regularization term can be explicitly formulated as

$$\Omega(f_t) = \gamma K + \tfrac{1}{2} \lambda \sum_{j=1}^{K} w_j^2 \tag{5}$$

where $K$ is the number of leaf nodes and $w_j$ is the value at the $j$-th leaf node. The parameters $\gamma$ and $\lambda$ respectively govern the weights assigned to the number of leaf nodes and to the values within these nodes. This regularization term constrains tree complexity by effectively reducing the number of nodes while simultaneously limiting leaf scores, since higher prediction scores accelerate fitting and increase overfitting risk.
To minimize the objective function, we set the first-order derivative of the loss function to zero, thereby determining the optimal prediction score for each leaf node as

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{6}$$

where $G_j = \sum_{i \in I_j} g_i$, $H_j = \sum_{i \in I_j} h_i$, $I_j = \{ i \mid q(x_i) = j \}$, and $q(x_i)$ is a function that maps the $i$-th instance into a leaf node. Because the objective of the $t$-th DT is to further reduce the loss of the previous $t-1$ iterations, the objective function at the $t$-th boosting round can be simplified as

$$\mathrm{Obj}^{(t)} = -\tfrac{1}{2} \sum_{j=1}^{K} \frac{G_j^{2}}{H_j + \lambda} + \gamma K \tag{7}$$
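As a quick numeric sketch of the leaf-weight rule above, the snippet below computes the optimal leaf value and the simplified objective for two leaves; the gradient and Hessian sums are illustrative toy values, not quantities from the paper:

```python
# Toy check of w* = -G / (H + lambda) and the simplified objective
# -1/2 * sum(G^2 / (H + lambda)) + gamma * K, with assumed leaf statistics.

lam, gamma = 1.0, 0.1
leaves = {  # leaf id -> (sum of g_i, sum of h_i) over instances in that leaf
    0: (-4.0, 3.0),
    1: (2.5, 2.0),
}

weights = {j: -G / (H + lam) for j, (G, H) in leaves.items()}
objective = -0.5 * sum(G * G / (H + lam) for G, H in leaves.values()) + gamma * len(leaves)
```

A larger gradient sum pushes the leaf value further from zero, while the Hessian sum and $\lambda$ damp it, which is how the regularizer limits leaf scores.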
Compared to gradient boosting, LightGBM, an enhanced Gradient Boosting Decision Tree (GBDT) algorithm proposed by Ke et al. [], addresses the limitations of GBDT in scalability and effectiveness when handling high-dimensional features or large-scale datasets. Building upon GBDT’s framework, LightGBM integrates multifaceted optimizations to achieve accelerated training efficiency, reduced memory consumption, improved accuracy, parallel learning capabilities, and enhanced large-data processing capacity. Key innovations include the histogram algorithm for computational acceleration; depth-constrained leaf-wise growth strategy for higher precision under equivalent splits; Gradient-based One-Side Sampling (GOSS), balancing data reduction with decision tree accuracy; Exclusive Feature Bundling (EFB), reducing feature dimensionality; histogram differencing for operational speed gains. Consequently, LightGBM demonstrates exceptional advantages in processing voluminous financial fraud data from listed companies.
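The optimizations listed above map onto configuration switches exposed by the lightgbm library. The parameter sketch below is hypothetical and illustrative (key names follow the library's documented parameters; the values are not the paper's tuned settings):

```python
# Hypothetical LightGBM configuration reflecting the described features:
# leaf-wise growth with a depth cap, GOSS sampling, EFB, and the
# histogram algorithm's bin count.

params = {
    "objective": "binary",     # ST vs. non-ST classification
    "boosting_type": "goss",   # Gradient-based One-Side Sampling
    "num_leaves": 31,          # leaf-wise growth: cap on leaves per tree
    "max_depth": 8,            # depth constraint against overfitting
    "learning_rate": 0.05,
    "enable_bundle": True,     # Exclusive Feature Bundling (EFB)
    "max_bin": 255,            # histogram algorithm bin count
}
```

Such a dictionary would typically be passed to `lightgbm.train` together with a `lightgbm.Dataset` built from the financial indicators.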
3.2. Base Learner Training Based on the Cascade Framework
Bagging is a parallelized ensemble algorithm characterized by the absence of dependency relationships between weak learners, allowing parallel fitting. For example, in random sampling with replacement from a training set containing $m$ samples, the probability of a given sample being collected in each draw is $1/m$, and the probability of not being collected is $1 - 1/m$. The probability that a sample is not collected in any of the $m$ resamplings is $(1 - 1/m)^m$. Taking the limit, $\lim_{m \to \infty} (1 - 1/m)^m = 1/e \approx 0.368$. That is to say, in each round of random sampling in bagging, approximately 36.8% of the data in the training set is not included in the sampling set. The roughly 36.8% of the data that has not been sampled is referred to as Out-of-Bag (OOB) data. Because this data does not participate in fitting the model on the training set, it can be used to test the model’s generalization ability.
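As a quick numeric check of the 36.8% figure above, the sketch below evaluates $(1 - 1/m)^m$ for increasing $m$ and compares it with $1/e$:

```python
# The chance a sample is never drawn in m bootstrap draws is (1 - 1/m)^m,
# which approaches 1/e (about 36.8%) as m grows.

import math

def oob_fraction(m):
    return (1.0 - 1.0 / m) ** m

small = oob_fraction(10)       # already close to 1/e for m = 10
large = oob_fraction(10000)    # essentially 1/e
```

Even for modest sample sizes the OOB fraction is near its limit, so the 36.8% rule of thumb holds across realistic training set sizes.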
In this study, we propose a new bagging algorithm for training and integrating base learners to enhance the generalization ability of ensemble models in financial risk warning tasks. Unlike traditional bagging algorithms that perform sample-level bootstrap resampling, our approach adopts a level-set-based framework. Given the financial risk warning training set $D = \{(x_i, y_i)\}_{i=1}^{N}$ and the test set $D^{test}$, where $x_i$ and $y_i$ respectively represent the features and label of the $i$-th sample, we set the number of random samples to $N$ and obtain a training subset of $N$ samples $D_t$. In each bagging iteration, according to the previous description, about 36.8% of the data (the out-of-bag data) will not participate in the generation of the training subset, which can be expressed as $D_t^{oob} = D \setminus D_t$. Here, $D_t^{oob}$ is the subset formed by the out-of-bag data in the $t$-th bagging loop, which can be represented as $D_t^{oob} = \{(x_i^{oob}, y_i^{oob})\}_{i=1}^{N^{oob}}$, where $N^{oob}$ represents the number of out-of-bag samples.
In each iteration of the bagging algorithm, each base learner is trained on a sampled training subset. Considering the impact of base-learner robustness on ensemble performance, in this study we use four LightGBM models as base learners. Because LightGBM is itself a highly integrated algorithm that depends on careful tuning, we use the fine-tuned LightGBM model as the base learner of the proposed model. Under this mechanism, LightGBM, which minimizes the loss through iteration, ensures low bias; meanwhile, the bagging-style training mechanism diversifies LightGBM’s inputs, further reducing the model’s prediction variance and effectively balancing the variance–bias trade-off in machine-learning prediction models. After $T$ rounds of repeated training on different sample sets, each base learner can generate confidence information for a financial risk warning sample based on the structure and parameters determined in the training stage. This confidence information gives the probability that the enterprise’s financial status belongs to each category, and the probability can be further transformed into a discrimination of the enterprise’s financial status by selecting a threshold. We represent the probability information of the out-of-bag samples and the samples in the test set as follows:
$$P_t^{oob} = f_t(D_t^{oob}), \quad P_t^{test} = f_t(D^{test}) \tag{8}$$

here, $P_t^{oob}$ is the prediction confidence information of the ensemble model on the OOB data, $P_t^{test}$ is its prediction confidence on the test data, $D_t^{oob}$ represents the OOB data, and $D^{test}$ represents the test data.
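One bagging round of the kind described in this subsection can be sketched as follows; the base learner here is a hypothetical class-frequency stand-in rather than a tuned LightGBM, and the toy data are illustrative:

```python
# Schematic bagging round: bootstrap a training subset of size n with
# replacement, record the out-of-bag (OOB) indices, and emit a confidence
# score from a dummy learner (the ST class frequency in the subset).

import random

random.seed(0)
data = [(float(i), i % 3 == 0) for i in range(30)]  # (feature, ST label) toy set

def bagging_round(data):
    n = len(data)
    picked = [random.randrange(n) for _ in range(n)]   # bootstrap with replacement
    oob = [i for i in range(n) if i not in set(picked)]
    train = [data[i] for i in picked]
    p_st = sum(1 for _, y in train if y) / len(train)  # dummy confidence score
    return oob, p_st

oob, p_st = bagging_round(data)
```

In the actual method, the dummy learner would be replaced by a fine-tuned LightGBM fitted on `train`, and `oob` would index the validation samples used to assess generalization.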
3.3. Aggregate Multiple Base Learners for Financial Risk Early Warning
Cascaded ensemble methodology constitutes a specialized stacking mechanism that progressively strengthens weak risk early warning models through sequential training of base learners across tiers, fundamentally comprising three integrally linked phases: initial partitioning of the original dataset into mutually exclusive subsets via our implemented bagging algorithm; subsequent parallelized training and validation of a quartet of LightGBM models on the segmented data; ultimately, the propagation of Tier-1 confidence scores as transformed inputs to train Tier-2 base learners, thereby enabling hierarchical feature refinement and predictive enhancement.
In this study, we employ bagging to partition the original training set, where each derived subset (containing about 63.2% of the original samples) trains successive cascade tiers, while the remaining 36.8% out-of-bag data constitutes a validation set that not only evaluates tier performance but also informs subsequent cascade construction. This partitioning yields disjoint datasets: the sampled subset trains Tier-1 base learners, whose predictions on the full training set are transformed into confidence scores via Equation (8) to serve as Tier-2 inputs; concurrently, the validation set assesses model efficacy and benchmarks against subsequent tiers. In this study, we use Equation (9) to obtain the probabilistic prediction of each level of the cascading framework, and the ST/non-ST status is further identified by a threshold function, as expressed in the following Algorithm 1.
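The Tier-1 to Tier-2 propagation step above amounts to appending the previous tier's confidence scores to the original feature vector. A minimal sketch, with all values purely illustrative:

```python
# Cascade propagation sketch: Tier-1 confidence scores (one P(ST) per base
# learner) are appended to the original financial indicators to form the
# Tier-2 input representation.

features = [[0.2, 1.5], [0.9, 0.3]]            # two firms, two toy indicators
tier1_confidences = [[0.1, 0.2, 0.15, 0.12],   # 4 base-learner P(ST) per firm
                     [0.8, 0.7, 0.9, 0.85]]

tier2_inputs = [x + conf for x, conf in zip(features, tier1_confidences)]
```

Each deeper tier thus sees both the raw indicators and the ensemble's current belief, which is what allows the cascade to refine difficult cases.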
| Algorithm 1: Training of the proposed method |
| Input: A training set $D$ for financial risk early warning; test set $D^{test}$; threshold $\theta$; round $T$ for bagging. Output: the trained model of the proposed method |
Validation performance at each cascade tier is quantified by the mean accuracy across the four base learners, where label ‘1’ denotes enterprises with financial risk and ‘0’ indicates financially healthy entities. The threshold $\theta$ governs the discretization of predicted probabilities and ultimately determines three critical evaluation metrics; given the class imbalance inherent in financial risk warnings, we adopt the conventional setting of $\theta = 0.5$ as the optimal operating point.
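The tier-evaluation rule above can be sketched directly: discretize each learner's probabilities at the threshold and average the resulting accuracies over the four base learners (the probabilities and labels below are illustrative):

```python
# Mean accuracy across four base learners at threshold theta = 0.5:
# label 1 = financial risk (ST), label 0 = healthy.

theta = 0.5
y_true = [1, 0, 1, 0]
learner_probs = [
    [0.9, 0.2, 0.7, 0.4],
    [0.6, 0.1, 0.8, 0.3],
    [0.8, 0.4, 0.4, 0.2],
    [0.7, 0.3, 0.9, 0.6],
]

def accuracy(probs, y):
    preds = [1 if p >= theta else 0 for p in probs]
    return sum(int(a == b) for a, b in zip(preds, y)) / len(y)

tier_score = sum(accuracy(p, y_true) for p in learner_probs) / len(learner_probs)
```

A new tier is worth adding only while this validation score keeps improving, which is how the cascade depth adapts to data complexity.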
4. Experimental Settings
4.1. Sample Selection
Corporate financial risk manifests through insufficient cash flow accompanied by surging accounts receivable, inability to service debts, and unsustainable dividend distributions. Prevailing studies on Chinese-listed firms predominantly employ matched-pair designs pairing ST (Special Treatment) and non-ST companies at a 1:1 ratio, which partially mitigates omitted-variable bias yet inherently reduces sample size, compromises generalizability, and introduces pairwise conditional expectations—where risk assessments become relative to matched counterparts. This study analyzes Shanghai/Shenzhen A-share-listed companies from 2000 to 2020, excluding financial sector entities and firms with excessive missing values, yielding a final sample of 2826 enterprises. Under China’s securities regulatory framework, companies typically exhibit financial distress indicators in the audited financial statements of the year preceding their Special Treatment (ST) designation or risk event occurrence (e.g., T−1 for T-year events). These pre-crisis financial data, having undergone rigorous audit procedures, provide reliable representations of the firm’s actual financial condition prior to risk materialization. The methodological approach aligns with established practices in financial risk warning prediction literature [], where lagged financial variables are systematically employed to capture transitional financial trajectories. This temporal analytical framework particularly enhances the explanatory power for China’s ST mechanism, where regulatory interventions often follow observable financial deterioration patterns. Therefore, our paper prioritizes T−1 financial metrics to capture predictive signals and financial trajectory shifts.
Building upon extant literature, we select 46 indicators spanning solvency, operational efficiency, profitability, growth potential, per-share metrics, and relative valuation to quantify financial health (detailed in Table 1), with Special Treatment (ST) designation serving as the financial distress criterion. Missing values are handled through median imputation for continuous variables and mode imputation for categorical variables, with missing indicator flags created when >5% of data is absent. Firms receiving ST in year T are labeled ‘1’, others ‘0’, utilizing exclusively CSMAR Database-sourced data (https://data.csmar.com/).
Table 1.
Financial indicators of Chinese-listed companies.
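The imputation rule described above (median for continuous indicators, mode for categorical ones, plus a missing-indicator flag when more than 5% of a column is absent) can be sketched with the standard library; the column name and values below are hypothetical:

```python
# Median/mode imputation with a >5% missing-indicator flag, using only the
# stdlib statistics module. `roa` is a hypothetical continuous indicator.

from statistics import median, mode

def impute(values, categorical=False):
    present = [v for v in values if v is not None]
    fill = mode(present) if categorical else median(present)
    flag_needed = (len(values) - len(present)) / len(values) > 0.05
    filled = [fill if v is None else v for v in values]
    return filled, flag_needed

roa = [0.05, None, 0.12, 0.07, None, 0.03]  # 2 of 6 values missing (>5%)
filled, flag = impute(roa)
```

In practice this would run per column of the CSMAR indicator table, with the flag appended as an extra binary feature.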
4.2. Implementation Details of the Proposed Method
To obtain the optimal results of financial risk early warning models, we fine-tune the hyper-parameters of financial risk early warning models by grid search methods. In the fine-tuning process of the proposed method, since the proposed method considers LightGBM as base learners, we first fine-tune the hyper-parameters of LightGBM for the proposed method. The detailed hyper-parameter initialization and fine-tune stride of each hyper-parameter are shown in Table 2.
Table 2.
Parameter settings for the proposed method.
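A grid search of the kind described above enumerates every combination of candidate hyper-parameter values and keeps the best-scoring configuration. The sketch below is generic: the grid and the scoring function are illustrative placeholders, not the paper's actual search space:

```python
# Generic grid-search loop over a small hyper-parameter grid. `evaluate`
# stands in for cross-validated AUC of a LightGBM fit with the given config.

from itertools import product

grid = {"num_leaves": [15, 31], "learning_rate": [0.05, 0.1]}

def evaluate(cfg):
    # placeholder score; a real run would train and validate a model here
    return 0.9 - 0.001 * cfg["num_leaves"] + cfg["learning_rate"]

best_cfg, best_score = None, float("-inf")
for combo in product(*grid.values()):
    cfg = dict(zip(grid.keys(), combo))
    score = evaluate(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
```

The "fine-tune stride" in Table 2 corresponds to the spacing of the candidate values in each grid axis.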
4.3. Evaluation Metrics
Financial risk early warning is a typical two-class classification problem: the positive class is ST and the negative class is non-ST. According to the real state of each sample and the possible types of predicted results, the predictions can be arranged and combined into four outcomes, as shown in Table 3 below.
Table 3.
Confusion matrix of prediction results.
The confusion matrix is an error matrix, which is used to evaluate the performance of the supervised learning algorithm. In this paper, based on the confusion matrix of binary classification, the evaluation indices are introduced. According to the confusion matrix shown in Table 3, we can divide the predictions into four categories: true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN).
Based on this confusion matrix, we introduce the model evaluation indicators used in this paper: precision, recall, F1-score, Brier score (BS), the ROC curve, and AUC.
- (1)
- Precision (Pre)
Precision is the proportion of samples predicted as financially distressed that truly are distressed; it measures the accuracy of the model's warnings. Its calculation formula is

Pre = TP / (TP + FP)
- (2)
- Recall (Rec)
Recall is the proportion of all samples that actually experienced a financial crisis that the model correctly identifies; it measures the model's coverage of distressed firms. The specific formula is as follows:

Rec = TP / (TP + FN)
- (3)
- F1-score (F1)
F1 is the harmonic mean of precision and recall, with a value range of [0, 1]. The formula for F1 is

F1 = 2 × Pre × Rec / (Pre + Rec)
- (4)
- Brier Score (BS)
The Brier score describes the average squared error between the predicted probabilities and the actual states. The smaller the score, the better the model. It can be calculated as

BS = (1/N) Σᵢ (pᵢ − yᵢ)²

where pᵢ represents the prediction probability of the i-th sample, yᵢ is the true label value of the i-th sample, and N is the number of samples.
- (5)
- ROC curve and AUC
The False-Positive Rate (FPR) represents the probability of incorrectly predicting non-ST samples as ST samples, and is used as a measure of the model's false-alarm cost. Its formula is

FPR = FP / (FP + TN)

The True-Positive Rate (TPR) is the percentage of ST samples that are predicted to be ST, also known as ST sample coverage, denoted by Sensitivity. The formula is

TPR = TP / (TP + FN)
The ROC curve is used to evaluate the quality of the model intuitively. The ROC curve is a curve with the TPR as the ordinate and the FPR as the abscissa. A steeper curve indicates that a higher TPR can be achieved at a lower FPR. According to the change in the threshold, the FPR and TPR of the model change accordingly, forming an ROC curve.
The AUC is the area under the ROC curve and provides a quantifiable summary of the ROC: the steeper the curve, the larger the AUC. AUC values normally range from 0.5 to 1.0, with larger values indicating better classification performance; a model with an AUC above 0.8 is generally considered acceptable.
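The metrics above follow directly from the confusion-matrix counts; a plain-Python sketch is given below. The rank-based formulation of AUC (the probability that a random positive outscores a random negative) is one standard way to compute the area under the ROC curve.

```python
def precision(tp, fp):
    # Pre = TP / (TP + FP): accuracy of the model's distress warnings
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Rec = TP / (TP + FN): coverage of truly distressed (ST) samples
    return tp / (tp + fn) if tp + fn else 0.0

def f1(pre, rec):
    # Harmonic mean of precision and recall, in [0, 1]
    return 2 * pre * rec / (pre + rec) if pre + rec else 0.0

def brier(probs, labels):
    # BS: mean squared gap between predicted probabilities and true labels
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def auc(scores, labels):
    # Probability that a random ST sample outscores a random non-ST sample
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, 3 true positives with 1 false positive and 1 false negative give Pre = Rec = F1 = 0.75.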
5. Experimental Results
5.1. Performance Comparison and Analysis
To validate the effectiveness of the proposed algorithm, we first selected a series of traditional risk early warning models for comparison, including statistical methods such as LDA and LR, single-machine-learning classifiers such as DT, SVM, and NN, classic ensemble-learning techniques such as AdaBoost and RF, and advanced ensemble algorithms such as XGBoost and LightGBM. As shown in Figure 1, the ROC curves of all risk early warning models lie above the random-guess line, indicating that every model's predictive performance exceeds random guessing. Figure 1 also reveals that DT performs poorly on this dataset, with an area under the curve (AUC) smaller than that of the other models. A comparison of the ROC curves further shows that the ensemble algorithms achieve significantly larger AUC values than the single-machine-learning classifiers and traditional statistical methods. Finally, the proposed model exhibits a slightly larger AUC than the advanced ensemble models XGBoost and LightGBM, further validating its effectiveness. All curves are obtained from 50 repetitions of 10-fold cross-validation, with each run maintaining an 80:20 train-test split ratio to ensure balanced representation across all risk categories.
Figure 1.
ROC comparison of various financial risk prediction models.
Table 4 presents the performance of various algorithms on the risk early warning dataset. As shown in Table 4, many single-machine-learning classifiers, such as DT, KNN, and SVM, achieved precision and recall scores of 0 on the risk early warning dataset. This is primarily due to the highly imbalanced nature of the dataset, where the number of ST samples in the test set was limited. The low complexity of single-machine-learning classifiers resulted in poor discrimination between ST and non-ST companies, causing the models to predict predominantly non-ST samples, leading to a correct prediction rate of 0 for ST samples. Compared to single-machine-learning classifiers, traditional statistical risk early warning models performed better, but their precision and recall values remained low. Such limitations could be mitigated through ensemble-learning strategies. As demonstrated in Table 4, ensemble algorithms like AdaBoost, RF, XGBoost, and LightGBM significantly improved AUC values, effectively enhancing precision and recall. In other words, compared to single-risk early warning models, ensemble algorithms improved their predictive capability for ST companies and enhanced prediction performance. Further comparisons revealed that our algorithm achieved optimal performance across metrics such as precision, AUC, and F1-scores, further validating the feasibility of the proposed cascading ensemble algorithm for risk early warning tasks. Additionally, examining the BS scores of single models like DT, SVM, and KNN indicated that single models held a leading advantage in BS metrics. However, their precision and recall scores were 0, as the dataset predominantly consisted of non-ST samples. Single models effectively identified non-ST samples, resulting in smaller overall errors on the risk early warning dataset, yet they exhibited larger overall errors in predicting ST samples.
Table 4.
Performance comparison of different classification methods.
The superior performance of the proposed model stems from its further cascading on the basis of ensemble strategies, making the proposed framework an “ensemble of ensembles” algorithm. This cascading mechanism not only provides better robustness but also exhibits strong scalability. To further highlight the effectiveness of the proposed cascading structure, we conducted a comparative analysis of the performance under different base learners, with the results presented in Table 5. As shown in Table 5, we employed multiple risk early warning models as base learners in the cascading framework, including LR, LDA, KNN, RF, AdaBoost, XGBoost, and LightGBM. Table 5 reveals that compared to single-machine-learning classifiers, the cascading ensemble models based on traditional statistical methods experienced a decline in AUC. This is because, at each level of the cascading framework, the decisions of the four base learners are made in parallel, and the final performance is the average result of these base learners. Weak base learners at the same level may degrade the overall performance of the cascading ensemble model. In contrast, tree-based cascading ensemble models demonstrated significant performance improvements. Combining the data from Table 4 and Table 5, in terms of AUC, the cascading decision tree model improved by 6.77% compared to the decision tree algorithm, the cascading random forest model improved by 0.31% compared to the random forest model, the cascading AdaBoost improved by 1.73% compared to AdaBoost, the cascading XGBoost improved by 0.319% compared to the XGBoost algorithm, and the cascading LightGBM improved by 0.593% compared to LightGBM. From the BS metric perspective, cascading LightGBM achieved the best BS score, indicating that it could effectively reduce modeling errors in risk early warning tasks. 
Additionally, integrating Table 4 and Table 5 demonstrates that robust base learners are fundamental to the high performance of cascading ensemble models. To achieve better risk early warning performance, more robust base learners should be the preferred choice for cascading ensemble mechanisms, as weak base learners may lead to performance degradation.
Table 5.
Comparison of prediction performance under different base learner cascade mechanisms.
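The level-wise mechanism described above (several base learners trained in parallel per level, bootstrap resampling for diversity, level outputs appended to the raw features for the next level, and averaging at the final level) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `RandomForestClassifier` stands in for the LightGBM base learners, and the level and learner counts are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # stand-in base learner

class CascadeEnsemble:
    """Fixed-depth cascade: each level holds several base learners trained
    on bootstrap resamples (bagging); their class probabilities are appended
    to the raw features for the next level; the final prediction averages
    the last level's outputs."""

    def __init__(self, n_levels=3, n_learners=4, seed=0):
        self.n_levels, self.n_learners, self.seed = n_levels, n_learners, seed
        self.levels = []

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        aug = X
        for _ in range(self.n_levels):
            level = []
            for _ in range(self.n_learners):
                idx = rng.integers(0, len(aug), len(aug))  # bootstrap sample
                clf = RandomForestClassifier(n_estimators=25,
                                             random_state=self.seed)
                level.append(clf.fit(aug[idx], y[idx]))
            self.levels.append(level)
            probs = np.hstack([c.predict_proba(aug) for c in level])
            aug = np.hstack([X, probs])  # raw features + this level's outputs
        return self

    def predict_proba(self, X):
        aug, out = X, None
        for level in self.levels:
            preds = [c.predict_proba(aug) for c in level]
            out = np.mean(preds, axis=0)   # average over the level's learners
            aug = np.hstack([X] + preds)   # augmented features for next level
        return out
```

A fixed number of levels is used here for brevity; the paper's cascade instead expands adaptively with data complexity.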
5.2. Sensitivity Analysis
The proposed algorithm is essentially an "ensemble of ensembles": its base learners (RF, XGBoost, and LightGBM) are themselves tree-based ensembles, with RF built on bagging and XGBoost and LightGBM built on boosting, and the cascading framework further ensembles them. Table 4 summarizes the impact of base learner selection on final model performance; these classic and advanced ensemble algorithms ensure high-precision financial risk early warning. For tree-based base learners, the number of trees is a critical parameter affecting model performance. To investigate the influence of the number of trees and the complexity of the ensemble base learners, we conducted a sensitivity analysis. Figure 2 illustrates the results under different parameter settings: Figure 2a shows the performance curve as the number of decision trees in each LightGBM base learner varies, and Figure 2b shows the change in the AUC metric as the number of leaf nodes in the LightGBM base learner varies. Both parameters significantly affect final model performance. Too few decision trees or leaf nodes may lead to underfitting, while excessively increasing these parameters raises model complexity and may result in overfitting; careful optimization of these parameters is therefore necessary.
On this risk early warning dataset, the optimal AUC is achieved when the number of decision trees is 50 and the number of leaf nodes per regression tree is limited to 70.
Figure 2.
Comparison of parameter sensitivity. (a) Performance comparison of cascade models with different numbers of trees. (b) Performance of cascade models with different numbers of leaf nodes.
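A hypothetical version of this sensitivity sweep is sketched below, again with scikit-learn's `GradientBoostingClassifier` in place of the paper's cascaded LightGBM; its `n_estimators` and `max_leaf_nodes` play the roles of LightGBM's tree count and `num_leaves`, and the grid values are illustrative.

```python
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for LightGBM
from sklearn.model_selection import cross_val_score

def sweep(X, y, tree_grid=(10, 50, 100), leaf_grid=(15, 31, 70)):
    """Vary tree count and leaf-node cap independently; record mean CV AUC."""
    def auc_for(**params):
        clf = GradientBoostingClassifier(random_state=0, **params)
        return cross_val_score(clf, X, y, scoring="roc_auc", cv=3).mean()
    trees = {n: auc_for(n_estimators=n) for n in tree_grid}
    leaves = {k: auc_for(max_leaf_nodes=k) for k in leaf_grid}
    return trees, leaves
```

Plotting each returned dictionary reproduces the underfitting-to-overfitting curves of Figure 2a,b in miniature.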
To further validate the model’s reliability, this study investigates its detailed performance metrics under varying proportions of training data. As shown in Table 6, the analysis of indicators across different training set proportions reveals a declining trend in AUC as the training data size decreases, indicating a reduction in the model’s discrimination capacity for ST companies. The comparison of BS metrics suggests that increasing the training set proportion or collecting more risk early warning data (financial indicators of ST/non-ST companies) is a viable approach to mitigate the discrepancy between predicted values and true labels. Furthermore, the decline in training data proportion leads to simultaneous degradation in precision and recall, reflecting diminished predictive performance for both ST and non-ST companies, ultimately resulting in a progressively decreasing F1-score. Most notably, the model’s robustness in handling small datasets is enhanced by the bagging algorithm incorporated during the cascading process, which boosts the diversity of each base learner. Remarkably, even with a 30% reduction in training data, the AUC decline remains below 1%, demonstrating sufficient resilience.
Table 6.
Comparison of the comprehensive performance of the models under different training set ratios.
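The training-proportion experiment can be mimicked as below. This is a hedged sketch, with `RandomForestClassifier` standing in for the full cascading model and illustrative fraction values: the test set is held fixed while the training set shrinks, so AUC changes reflect only the reduced training data.

```python
from sklearn.ensemble import RandomForestClassifier  # stand-in for the cascade
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def auc_vs_train_fraction(X, y, fractions=(1.0, 0.8, 0.5), seed=0):
    """Hold out a fixed test set, train on shrinking fractions of the
    remaining data, and record test AUC for each fraction."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    results = {}
    for frac in fractions:
        n = max(2, int(len(X_tr) * frac))  # shuffled split => slice = subsample
        clf = RandomForestClassifier(n_estimators=50, random_state=seed)
        clf.fit(X_tr[:n], y_tr[:n])
        results[frac] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    return results
```

Stratifying the held-out split keeps the scarce ST class represented in the test set at every training fraction.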
6. Discussion
6.1. Limitations
While the proposed ensemble model demonstrates improved robustness through bagging and enhanced generalization via cascade integration, four fundamental limitations require urgent attention in future research. First, the exclusive reliance on financial indicators neglects critical non-financial dimensions like corporate governance structures (e.g., board independence, executive compensation) and external audit opinions, which have proven predictive value in international studies. Second, the geographical restriction to Chinese-listed companies limits the model’s generalizability, particularly given the unique regulatory environment of China’s capital markets. Third, despite algorithmic improvements, key performance metrics (especially recall and F1-scores for minority classes) remain suboptimal for practical deployment, suggesting underlying feature representation or optimization challenges. Most critically, the severe class imbalance (ST/non-ST ratio of 1:31) demands systematic solutions beyond current ensemble techniques—a gap where advanced hybrid sampling strategies (e.g., SMOTE-ENN) or meta-cost-learning frameworks could yield significant improvements.
6.2. Future Research
Future research should prioritize (1) developing multi-modal frameworks integrating financial metrics with ESG factors and governance attributes, (2) conducting cross-market validation using datasets from developed and emerging economies, (3) designing specialized loss functions or attention mechanisms to boost minority class recognition, and (4) establishing standardized evaluation protocols for imbalanced financial prediction tasks. Particularly promising is the exploration of dynamic imbalance adaptation techniques that automatically adjust resampling ratios or cost matrices based on real-time data distributions—an innovation that could bridge the gap between academic metrics and operational requirements in risk management. Meanwhile, systematic comparisons with other state-of-the-art models will further demonstrate the proposed framework’s advantages in terms of generalization capability and operational efficiency, providing valuable benchmarks for both academic research and industry applications. Additionally, future studies should incorporate SHAP (SHapley Additive exPlanations) analysis or other explainable methods to enhance model interpretability, providing insights into feature contributions and decision-making processes. This addition will complement our performance-driven approach with transparent, actionable explanations for stakeholders.
7. Conclusions
In recent years, the gradual maturity of artificial intelligence theory has shifted risk early warning systems toward machine-learning algorithms and deep-learning techniques. Notably, the introduction of ensemble methods has significantly enhanced the robustness of risk early warning models. This study proposed a novel financial risk early warning model based on 46 financial indicators from 2826 A-share-listed companies on the Shanghai and Shenzhen Stock Exchanges in China from 2000 to 2020. In this research, the advanced tree-based ensemble algorithm LightGBM demonstrated high efficiency and precision in financial risk early warning modeling. The proposed algorithm further ensembles LightGBM, leveraging advanced ensemble techniques to enhance enterprise risk early warning capabilities. Additionally, the algorithm employed a cascading approach to increase model complexity, enabling it to adapt to scenarios with insufficient financial data and demonstrating strong scalability. Furthermore, the algorithm improved its generalization performance in risk early warning tasks by leveraging the robustness of tree models and bagging strategies. Experimental results on the financial risk early warning dataset of Chinese-listed companies showed that the proposed highly integrated algorithm effectively outperformed single-machine-learning-classifier models. The AUC and BS values of different base learners in the cascading ensemble indicated that strong base learners are better suited for cascading integration mechanisms, with cascading LightGBM achieving the optimal BS value and effectively reducing modeling errors. Moreover, the robustness of tree-based algorithms and bagging techniques promotes the diversity of base learners, making the proposed algorithm a risk early warning solution that achieves good performance without requiring extensive hyper-parameter tuning. 
Empirical results demonstrated that the model reached optimal AUC when the number of decision trees was 50 and the number of leaf nodes per tree was limited to 70. Finally, under varying proportions of training data, the cascading ensemble algorithm showed declining trends in AUC, precision, recall, and F1-scores, while the BS value increased. However, the bagging algorithm enhances base learner diversity, limiting the decline in the cascading ensemble model’s AUC value and highlighting its advantage in handling small datasets.
The conclusions of this study offer valuable insights for corporate management and other stakeholders. For business operators, the proposed financial risk early warning model enables them to predict the company’s financial condition for the coming year, allowing for timely and accurate assessment of financial health, facilitating early detection of financial risks, and implementing appropriate measures to avert potential crises. Financial institutions such as banks and bondholders, as well as related investors, can adjust their investment strategies and resource allocations based on the risk early warning results to avoid investment pitfalls. Regulatory bodies like the Securities and Exchange Commission can proactively intervene in listed companies using these early warning signals to prevent chain reactions that could jeopardize the entire financial market. Additionally, other corporate stakeholders, including employees, external customers, and suppliers, can optimize their decision-making based on the financial risk early warning outcomes to maximize their own interests.
Author Contributions
Conceptualization, Y.Z. and Y.Y.; Methodology, Y.Z.; Software, Y.Z.; Validation, C.Z.; Formal Analysis, C.Y. and Y.Z.; Resources, Y.Y.; Data Curation, Y.Z.; Writing—Original Draft Preparation, Y.Z.; Writing—Review and Editing, Y.Y. and Y.Z.; Supervision, C.Z.; Project Administration, Y.Z. and Y.Y.; Funding Acquisition, Y.Y. and Y.Z. All authors have read and agreed to the published version of the manuscript.
Funding
The Project was supported by Open Research Grant of Joint National-Local Engineering Research Centre for Safe and Precise Coal Mining (Grant NO. EC2024016) and Training Service of Shanghai Pinshang Internet Technology Center (Grant NO. 2025HX268).
Data Availability Statement
The datasets generated and/or analyzed during the current study are available from the first author upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest. The authors declare that this study received funding from Training Service of Shanghai Pinshang Internet Technology Center. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.
References
- Wanke, P.; Barros, C.P.; Faria, J.R. Financial Distress Drivers in Brazilian Banks: A Dynamic Slacks Approach. Eur. J. Oper. Res. 2015, 240, 258–268. [Google Scholar] [CrossRef]
- Altman, E.I. Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. J. Financ. 1968, 23, 589–609. [Google Scholar] [CrossRef]
- Ohlson, J.A. Financial Ratios and the Probabilistic Prediction of Bankruptcy. J. Account. Res. 1980, 18, 109–131. [Google Scholar] [CrossRef]
- West, R.C. A Factor-Analytic Approach to Bank Condition. J. Bank. Financ. 1985, 9, 253–266. [Google Scholar] [CrossRef]
- Chen, N.; Ribeiro, B.; Chen, A. Financial Credit Risk Assessment: A Recent Review. Artif. Intell. Rev. 2016, 45, 1–23. [Google Scholar] [CrossRef]
- Li, X.; Wang, J.; Yang, C. Risk Prediction in Financial Management of Listed Companies Based on Optimized BP Neural Network under Digital Economy. Neural Comput. Appl. 2023, 35, 2045–2058. [Google Scholar] [CrossRef]
- Shin, K.-S.; Lee, T.S.; Kim, H. An Application of Support Vector Machines in Bankruptcy Prediction Model. Expert Syst. Appl. 2005, 28, 127–135. [Google Scholar] [CrossRef]
- Olson, D.L.; Delen, D.; Meng, Y. Comparative Analysis of Data Mining Methods for Bankruptcy Prediction. Decis. Support Syst. 2012, 52, 464–473. [Google Scholar] [CrossRef]
- Wolpert, D.H. The Lack of a Priori Distinctions between Learning Algorithms. Neural Comput. 1996, 8, 1341–1390. [Google Scholar] [CrossRef]
- Beaver, W.H. Financial Ratios as Predictors of Failure. J. Account. Res. 1966, 4, 71–111. [Google Scholar] [CrossRef]
- Martin, D. Early Warning of Bank Failure: A Logit Regression Approach. J. Bank. Financ. 1977, 1, 249–276. [Google Scholar] [CrossRef]
- Jones, S.; Hensher, D.A. Predicting Firm Financial Distress: A Mixed Logit Model. Account. Rev. 2004, 79, 1011–1038. [Google Scholar] [CrossRef]
- Odom, M.D.; Sharda, R. A Neural Network Model for Bankruptcy Prediction. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; IEEE: Piscataway, NJ, USA, 1990; pp. 163–168. [Google Scholar]
- Wu, D.; Ma, X.; Olson, D.L. Financial Distress Prediction Using Integrated Z-Score and Multilayer Perceptron Neural Networks. Decis. Support Syst. 2022, 159, 113814. [Google Scholar] [CrossRef]
- Chen, Y.; Guo, J.; Huang, J.; Lin, B. A Novel Method for Financial Distress Prediction Based on Sparse Neural Networks with L 1/2 Regularization. Int. J. Mach. Learn. Cybern. 2022, 13, 2089–2103. [Google Scholar] [CrossRef]
- Chen, M.-Y. Predicting Corporate Financial Distress Based on Integration of Decision Tree Classification and Logistic Regression. Expert Syst. Appl. 2011, 38, 11261–11272. [Google Scholar] [CrossRef]
- Xie, C.; Luo, C.; Yu, X. Financial Distress Prediction Based on SVM and MDA Methods: The Case of Chinese Listed Companies. Qual. Quant. 2011, 45, 671–686. [Google Scholar] [CrossRef]
- Xu, W.; Xiao, Z.; Yang, D.; Yang, X. A Novel Nonlinear Integrated Forecasting Model of Logistic Regression and Support Vector Machine for Business Failure Prediction with All Sample Sizes. J. Test. Eval. 2015, 43, 681–693. [Google Scholar] [CrossRef]
- Sun, Y.; Li, Z.; Li, X.; Zhang, J. Classifier Selection and Ensemble Model for Multi-Class Imbalance Learning in Education Grants Prediction. Appl. Artif. Intell. 2021, 35, 290–303. [Google Scholar] [CrossRef]
- Freund, Y.; Schapire, R.; Abe, N. A Short Introduction to Boosting. J. Soc. Artif. Intell. 1999, 14, 1612. [Google Scholar]
- Chen, T.; Guestrin, C. Xgboost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. Lightgbm: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157. [Google Scholar]
- Heo, J.; Yang, J.Y. AdaBoost Based Bankruptcy Forecasting of Korean Construction Companies. Appl. Soft Comput. 2014, 24, 494–499. [Google Scholar] [CrossRef]
- Yao, G.; Hu, X.; Song, P.; Zhou, T.; Zhang, Y.; Yasir, A.; Luo, S. AdaFNDFS: An AdaBoost Ensemble Model with Fast Nondominated Feature Selection for Predicting Enterprise Credit Risk in the Supply Chain. Int. J. Intell. Syst. 2024, 2024, 5529847. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, L. A Study on Forecasting the Default Risk of Bond Based on Xgboost Algorithm and Over-Sampling Method. Theor. Econ. Lett. 2021, 11, 258–267. [Google Scholar] [CrossRef]
- Wang, W.; Liang, Z. Financial Distress Early Warning for Chinese Enterprises from a Systemic Risk Perspective: Based on the Adaptive Weighted XGBoost-Bagging Model. Systems 2024, 12, 65. [Google Scholar] [CrossRef]
- Tan, B.; Gan, Z.; Wu, Y. The Measurement and Early Warning of Daily Financial Stability Index Based on XGBoost and SHAP: Evidence from China. Expert Syst. Appl. 2023, 227, 120375. [Google Scholar] [CrossRef]
- Huang, C.; Cai, Y.; Cao, J.; Deng, Y. Stock Complex Networks Based on the GA-LightGBM Model: The Prediction of Firm Performance. Inf. Sci. 2025, 700, 121824. [Google Scholar] [CrossRef]
- Zhu, W.; Zhang, T.; Wu, Y.; Li, S.; Li, Z. Research on Optimization of an Enterprise Financial Risk Early Warning Method Based on the DS-RF Model. Int. Rev. Financ. Anal. 2022, 81, 102140. [Google Scholar] [CrossRef]
- Liang, D.; Tsai, C.-F.; Lu, H.-Y.R.; Chang, L.-S. Combining Corporate Governance Indicators with Stacking Ensembles for Financial Distress Prediction. J. Bus. Res. 2020, 120, 137–146. [Google Scholar] [CrossRef]
- Chen, X.; Wu, C.; Zhang, Z.; Liu, J. Multi-Class Financial Distress Prediction Based on Stacking Ensemble Method. Int. J. Financ. Econ. 2025, 30, 2369–2388. [Google Scholar] [CrossRef]
- Wang, S.; Chi, G. Cost-Sensitive Stacking Ensemble Learning for Company Financial Distress Prediction. Expert Syst. Appl. 2024, 255, 124525. [Google Scholar] [CrossRef]
- Nguyen, M.; Nguyen, B.; Liêu, M. Corporate Financial Distress Prediction in a Transition Economy. J. Forecast. 2024, 43, 3128–3160. [Google Scholar] [CrossRef]
- Ekinci, A.; Sen, S. Forecasting Bank Failure in the US: A Cost-Sensitive Approach. Comput. Econ. 2024, 64, 3161–3179. [Google Scholar] [CrossRef]
- Lee, M.-J.; Choi, S.-Y. The Impact of Financial Statement Indicators on Bank Credit Ratings: Insights from Machine Learning and SHAP Techniques. Financ. Res. Lett. 2025, 85, 107758. [Google Scholar] [CrossRef]
- Zhang, J.; Zhao, Z. Corporate ESG Rating Prediction Based on XGBoost-SHAP Interpretable Machine Learning Model. Expert Syst. Appl. 2026, 295, 128809. [Google Scholar] [CrossRef]
- He, K.; Yang, Q.; Ji, L.; Pan, J.; Zou, Y. Financial Time Series Forecasting with the Deep Learning Ensemble Model. Mathematics 2023, 11, 1054. [Google Scholar] [CrossRef]
- Mochurad, L.; Dereviannyi, A. An Ensemble Approach Integrating LSTM and ARIMA Models for Enhanced Financial Market Predictions. R. Soc. Open Sci. 2024, 11, 240699. [Google Scholar] [CrossRef]
- Ding, Y.; Song, X.; Zen, Y. Forecasting Financial Condition of Chinese Listed Companies Based on Support Vector Machine. Expert Syst. Appl. 2008, 34, 3081–3089. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).