A Machine Learning Approach for Investigating the Determinants of Stock Price Crash Risk: Exploiting Firm and CEO Characteristics

This study uses machine learning to investigate the effects of firm and CEO characteristics on stock price crash risk, using a large dataset of publicly listed firms in China. The results show that eXtreme Gradient Boosting (XGBoost) is the most effective model for predicting stock price crash risk, with relatively satisfactory performance. The SHapley Additive exPlanations (SHAP) method is then used to interpret the importance of features. The results show that the average weekly return of a firm over a year (RET) contributes the most and is negatively associated with crash risk, followed by Sigma, IPO age, and firm size. We also found that, among CEO characteristics, CEO pay contributes substantially to crash risk at the firm level. Our findings have important implications for research into the impact of firm and CEO characteristics on stock price crash risk and provide a novel way for investors to plan their investment decisions and risk-taking behavior rationally.


Introduction
A stock price crash is undoubtedly a disaster for society and investors, particularly retail investors who concentrate their money in a handful of firms; a stock price crash in a portfolio reduces their wealth [1]. The severe economic losses caused by stock price crashes have prompted extensive research into the internal formation mechanisms of stock price crash risk. Because of information opacity and asymmetry, managers can conceal bad news to keep their high salaries [2,3]. When bad news accumulates above a certain threshold, it is released to the market all at once, resulting in a significant decrease in stock price and company reputation [4].
Firm characteristics influence stock price movement and corporate risk-taking, as is common in risk-related research [5]. For instance, Deng et al. [6] proposed that stock price crash risk is related to several firm characteristics, including cash flow, operating capacity, debt-paying ability, growth potential, and profitability. They also suggested that machine learning techniques could be used to find each factor's effect and feature importance. Large companies are likelier to experience a stock price crash because they tend to engage in discretionary disclosure [4,7]. However, other studies have stated that a stock price crash occurs when the "bad news hoarding" phenomenon accumulates and reaches a critical value, at which point the bad news floods the stock market without warning [3,8,9]. Therefore, a systematic investigation of the influence of firm characteristics is required to accurately reflect each determinant's impact.
Previous studies have also identified CEO characteristics as key factors influencing firm-specific stock price crash risk [10-13]. For example, a younger CEO early in his career has the incentive to defer bad news for a string of consecutive earnings, increasing the likelihood of a crash [10]. Furthermore, a CEO with a greater position of power can withhold bad news for financial gain [14], resulting in stock price crashes. In addition, an overconfident CEO with poor management skills is likelier to overstate returns and ignore bad news, increasing crash risk [15]. According to Habib, Hasan, and Jiang [16], female CEOs positively impact stock price crash risk, but the relationship varies depending on whether the female CEO is in her first appointment. Various aspects of CEO characteristics can influence crash risk, so a comprehensive investigation into how CEO characteristics contribute to crashes is urgently needed.
Understanding the causes of stock price crashes is critical for instructing investors on how to protect shareholder value and reduce wealth losses. The large number of firms in the Chinese market across a wide range of industries and sectors allows for the collection of a large amount of data, which helps to analyze and predict the risk of stock crashes more accurately. Thus, this study uses data from Chinese listed companies from 2010 to 2020 to provide comprehensive information on the factors that influence crash risk in the Chinese stock market and to explore how firm and CEO characteristics affect crash risk using machine learning algorithms. Unlike traditional analytical tools, machine learning methods can precisely analyze large and complex datasets and produce convincing results [17].
Our study contributes to two areas of risk management research. First, we developed a novel stock price crash determinants model by combining firm and CEO characteristics to provide a new perspective on crash risk research. Previous studies in this field have focused solely on the impact of either firm or CEO characteristics [6,14,15], whereas this research combines the two. Second, our study reveals the importance rankings of firm and CEO characteristics and identifies specific relationships between these factors and crash risk. These relationships have empirical implications for increasing the detectability of firm-specific stock price crashes and improving stock market regulation.
The remainder of this study is structured as follows. Section 2 shows the data source and measurements, and Section 3 presents the analysis results. Section 4 discusses these findings. Finally, Section 5 concludes the paper by discussing the study's contributions and limitations.

Research Methodology
Using machine learning algorithms, this study analyzed crash risk based on firm and CEO characteristics. Machine learning can process a wide range of complex data, and its superior accuracy and explanatory power have made it a popular method for prediction and analysis [18-21]. In addition to its ability to identify long-term and subtle temporal patterns that are difficult for human analysts to detect, machine learning is particularly effective at modeling nonlinear behavior in financial data and accurately predicting the interaction effects of leading indicators of financial volatility [22]. Many scholars have shown that combining advanced deep learning and machine learning techniques is the best approach for financial forecast performance [23,24]. Recognizing the value of machine learning, we used 11 machine learning methods to investigate the relationships between firm and CEO characteristics and crash risk: ridge regression, least absolute shrinkage and selection operator (Lasso), elastic net, multilayer perceptron regressor (MLPRegressor), decision tree, bagging, random forest, Extra-Trees, adaptive boosting (AdaBoost), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost).

Data and Sample
Our data cover listed companies in China from 2010 to 2020. The Chinese market, which has long been the world's second largest and most dynamic market, has significantly affected the global market. The Chinese stock market has a short history, and its laws and regulatory systems are insufficient. Thus, relying on Chinese stock market data is reasonable. We used the Choice dataset to collect firm characteristics and stock returns, and the China Stock Market and Accounting Research (CSMAR) database to obtain CEO characteristics. Then, we matched the firm characteristics data, CEO characteristics data, and stock return data using the same stock code and fiscal year. Furthermore, we winsorized all variables at the 1st and 99th percentiles to limit the influence of extreme values, which can bias the results. To accurately measure crash risk, we excluded observations with missing values and those with too few weeks of stock returns. The final sample included 1,999 firms (11,915 firm-year observations) from 2010 to 2020. Table 1 shows the frequency and percentage distributions of companies across the industries examined in this study. The dataset covers a wide range of industries, from computer communication to the production and supply of electric power and heat power, highlighting the diversity and intricacy of these industries.
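The matching and winsorizing steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the field names `stock_code` and `fiscal_year`, the helper names, and the linear-interpolation percentile rule are all assumptions made for the example.

```python
from math import ceil, floor


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = floor(pos), ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, lower=1.0, upper=99.0):
    """Clip every value to the [lower, upper] percentile range of the sample."""
    ordered = sorted(values)
    lo_cut = percentile(ordered, lower)
    hi_cut = percentile(ordered, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


def match_records(firm_rows, ceo_rows):
    """Inner-join two lists of dicts on (stock_code, fiscal_year).

    The key names are hypothetical; real databases use their own field names.
    """
    ceo_by_key = {(r["stock_code"], r["fiscal_year"]): r for r in ceo_rows}
    merged = []
    for row in firm_rows:
        key = (row["stock_code"], row["fiscal_year"])
        if key in ceo_by_key:
            # Keep only firm-years that appear in both sources.
            merged.append({**row, **ceo_by_key[key]})
    return merged
```

Winsorizing replaces extreme values with the cut-off values rather than dropping the observations, so the sample size is unchanged by this step.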

Measuring Stock Crash Risk
Following Chen et al. [7], we used negative conditional return skewness (NCSKEW) and down-to-up volatility (DUVOL) to estimate stock price crash risk. We first used weekly return data to estimate the firm-specific weekly return of each firm using the following regression model [4]:

$$R_{i,w} = \alpha_i + \beta_{1,i} R_{m,w-2} + \beta_{2,i} R_{m,w-1} + \beta_{3,i} R_{m,w} + \beta_{4,i} R_{m,w+1} + \beta_{5,i} R_{m,w+2} + \varepsilon_{i,w}$$

where $R_{i,w}$ is the stock return of firm i in week w, $R_{m,w}$ is the weighted average market return in week w, and $\varepsilon_{i,w}$ is the residual term, representing the part of the stock return that does not relate to market returns. The two lead and two lag terms of the market return alleviate potential problems that result from asynchronous stock trading [25]. We measured the firm-specific weekly return $W_{i,w}$ as the natural logarithm of the residual plus 1, that is, $W_{i,w} = \ln(1 + \varepsilon_{i,w})$.

NCSKEW, the negative conditional return skewness of a specific firm's weekly returns, is the first indicator of stock price crash risk. We computed NCSKEW using the following model:

$$\mathrm{NCSKEW}_{i,t} = -\frac{n(n-1)^{3/2}\sum W_{i,w}^{3}}{(n-1)(n-2)\left(\sum W_{i,w}^{2}\right)^{3/2}}$$

where n is the number of weekly return observations for firm i in year t, and the sums run over the weeks of year t. A firm with a higher NCSKEW has a higher risk of stock price crash.

DUVOL, the weekly return down-to-up volatility, is the second indicator of stock price crash risk. We calculated DUVOL with the following model:

$$\mathrm{DUVOL}_{i,t} = \ln\left[\frac{(n_u - 1)\sum_{\mathrm{down}} W_{i,w}^{2}}{(n_d - 1)\sum_{\mathrm{up}} W_{i,w}^{2}}\right]$$

where $n_u$ is the number of up weeks and $n_d$ is the number of down weeks. Specifically, we divided the weeks of each firm-year into up weeks and down weeks and calculated the standard deviation of the firm-specific weekly returns in each subsample. A firm with a higher DUVOL has a higher risk of stock price crash.
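As an illustration of the two crash risk measures, the following sketch computes NCSKEW and DUVOL for one firm-year from a list of market-model residuals. It assumes the commonly used definitions (stated in the docstrings) and, because the text does not say how up and down weeks are separated, it assumes that a week is a down week when its firm-specific return is below the annual mean. The function names are illustrative only.

```python
import math


def firm_specific_returns(residuals):
    """W = ln(1 + residual) for each weekly market-model residual."""
    return [math.log(1.0 + e) for e in residuals]


def ncskew(w):
    """NCSKEW = -[n (n-1)^(3/2) sum(W^3)] / [(n-1)(n-2) (sum(W^2))^(3/2)]."""
    n = len(w)
    if n < 3:
        raise ValueError("need at least three weekly returns")
    sum_sq = sum(x ** 2 for x in w)
    sum_cu = sum(x ** 3 for x in w)
    return -(n * (n - 1) ** 1.5 * sum_cu) / ((n - 1) * (n - 2) * sum_sq ** 1.5)


def duvol(w):
    """DUVOL = ln{[(n_u - 1) sum_down(W^2)] / [(n_d - 1) sum_up(W^2)]}.

    Assumption: down weeks are those with W below the annual mean.
    """
    mean = sum(w) / len(w)
    down = [x for x in w if x < mean]
    up = [x for x in w if x >= mean]
    n_d, n_u = len(down), len(up)
    if n_d < 2 or n_u < 2:
        raise ValueError("need at least two up weeks and two down weeks")
    down_ss = sum(x ** 2 for x in down)
    up_ss = sum(x ** 2 for x in up)
    return math.log(((n_u - 1) * down_ss) / ((n_d - 1) * up_ss))
```

For a return series that is symmetric around zero both measures are zero, while a series with a few large negative weeks yields positive values of both, which is the pattern the measures are designed to flag.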

Measuring Determinants
We can divide the factors that influence crash risk into two categories: firm characteristics and CEO characteristics. The former includes firm age, IPO age, LogSize, leverage, goodwill, brand capital, cash, return on assets (ROA), return on equity (ROE), sigma, RET, and DTURN. The latter includes the CEO's gender, age, education, MBA, duality, tenure, pay, shareholdings, board experience, academic experience, and overseas experience. We followed previous studies in measuring firm and CEO characteristics [10,19,21,23,24,26-28]. Table 2 shows a detailed description of these variables.

Evaluation Criterion
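We used the mean squared error (MSE) to evaluate model performance, computed as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^{2}$$

where n is the number of observations, $Y_i$ is the true value of NCSKEW or DUVOL, and $\hat{Y}_i$ is the predicted value. MSE is a widely used indicator for evaluating machine learning models [29,30]. Thus, we employed MSE to compare our 11 machine learning models.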

Model Evaluation and Comparison
We trained the 11 machine learning models (ridge regression, Lasso, elastic net, MLPRegressor, decision tree, bagging, random forest, Extra-Trees, AdaBoost, GBDT, and XGBoost) on 80% of the data and used the remaining 20% to assess them. We also evaluated the models using five-fold cross-validation, which divides the dataset into five subsamples. In each round, four subsamples are used to train the model, and the remaining subsample serves as a test set, so each model is constructed and evaluated five times, each time with a different test subsample. We used MSE as the performance measure; models with higher MSE values perform worse. The MSE of XGBoost was 0.4557 using NCSKEW and 0.2186 using DUVOL, which were slightly higher than those of ridge regression, Lasso, bagging, Extra-Trees, and GBDT. However, XGBoost outperformed the elastic net, MLPRegressor, and decision tree models. Although the MSE of AdaBoost was slightly lower than that of XGBoost when measured using NCSKEW, the mean MSE of the five-fold cross-validation for XGBoost was lower than that for AdaBoost, indicating that XGBoost is more stable than AdaBoost. The results are summarized in Table 3 and show that XGBoost is the best model for exploring the determinants of crash risk.
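The evaluation procedure (an 80/20 hold-out split, five-fold cross-validation, and MSE as the score) can be sketched as follows. To keep the example self-contained, two deliberately simple stand-in models (a mean predictor and a one-feature least-squares line) replace the 11 learners used in the study, and the data are synthetic. The sketch also assumes that the folds are drawn from the whole dataset after shuffling, which the text does not state explicitly.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def fit_mean(x, y):
    """Stand-in baseline: always predict the training mean of y."""
    m = sum(y) / len(y)
    return lambda xs: [m for _ in xs]


def fit_ols(x, y):
    """Stand-in model: least-squares line on a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda xs: [intercept + slope * a for a in xs]


def holdout_split(n, test_share=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_share)))
    return idx[:cut], idx[cut:]


def k_fold_mse(x, y, fit, k=5, seed=0):
    """Return one MSE per fold; each fold is held out exactly once."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        preds = model([x[i] for i in held_out])
        scores.append(mse([y[i] for i in held_out], preds))
    return scores


# Synthetic data for illustration only (not the study's sample).
rng = random.Random(42)
x = [rng.uniform(-1, 1) for _ in range(200)]
y = [0.5 - 2.0 * a + rng.gauss(0, 0.1) for a in x]
for name, fit in [("mean baseline", fit_mean), ("least squares", fit_ols)]:
    fold_scores = k_fold_mse(x, y, fit)
    print(name, sum(fold_scores) / len(fold_scores))
```

Comparing the mean of the five fold scores, rather than a single hold-out score, is what allows a statement about stability of the kind made for XGBoost and AdaBoost.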

Model Interpretation
Our study interpreted the model results using the SHapley Additive exPlanations (SHAP) method proposed by Lundberg and Lee [31]. SHAP indicates the contribution of each variable using game theory. We calculated SHAP values as follows:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\left[f_x(S \cup \{i\}) - f_x(S)\right]$$

where i is the feature to be interpreted, N is the set of all input features, M is the number of features, and S is a subset of N that does not contain i. $f_x$ is the predicted result for x in the model, $f_x(S)$ is the predicted result using the feature set S, and $f_x(S \cup \{i\})$ is the predicted result after adding feature i to S.

Feature importance is critical for determining which features contribute the most to the model's performance. This study used SHAP to interpret the model by determining each factor's importance and each feature's specific impact. Table 4 shows the results of the SHAP summary analysis. For the summary table, we calculated the mean absolute SHAP value of each variable, which reflects the variable's contribution. The table also conveys the direction of each variable's effect in the model. Finally, we calculated the average feature importance rank because NCSKEW produces results that differ from those of DUVOL.
Note: SHAP1, Effects1, and Rank1 are the results using NCSKEW as a measure, and SHAP2, Effects2, and Rank2 are the results using DUVOL. Rank is the rank of feature importance on average.
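To make the Shapley formula concrete, the sketch below enumerates every feature subset for a small toy model. It is an exact but exponential-time calculation meant only as an illustration; it is not the algorithm used by the SHAP library for tree models. It also assumes one particular way of evaluating $f_x(S)$: features in S take their values from x and the remaining features take values from a baseline vector. The function and argument names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` for one instance x.

    Assumption: f_x(S) is evaluated by taking features in S from x and all
    other features from `baseline`.
    """
    m = len(x)
    features = range(m)

    def f_x(subset):
        mixed = [x[j] if j in subset else baseline[j] for j in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(m):
            # Weight |S|! (M - |S| - 1)! / M! for subsets of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (f_x(s | {i}) - f_x(s))
        phi.append(total)
    return phi


# Toy linear model: each Shapley value equals coefficient * (x - baseline).
linear = lambda v: 1.0 + 2.0 * v[0] - 3.0 * v[1] + 0.5 * v[2]
print(shapley_values(linear, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0]))
```

A useful check is the efficiency property: the Shapley values of all features sum to the difference between the prediction for x and the prediction for the baseline.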
The SHAP summary table shows that the eight most important firm characteristics are RET, sigma, IPO age, LogSize, DTURN, brand capital, leverage, and cash. In addition, the top eight CEO characteristics include CEO pay, CEO accounting, CEO shareholdings, CEO age, CEO marketing, CEO tenure, CEO education, and CEO academic experience. The SHAP summary table displays the specific impact of features on the risk of stock price crash. Crash risk correlates negatively with RET, sigma, IPO age, LogSize, DTURN, brand capital, CEO pay, cash, ROE, firm age, CEO marketing, CEO education, CEO RD, CEO design, CEO production, and CEO overseas experience. Meanwhile, a positive relationship exists between crash risk and leverage, goodwill, ROA, CEO accounting, CEO age, CEO tenure, CEO academic experience, CEO board experience, CEO finance, CEO HRM, and CEO MBA. However, using NCSKEW to estimate the impact of CEO duality and CEO gender yields different results than using DUVOL.
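The aggregation behind such a summary table (a mean absolute SHAP value per feature, a rank under each crash risk measure, and an average of the two ranks) can be sketched as follows. The SHAP values below are made-up numbers for three features and two observations; they are not taken from Table 4, and the helper names are hypothetical.

```python
def mean_abs_shap(shap_rows):
    """Mean absolute SHAP value per feature over all observations."""
    features = list(shap_rows[0].keys())
    n = len(shap_rows)
    return {f: sum(abs(row[f]) for row in shap_rows) / n for f in features}


def rank_features(importance):
    """Rank 1 is the feature with the largest mean absolute SHAP value."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {f: pos for pos, f in enumerate(ordered, start=1)}


def average_rank(rank1, rank2):
    """Average of the ranks obtained under the two crash risk measures."""
    return {f: (rank1[f] + rank2[f]) / 2 for f in rank1}


# Hypothetical per-observation SHAP values (illustration only).
shap_ncskew = [
    {"RET": -0.30, "Sigma": 0.10, "CEO pay": 0.05},
    {"RET": 0.20, "Sigma": -0.20, "CEO pay": -0.05},
]
shap_duvol = [
    {"RET": 0.10, "Sigma": -0.30, "CEO pay": 0.02},
    {"RET": -0.10, "Sigma": 0.10, "CEO pay": -0.02},
]

rank1 = rank_features(mean_abs_shap(shap_ncskew))
rank2 = rank_features(mean_abs_shap(shap_duvol))
print(average_rank(rank1, rank2))
```

Taking absolute values before averaging matters because positive and negative SHAP values for the same feature would otherwise cancel and understate its importance.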

Discussion
Intense fluctuations in stock prices have created uncertainty in investors' reactions and behaviors, as well as in the daily operations of companies, increasing the need to manage stock price crash risk for companies' long-term development. Given that China is the world's largest developing country, with a huge consumer base and diverse market demand, there is significant uncertainty about stock market changes. Based on the background of the Chinese market, we used 11 machine learning methods to explore the influential factors of crash risk using firm and CEO characteristics from a large-scale sample of Chinese listed companies. This section discusses the study's significant theoretical and practical implications.
This study contributes to the literature on stock price crash risk by developing a stock price crash determinants model using machine learning techniques that considers both firm and CEO characteristics. Although much previous literature has studied the relationship between firm and CEO characteristics and crash risk, most research has not explored the joint influence and relative importance of a large set of firm and CEO characteristics [4,6]. Analyzing the determinants of crash risk becomes more complex when many features are present at once. Our study fills this gap by applying machine learning methods to explore the factors that significantly impact stock price crash risk.
This study provides evidence of how firm and CEO characteristics impact crash risk, including firm characteristics (e.g., RET, sigma, and firm age) and CEO characteristics (e.g., CEO pay, CEO accounting, and CEO shareholdings). Extending Zhang et al.'s [32] study with the SHAP method, we found that RET, the average weekly return of a firm over a year, is the most significant factor among these and negatively impacts crash risk. Moreover, unlike Xu and Zou [33], who argued that the relationship between CEO pay and stock price crash risk is unclear, our study found that CEO pay is the most important CEO characteristic and has a negative influence on crash risk. Higher pay encourages CEOs to focus more on firm performance, resulting in lower crash risk.
Furthermore, we found that a firm whose CEO has accounting experience is more likely to experience a stock price crash. We think that a CEO with accounting experience can more easily manipulate reported company performance, resulting in a higher crash risk. Additionally, CEO finance and crash risk are positively associated, consistent with Jiang et al.'s [11] findings. However, our study's results on CEO age's impact on crash risk contradict previous research. According to other researchers, a firm with a younger CEO is more likely to experience a crash [10]. In contrast, our results show a positive relationship between CEO age and crash risk.
This study has some practical implications for investors and supervisors. First, there is a strong association between firm characteristics and crash risk. Our study found that RET, IPO age, firm size, and brand capital affect crash risk. To avoid loss, investors should focus on these factors. For example, they can invest in stocks with a higher and more stable average weekly return, a higher ROE, and a lower ROA. RET is the most important determinant in this study, and it negatively influences crash risk. A low or declining RET for a company may indicate poor profitability or business challenges, leading investors to become pessimistic about its future performance and stock price, resulting in share sales. Consequently, a mass sell-off can put downward pressure on the stock price, potentially leading to a market disaster if left unchecked. During a crash, stock prices may plummet, market turnover may sharply decrease, investor confidence may suffer, and the stock market's operational mechanism may sustain significant damage. Therefore, a decreasing or downward-trending RET is an early indicator of an impending stock crash. Investors and regulators should closely monitor this indicator and respond quickly to potential risks. Simultaneously, companies should strive to improve performance and profitability to maintain share prices and bolster investor trust.
Our findings suggest that regulators should be vigilant in monitoring firms' characteristics, particularly those associated with higher crash risk. They can use the insights from this study to develop more targeted surveillance mechanisms and policies to prevent potential crashes. For example, they could impose stricter disclosure requirements on firms with specific characteristics, such as a high IPO age or a large size, to ensure that investors can access all relevant information when making investment decisions.

Conclusions
This study comprehensively identified the impact of various firm and CEO characteristics on crash risk and their respective contributions to stock price crash prediction. We applied 11 machine learning models to analyze data from Chinese listed firms between 2010 and 2020. The results show that XGBoost has the best performance among these models and effectively examines the relationships between 31 input variables and the risk of a stock price crash. Furthermore, the SHAP method used to interpret the XGBoost model shows the importance of firm and CEO characteristics. Among the firm and CEO characteristics, the ten most important factors affecting stock price crash risk are RET, sigma, IPO age, LogSize, DTURN, brand capital, CEO pay, leverage, cash, and goodwill.
Although this article contributes a comprehensive presentation of firm and CEO characteristics that may be associated with the risk of a stock price crash, it has limitations. First, our study included only 31 features in the models, which may explain the models' limited performance. The study may have overlooked important variables that affect model performance. Second, we did not differentiate between the equity sectors to which the stocks belong, which may affect the accuracy of the results. Future research can use advanced machine learning methods to analyze samples from various equity sectors. Moreover, we considered only stocks of companies listed in the Chinese stock market. Future research can use a longer observation period and a broader range of stocks from different markets to explore crash risk more comprehensively.

Table 1. Distribution of the sample across industries.
Table 3 displays the MSE results of the five-fold cross-validation process (MSE kf1-MSE kf5). When measured with NCSKEW, XGBoost's MSE values ranged from 0.4654 to 0.5518, whereas when measured with DUVOL, they ranged from 0.2293 to 0.2657, which are lower than those of the other methods. XGBoost performed best, with the minimum mean MSE value, regardless of whether we used NCSKEW or DUVOL as the measure.

Table 3. Descriptive results of machine learning methods.
Note: The table compares the performance of 11 models using MSE aided by five-fold cross-validation.

Table 4. The results of the SHAP summary analysis.