Research on the Prediction of A-Share “High Stock Dividend” Phenomenon—A Feature Adaptive Improved Multi-Layers Ensemble Model

Since the “high stock dividend” of A-share companies in China often leads to the short-term stock price increase, this phenomenon’s prediction has been widely concerned by academia and industry. In this study, a new multi-layer stacking ensemble algorithm is proposed. Unlike the classic stacking ensemble algorithm that focused on the differentiation of base models, this paper used the equal weight comprehensive feature evaluation method to select features before predicting the base model and used a genetic algorithm to match the optimal feature subset for each base model. After the base model’s output prediction, the LightGBM (LGB) model was added to the algorithm as a secondary information extraction layer. Finally, the algorithm inputs the extracted information into the Logistic Regression (LR) model to complete the prediction of the “high stock dividend” phenomenon. Using the A-share market data from 2010 to 2019 for simulation and evaluation, the proposed model improves the AUC (Area Under Curve) and F1 score by 0.173 and 0.303, respectively, compared to the baseline model. The prediction results shed light on event-driven investment strategies.


Introduction
The rapidly growing securities market in China drives the participation and curiosity of investors. Investors' reaction on information plays a central role in modern financial markets globally. As one of the key signals of a company's profitability and sustainability, Dividend policy becomes an essential trigger of stock price movements. Despite what the Modigliani-Miller theorem states [1], the effects of dividend policies are puzzling, especially in China.
China's A-share market is an important global investment market. As of 1 December 2020, the total market value of China's A-share listed companies in Shenzhen and Shanghai stocks reached 82.92 trillion yuan. In the global market, the total market value of A-share listed companies is second only to the United States, ranking second globally [2]. Unlike other listed companies' emphasis on cash dividends, there is a long-term phenomenon in China's A-share market, called a "high stock dividend". For every ten shares of holding, the company will transfer five shares or more to shareholders [3], and its publication time is often concentrated in the annual report period. On 4 April 2018, the Shanghai Stock Exchange and the Shenzhen Stock Exchange announced the guidelines for disclosing listed companies' high stock dividend information (Draft). This disclosure imposed strict restrictions on the reduction and sales of shares held by relevant shareholders, the company's net profit, and the company's earnings per share, accelerating the resulting downward trend in the number of high stock dividend distribution which began in 2016. Some research shows that the stock dividend announcements have a positive impact. In contrast, the cash dividend announcements negatively impact abnormal returns for Chinese companies [4], and the intensity of high stock dividends is positively correlated with the scale of significant shareholders' reduction [5]. Part insiders may consciously take advantage of investors' irrational preferences to achieve their rational self-interest motivations like stock sales in high stock dividends. High stock dividends have the feature of an instrument [6]. Therefore, the prediction of stock dividend distribution is meaningful for alleviating the information asymmetry in the investment market, assisting investors in making investment decisions, and providing a decision-making basis for the market supervision department. That explains why a more accurate prediction on this particular phenomenon is valuable.
This study proposes a feature adaptive improved multi-layers ensemble model to boost prediction accuracy of "high stock dividends". The following structure of this paper is as follows. In Section 2, the previous results related to this study are reviewed, and the main contributions of stacking ensemble algorithm improving are also introduced. Section 3 describes each part of the feature adaptive improved multi-layers ensemble model in detail, including the optimization of feature engineering, the adaptive matching of the base model and the feature subset, and the design of the second feature extraction layer. All model backtesting results with A-share historical data are analyzed in Section 4 to evaluate the model's predictive ability. Finally, the research is summarized, and the follow-up work is presented in Section 5.

Literature Review
The previous studies discussed the phenomenon of high stock dividends mainly included three aspects: motivation, excess return, and prediction methods.
In the existing literature, there are several different views on the phenomenon. Most of the work was based on the aspects of traditional economics and behavioral economics. Under the traditional economics framework, some scholars analyzed the high stock dividend phenomenon from the perspective of the signaling theory [7] and believed that the company's management intended to pass the information of the company's future performance to the investors through dividend policy. Many scholars demonstrated the effect of dividend policy on transmitting positive signals to the company's operation [8][9][10]. According to the optimal price theory put forward by other scholars, the excessive stock price will demand that small and medium-sized investors have more capital, which will restrict their trading behaviors. Split shares or dividend policies can reduce the stock price and improve liquidity, making the stock price be in a more reasonable range [11][12][13]. In behavioral finance, a class of views supported the dividend catering theory, pointing out that when investors had irrational preferences, the company's management had an incentive to cater to investors for proposing related dividend policies. Another group of researchers believed in price illusion theory, pointing out that nominal price changes due to stock dividend distribution can affect investors' decision-making [14].
Some scholars' research focused on the excess return causing by high stock dividend. Eng et al. (2014) empirically found that the strengthening of stock split supervision would reduce the information asymmetry [15], which made the return on the announcement day shift from high correlation with lagging profitability of the previous financial report to high correlation with future profitability. Furthermore, as Huang and Paul (2017) pointed out [16], institutional investors preferred companies that paid dividends. There was often an inevitable excess return in the A-share market before and after the occurrence of a high stock dividend phenomenon [17]. Therefore, the successful prediction of this phenomenon will help investors to build an effective event-driven strategy and obtain excess returns.
Another kind of research focused on the prediction of the stock dividends, but this kind of research was relatively unpopular. Ezell and Rubiales (1975) firstly used the idea of discrete dependent variable modeling to study the dividend policy prediction [18]. Bae (2010) introduced decision tree, multi-layer perceptron, and support vector machine (SVM) models [19]. Taking the data of Korean listed companies as an example, Bae found that the SVM model based on RBF kernel could accurately predict the dividend policy of South Korea. Xiong et al. (2012) used the logistic regression model to predict the high stock dividend phenomenon from 2007 to 2011 [20]. Multi-layers perceptron was proposed by Dong and Zhao (2019) to predict the phenomenon of the high stock dividend distribution, which improved the accuracy rate by 12% based on the logistic regression model [21].
According to the previous studies, there are still two aspects that should be improved: (1) Classical methods usually choose one method for feature selection. As we know, the selection principles of different single feature selection methods are different. As a result, the feature sets obtained by different methods are often different. In other words, some features can be selected by one method, but at the same time, they will be missed by another method. The single feature screening method has a specific feature omission risk.
(2) Single-method models have strong prediction ability but relatively low generalization ability. Therefore, some studies use the stacking algorithm to integrate the base model's output and improve the generalization ability. In this general stacking algorithm, the base model's output is usually weighted or used as the input of a classification model to predict the final result. However, such a method lacks information extraction for the output of the base model, which limits the use efficiency of the feature information of the model and restricts the model's predictive ability.
This paper proposes a feature adaptive improved multi-layers ensemble model, an improved stacking ensemble model. This study's main contributions are as follows: (1) We use the equal weight feature comprehensive evaluation method to select the effective features. This method can take advantage of various single feature selection methods and reduce the risk of missing essential features. (2) Genetic algorithm is used to customize the optimal feature subset for each base model to improve each base model's predictive ability, which is the basis for improving the overall predictive ability of the model. (3) With the inspiration of the deep tree model [22], this paper uses the GBDT (Gradient Boosting Decision Tree) [23] model as the feature information extraction layer of the base model output in the stacking algorithm [24]. The base model output is mapped to the new space to achieve new features and use the new feature to make predictions through feature information extraction. This work improves the prediction accuracy of the model.

The Design of Feature Adaptive Improved Multi-Layers Ensemble Model
After summarizing the relevant literature, this section will discuss the design of the feature adaptive improved multi-layers ensemble model. The modeling process is divided into three parts: feature engineering, construction and selection of feature adaptive base models, and the multi-layer ensemble model. This paper investigates the prediction of the "High Stock Dividend" phenomenon. Through identification of the A-listed companies with high stock dividend in the next six months as "1", otherwise as "0", the prediction observation can be transformed into a binary variable. Previous studies showed machine learning is effective when solving this kind of question, such as the rise and fall of stocks, debt default, etc. [25,26]. Feature selection plays an important role in machine learning prediction, and appropriate features can greatly improve the prediction ability of machine learning methods [27]. There are two common feature selection methods [28]: univariate methods (such as F value [29], maximum information coefficient (MIC) value [30], information value (IV) value [31], etc.) and multivariate methods (such as recurrent feature elimination (RFE) [32], etc.). All feature selection methods are based on a specific correlation or importance measurement method, but the relationship between variables is usually complex. Different feature selection methods may get different subsets [33]. Some variables may be tail features in one method and head features in another, which means that univariate methods exist the risk of missing important features. For this reason, the ensemble feature selection method will be used in this model.
Since the single-method model is weak with generalization ability [34], we decide to use the stacking ensemble model in this study. Stacking framework has been used in machine learning applications in different fields [35,36]. The idea of the framework is mainly divided into two parts. The first part integrates the first several layers of the model to achieve the generalization ability of the model as much as possible, and the second part integrates all the information and improving the robustness of the last layer. Due to the fact that the principles of different sub-models are different, their requirements for features may also be different to a certain degree. However, previous studies usually train and integrate different sub-models with the same selected feature dataset [37], which makes some sub-models lack the input of important features in training, and it is difficult for sub-models to achieve optimal performance and ultimately affect the prediction ability of stacking method. To improve the performance of the stacking framework, based on feature selection, this paper will use a genetic algorithm [38] to find the optimal feature subset of the corresponding model and train the model independently to achieve the consistency between the base model and the feature subset. The output of each base model can be regarded as a newly generated feature. To improve the efficiency of information utilization, we then need to cross these output features and extract new features. As an essential branch of machine learning, the tree model originated from the ID3 algorithm in 1986. After decades of development, tree models with good performance, such as CART (Classification and Regression Tree), C5.0, and others, have been proposed, making the tree model very popular. The tree model's basic metrics include Gini impurity, Information gain, etc., which are based on the concept of entropy and information theory, which makes the tree model less demanding on the amount of data compared to other models. Because of this advantage, the tree model is well suited to act as the second feature extraction layer in "high stock dividend" prediction. The GBDT model is selected to generate new features from the base model's output based on the feature cross-ability. Finally, Logistic Regression (LR) model is used to extract information from these features generated at the last level and outputs the final prediction results, which will improve the model's generation ability.
As shown in Figure 1, the first part of the model is feature engineering. In this part, the equal weight comprehensive feature evaluation method is used to find out the features related to high stock dividends, and the corresponding feature subset 1 is obtained. The feature subset 2 is then obtained by automatically expanding the feature subset 1 by the genetic programming method. The second part of the model is the construction and selection of the feature adaptive base model. In this paper, we use LR [39], SVM [40], Random Forest (RF) [41], LightGBM (LGB) [42], Multi-Layers Perceptron (MLP) [43], and K-Nearest Neighbor (KNN) [44] models with multiple datasets and feature subset combinations using feature adaptive selection algorism to form the feature adaptive base model. According to the base model comparison coefficient (formula 1) of the base models [45], the base model with better performance and differentiated output results in the verification set in 2018 is selected. The specific steps are: (1) Calculate the numerator, which is the AUC [46] (1) to obtain the base model comparison coefficient, which will be used as the metrics to select the base models.   The last part of the model is the construction of a multi-layer ensemble model. In this paper, a multi-layer stacking ensemble model is designed to further improve the prediction and generalization ability based on the base models. Each part will be described in detail below.

Feature Engineering
The goal of feature engineering is to screen features in all directions (from the perspective of a linear relationship, nonlinear relationship, and model performance) without loss of model accuracy (AUC). The specific steps are as follows: (1) For single-factor analysis, using one-way ANOVA (F value) to investigate the linear relationship between features and target variables, and using family-wise error rate (FWE) error measure methods to investigate whether the features suitable under this inspection. (2) The maximum information coefficient (MIC) is used to investigate the arbitrary statistical relationship between features and target variables. The MIC value was scored with a fixed proportion (more than 50% quantile). (3) Firstly, the genetic algorithm is used to divide the features into boxes. The information value (IV) is used to check whether the features suitable under this inspection (more than 50% quantile). (4) Recursive feature elimination (RFE) with cross-validation was used to investigate the linear model's importance and nonlinear model features by the LR model and the RF model with the L1 regular term. According to the output of RFE, whether the features scored under the inspection was evaluated (set to retain 50% features). After the above feature screening, each feature gets 5 groups of scores (1 point for each group). Finally, the final score of each feature is obtained by using the equal weight method. If the score is more than 4(including 4), it will be in the feature subset 1 with 48 features. Secondly, this paper uses genetic programming to mine features, which can automatically discover the potential relationship of features and get feature subset 2 with 100 features.
Considering the characteristics of the high stock dividend prediction problem, the model's core evaluation indicators are determined as AUC and F1 score [47]. On the one hand, the key index of the prediction is AUC, which comprehensively considers the positive and negative examples and reflects the degree of fitting of the model, which is suitable for the unbalanced two classification problem in this paper. However, the F1 score can comprehensively reflect the model's accuracy and recall rate and comprehensively reflect its prediction ability.

The Construction and Screening of the Feature Adaptive Base Models
Based on the particle swarm optimization (PSO) feature selection algorithm proposed by Dai and Li [48], this paper presents an adaptive feature selection algorithm. Considering that the RFE method only selects features from the perspective of feature importance, it does not take into account the promotion of feature subset on the model's prediction ability. In this paper, the AUC returned by each base model is used as the adaptive function to be optimized, and the feature selection model is designed by using a genetic algorithm. The specific algorithm flow is shown in Figure 2.
After the design of the adaptive feature selection algorithm, the model uses the instance hardness threshold to process the unbalanced data; LR, SVM, RF, LGB, MLP, and KNN are selected as the base models to be selected, and six sets of datasets are constructed with the sliding window method in Figure 3. Then, 72 combinations (6 × 2 × 6 = 72) of different models, feature subsets, and datasets are combined with the adaptive feature improvement method to find the corresponding optimal feature subset (AUC is calculated by verification set when the algorithm is applied). Finally, taking 2018 s data as the verification set, the AUC of each combination is obtained, and the formula (1) is constructed (the model with the highest AUC is taken as a reference base).

Multi-Layer Ensemble Model
Based on the construction idea of the comparison coefficient of the base model (in the first layer of Figure 1), this paper has screened out the basic model with strong expressive ability and a certain degree of difference. Because the corresponding dataset length and the feature subset are different, the model adopts a stacking framework to express each model's advantages. For the second layer design of Figure 1, traditional ensemble ideas often need the same length of datasets because of the different lengths. Simultaneously, some base models often fail to have the best "memory" ability, performance, and difference on the same dataset. Therefore, the model adopts the GBDT feature derivation framework and uses the LGB model to extract features further. The LGB model is used as the second layer of the stacking ensemble model, while the sample tree node information of LGB is extracted as the output of the second layer after the input of the predicted value of the base models. In the third layer of Figure 1, all samples' tree node information is used as the input of the LR model who has good robustness. The multi-layer stacking ensemble model integrates various datasets and feature subsets and uses the idea of deep learning to improve the prediction ability based on multiple strong learners. The memory ability of the machine learning model is explored as much as possible in the first layer. Then, the model's generalization ability is improved by the second layer, and the risk of model overfitting is reduced by using the third layer.

Results on the Test and Evaluation of High Stock Dividend Prediction Model
According to the modeling process in Figure 1, this section will test and evaluate the model (win10 + python3.7). The test and evaluation of the model include the following three parts: the feature screening evaluation of the equal weight feature comprehensive

Multi-Layer Ensemble Model
Based on the construction idea of the comparison coefficient of the base model (in the first layer of Figure 1), this paper has screened out the basic model with strong expressive ability and a certain degree of difference. Because the corresponding dataset length and the feature subset are different, the model adopts a stacking framework to express each model's advantages. For the second layer design of Figure 1, traditional ensemble ideas often need the same length of datasets because of the different lengths. Simultaneously, some base models often fail to have the best "memory" ability, performance, and difference on the same dataset. Therefore, the model adopts the GBDT feature derivation framework and uses the LGB model to extract features further. The LGB model is used as the second layer of the stacking ensemble model, while the sample tree node information of LGB is extracted as the output of the second layer after the input of the predicted value of the base models. In the third layer of Figure 1, all samples' tree node information is used as the input of the LR model who has good robustness. The multi-layer stacking ensemble model integrates various datasets and feature subsets and uses the idea of deep learning to improve the prediction ability based on multiple strong learners. The memory ability of the machine learning model is explored as much as possible in the first layer. Then, the model's generalization ability is improved by the second layer, and the risk of model overfitting is reduced by using the third layer.

Results on the Test and Evaluation of High Stock Dividend Prediction Model
According to the modeling process in Figure 1, this section will test and evaluate the model (win10 + python3.7). The test and evaluation of the model include the following three parts: the feature screening evaluation of the equal weight feature comprehensive

Multi-Layer Ensemble Model
Based on the construction idea of the comparison coefficient of the base model (in the first layer of Figure 1), this paper has screened out the basic model with strong expressive ability and a certain degree of difference. Because the corresponding dataset length and the feature subset are different, the model adopts a stacking framework to express each model's advantages. For the second layer design of Figure 1, traditional ensemble ideas often need the same length of datasets because of the different lengths. Simultaneously, some base models often fail to have the best "memory" ability, performance, and difference on the same dataset. Therefore, the model adopts the GBDT feature derivation framework and uses the LGB model to extract features further. The LGB model is used as the second layer of the stacking ensemble model, while the sample tree node information of LGB is extracted as the output of the second layer after the input of the predicted value of the base models. In the third layer of Figure 1, all samples' tree node information is used as the input of the LR model who has good robustness. The multi-layer stacking ensemble model integrates various datasets and feature subsets and uses the idea of deep learning to improve the prediction ability based on multiple strong learners. The memory ability of the machine learning model is explored as much as possible in the first layer. Then, the model's generalization ability is improved by the second layer, and the risk of model over-fitting is reduced by using the third layer.

Results on the Test and Evaluation of High Stock Dividend Prediction Model
According to the modeling process in Figure 1, this section will test and evaluate the model (win10 + python3.7). The test and evaluation of the model include the following three parts: the feature screening evaluation of the equal weight feature comprehensive evaluation method; the testing and evaluation of the multi-layer ensemble model under the adaptive feature selection method; and the overall prediction evaluation of the adaptive improved multi-layer ensemble model.

Data Preprocessing and Feature Engineering
The data in this paper are from RESSET financial database. The features are the thirdquarter financial report data of China's A-share companies from 2010 to 2019 and the price volume data corresponding to the first working day of November of the corresponding financial year. The prediction target is the corresponding high stock dividend rate (high stock dividend rate is greater or equal to 0.5) corresponding to 2011-2020 (published in 2010-2019 annual report). In this paper, 245 features are divided into 13 categories. For the obtained samples, the specific data preprocessing scheme of the model is shown in Table 1. The sample size after data preprocessing is 19,753, and the number of effective characteristics is 219, which is shown in Table 2.  After data preprocessing, the following will be the test and evaluation of feature engineering: Firstly, this paper investigates the linear relationship between each feature and whether the company will issue a high stock dividend or not by F value and finds the arbitrary statistical relationship by MIC. Secondly, because the original continuous data may have considerable noise, which may affect the model, this paper introduces the IV feature selection method after the optimal box division of the genetic algorithm to investigate the features' performance. This paper uses a genetic algorithm to divide all features into ten boxes with the maximum IV value as the adaptive function due to the subjectivity and lack of clear mathematical meaning in the traditional discretization. According to the features' IV value, the features are screened, and the top 50% of the feature are obtained. Finally, LR with the L1 regular term and RF models are used as the base models of the RFE method. If RFE selects it, the feature will get a score.
Through the above five steps, five groups of scores are obtained. The features with a score of four or five are selected as feature subset 1. The specific selection solution is shown in Table 3. It can be seen from Table 3 that most of the categories have features been selected, while profitability, operation capacity, income quality, DuPont analysis, and industry information are not selected. In addition, genetic programming is used to extend feature subset 1 to 100 as feature subset 2.

The Evaluation of High Stock Dividend Prediction Model
This paper will first deal with data imbalance based on the two feature subsets and six datasets obtained above. Then we will select the base model of adaptive improved stacking ensemble algorithm. Finally, we will build a three-layer stacking ensemble model while showing and comparing the results.
In this paper, the prediction of high stock dividends needs to deal with unbalanced data before modeling. The attempts made in this paper for unbalanced data are shown in Appendix A. Due to the limitation of computing power, this paper only uses all 219 features, 2010-2013 as the training set (2014 as the test set), and the LGB model as the learner to compare the relative performance of unbalanced data sampling methods (some sampling methods based on nearest neighbor algorithm have high time complexity, so this paper ignores them). In the comprehensive comparison, because the instance hardness threshold method has the highest AUC value, it is more suitable for this dataset. The following will show the screening results of the feature adaptive base model and a multi-layer ensemble model.

Screening Results of Feature Adaptive Base Models
As described in Figure 1, the method for constructing the base model of adaptive improved stacking algorithm has two feature subsets, six datasets, and six models, which gets a total of 72 base model combinations. Constructing different datasets and different feature subsets lets the base models learn different information about the dataset as much as possible. Table 4 below shows the number of original features under each combination, the number of features after adaptive feature filtering, and AUC changes under the combination of two feature subsets (see Appendix A for the complete results). a. AUC here is comparable only under the same dataset. Due to the strict policies of high stock dividend in early 2018, the AUC of model under datasets 4, 5, and 6 decreased significantly. b. "Before" means that the features of the input model have not been filtered by the adaptive feature selection method; "After" means that the features of the input model have been filtered by the adaptive feature selection method. The process can be found in Figure 2.
After testing and comparing each base model's original verification set, this paper uses the 2018 data as the verification set to calculate the AUC of the verification set. The model selects feature subset 1 in dataset six and takes the MLP model as a reference base, calculates the base model's comparison coefficient according to formula (1) above, and then obtains five base models of stacking layer 1. Most of the base models in this paper meet the two requirements of stacking base model selection: the diversity of the basic model principle and good prediction ability. Finally, this paper adjusts the hyperparameters using fivefold cross-validation on the selected base model's original training set. The results are shown in Table 5. As can be seen from Table 5, except for the performance of LGB under dataset six after parameter adjustment, other models have improved to some extent. a. "Before" means that AUC comes from a model without parameter optimization; "After" means that AUC comes from a model with parameter optimization. b. The validation dataset of all models here is the data of 2018. Using the same data set as the validation set can help us compare the AUC of the model under the same experimental environment (data), so that the comparison result is reliable (Dataset division can be seen in Figure 3). c. The hyper parameters are shown in Appendix A.

The Results of Multi-Layer Ensemble Model
This paper constructs the second and third layers of stacking based on the idea of stacking ensemble and GBDT feature derivation. Specifically, after getting five groups of new features output from the five base models, this paper inputs them into the LGB model and obtains the information of the location of the sample in the leaf node of the tree (because the LGB can accommodate missing values and does not affect the construction of the decision tree, though the datasets of the base models in this paper are different). The LR model with lower complexity is adopted after obtaining the high-dimensional sparse new features of the second layer. After that, the results of this paper on the test set (2019) are obtained (see Table 3 for the results of layer two and layer three tunings). Finally, as compared with the results in Table 6, the AUC and F1 scores of the adaptive improved threelayer stacking integration model are improved by 0.173 and 0.303 respectively compared to the baseline model, which is better than the results of all the base models. The result of backtesting and comparison shows that the model has good predictive capabilities. Compared with previous models, forecasting ability mainly comes from the following three aspects of model improvement. First of all, the equal weight comprehensive feature evaluation method is used to consider the differences in various feature selection methods and effectively avoid the omission of effective features. For example, some price volume variables are not selected under the F value method but are selected under the MIC value method. Secondly, the adaptive feature evaluation method customizes different feature subsets for different base models, which can avoid the mismatch between features and base models, improve the predictive capabilities of each base model, and output high-quality information sources for the final information integration. Finally, a multi-layer ensemble model is built for further automatic feature extraction and abstraction of the output from the base models. This work improves the generalization ability of the model. As the model's final output layer, the LR model integrates information and predicts the high stock dividend phenomenon.
Due to the lack of definite standards, it is difficult for us to determine an optimal model at the beginning stage of modeling. The idea of the ensemble model has become a popular machine learning solution. However, it is worth mentioning that in the process of model integration, we need to make each base model achieve the best prediction ability on its own. There are many aspects to the optimization of the base model. A significant one is to customize the corresponding feature subset for each base model so that different base models can effectively extract the information of each feature variable. The experimental results of this study also support the validity of this idea.
There is no doubt that any prediction on the stock market by nature is crucial and valuable. However, it is not an easy task. Variables to be considered but not limited to reflect a long list: microeconomic and macroeconomic factors, financial statements, market conditions, regulatory policies, and individuals' sentimental behavior. As the world is advancing, algorithm models using machine learning are applied to the prediction game. The improved prediction accuracy provides insights to investors and policymakers.
All the results in this study are based on historical data backtesting, and there are still differences from the real market environment. Simultaneously, due to the limitation of the existing data feature dimensions, the feature input of the model in this paper still needs to be continuously improved, which is also the focus of our future work.

Conclusions
In this paper, based on equal weight comprehensive feature evaluation, GBDT, and stacking framework, the high stock dividend phenomenon's existing prediction models are improved. A feature adaptive improved multi-layers ensemble model is proposed. This paper's main contributions are as follows: (1) For the prediction of the high stock dividend phenomenon, the multi-layer stacking ensemble model constructed in this paper can predict the high stock dividend phenomenon accurately. Compared with the baseline model, the AUC is improved by 0.173, and the F1 score is increased by 0.303. (2) A complete comprehensive feature evaluation method and a model-based feature adaptive selection algorithm are proposed that can effectively select the feature subset which is more suitable for the corresponding model. (3) This paper proposes a multi-layer stacking ensemble model design, which can integrate models of different length datasets and feature subsets. This paper's practical significance is as follows: (1) From the investment perspective, this paper provides better prediction results than the existing methods, helping institutional investors better construct event-driven investment strategies on the high stock dividend issue. (2) From the perspective of policy, the existing policies are based on the previous scholars' interpretation of the motivation behind the phenomenon of the high cash dividend. With the help of this paper's high accuracy prediction model, regulators can conduct qualification screening for companies that may have a high stock dividend policy in the following year from November every year.
Although the model proposed in this paper has good prediction ability, there are still some limitations in this research, which will be possible future research directions. Firstly, this study uses the equal weight comprehensive feature evaluation method to filter the features that predict the "high stock dividend" phenomenon. Selected features increase the model's information input consistency, but their interpretability is not provided. Feature interpretation under the stacking framework will be one of our future research works. Secondly, because the "high stock dividend" dataset is highly unbalanced in the securities market, the number of "high stock dividend" listed companies is far fewer than other listed companies. This current situation limits the predictive ability of the model. Some sampling methods have been used in this study's data processing and have played some role in the improvement of model training. However, the research of this problem still requires better data balancing processing methods. Studying the sample structure of unbalanced data is also our future research agenda. Data Availability Statement: Data was obtained from RESSET Database and is available for registered users from the URL: http://db.resset.com/ (accessed on 5 September 2020).

Conflicts of Interest:
The authors declare no conflict of interest.  Figure 3. b. FT 0 represents the number of remaining features before the feature set is filtered by the adaptive feature selection method; FT 1 represents the number of remaining features after the feature set is filtered by the adaptive feature selection method.