3.1. Data Description and Processing
In this paper, 5 min high-frequency SSEC Index data (000001.SH) from 1 February 2012 to 28 June 2024 were selected as the research sample; the data were all sourced from the Wind database. At the 5 min sampling frequency, each trading day in the SSEC dataset contains a total of 48 sample points (excluding the opening price). We used realized volatility (RV) to measure stock market volatility; compared with squared daily returns, it effectively reduces the impact of noise and error on volatility estimates. It takes the following form:
$$ RV_t = \sum_{j=1}^{48} r_{t,j}^2, $$
where $r_{t,j}$ denotes the logarithmic return at the j-th moment of the t-th trading day.
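As an illustration, a minimal Python sketch of this computation is given below. The simulated price series, the trading-session layout, and all variable names are hypothetical placeholders rather than the paper's actual data pipeline; only the realized-volatility calculation itself follows the definition above.

```python
import numpy as np
import pandas as pd

# Illustrative only: simulate 20 trading days, each with 48 five-minute closing prices.
rng = np.random.default_rng(0)
days = pd.bdate_range("2024-06-03", periods=20)
frames = []
for day in days:
    times = pd.date_range(day + pd.Timedelta("9h35m"), periods=48, freq="5min")
    prices_day = 100 * np.exp(np.cumsum(rng.normal(0.0, 1e-3, 48)))
    frames.append(pd.Series(prices_day, index=times))
prices = pd.concat(frames)

# 5 min log returns within each trading day (overnight gaps are excluded).
log_ret = prices.groupby(prices.index.date).transform(lambda p: np.log(p).diff())

# Daily realized volatility: sum of squared intraday log returns.
rv = log_ret.pow(2).groupby(log_ret.index.date).sum()
print(rv.head())
```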
Figure 4 shows the daily realized volatility of the SSEC Index. As the figure illustrates, the RV displays time-varying and clustering characteristics, particularly during the 2015–2016 stock market crash, when the index exhibited very high volatility.
Four predictors were included in the analytical framework:
(1) Macroeconomic Indicators: To develop a comprehensive set of macroeconomic fundamental indicators, this study, guided by the research of Cakmakli and Dijk [
35], employed monthly data for 32 macroeconomic variables spanning five key domains, including economic conditions and finance. The data cover the sample period from February 2012 to June 2024. These variables were first subjected to LASSO (Least Absolute Shrinkage and Selection Operator) regression for feature selection, after which factor analysis was performed to construct the macroeconomic fundamental factors. The data were drawn from the CSMAR database and are listed in
Table 1.
This study initially considered a comprehensive set of macroeconomic variables identified in the established literature as pertinent to stock market volatility. The high dimensionality of this set posed a significant challenge for the subsequent volatility forecasting model. To refine the initial selection, the LASSO (Least Absolute Shrinkage and Selection Operator) method was first employed for variable screening. Because the initial variable pool was extensive, a considerable number of variables remained even after the LASSO procedure. Factor analysis was therefore applied to this reduced set of variables; this second step further mitigated dimensionality and produced the final, more parsimonious macroeconomic fundamental indicators used in our model.
Each of the selected raw macroeconomic variables comprised 149 monthly observations. For the purpose of dimensionality reduction using LASSO regression, this dataset was partitioned into training and test sets based on a 7:3 ratio. Consequently, the first 105 observations constituted the training set, while the remaining 44 observations formed the test set. The LASSO regression technique was subsequently employed for an initial screening of this comprehensive set of macroeconomic indicators. The optimal regularization parameter, λ, for the LASSO model was determined via a 10-fold cross-validation procedure applied to the training data. This process identified an optimal λ value of 0.0276. Utilizing this λ, the LASSO model selected six macroeconomic variables as most relevant. These variables, detailed in
Table 2, encompassed key economic domains, such as consumption, finance, and trade. Collectively, these selected indicators provided a multifaceted representation of China’s overall macroeconomic conditions, thereby offering a parsimonious yet informative basis for the subsequent factor analysis intended to construct the final macroeconomic fundamental indicators.
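A compact sketch of this screening step with scikit-learn is shown below. The synthetic data, array names, and the target series against which relevance is judged are assumptions introduced for illustration; only the chronological 7:3 split and the 10-fold cross-validated choice of the regularization strength mirror the procedure described above.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Illustrative only: stand-ins for the 149 monthly observations of 32 macro variables
# and the target series (placeholders for the real CSMAR data used in the paper).
rng = np.random.default_rng(0)
X_macro = rng.normal(size=(149, 32))
y = X_macro[:, :6] @ rng.normal(size=6) + rng.normal(scale=0.5, size=149)

# Chronological 7:3 split: first 105 months for training, remaining 44 for testing.
n_train = 105
scaler = StandardScaler().fit(X_macro[:n_train])
X_train = scaler.transform(X_macro[:n_train])

# LASSO with the regularization strength chosen by 10-fold cross-validation.
lasso = LassoCV(cv=10, random_state=0).fit(X_train, y[:n_train])
print("optimal lambda:", lasso.alpha_)             # the paper reports 0.0276
print("selected variables:", np.flatnonzero(lasso.coef_ != 0))
```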
To further reduce the number of macroeconomic variables, construct composite macroeconomic indicators, and improve their applicability in subsequent volatility forecasting, factor analysis was conducted on the six macroeconomic variables selected through LASSO regression. This dimensionality reduction process yielded two principal macroeconomic factors,
Fm1 and
Fm2, which together explained over 70% of the total variance. The detailed factor loadings are presented in
Table 3.
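A minimal sketch of this factor-extraction step is given below. The synthetic input matrix is a placeholder for the six LASSO-selected variables, and the varimax rotation is an assumption about the exact procedure; the paper's own factor-analysis settings may differ.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Illustrative only: stand-in for the 149 x 6 matrix of LASSO-selected macro variables.
rng = np.random.default_rng(0)
X_selected = rng.normal(size=(149, 6))

X_std = StandardScaler().fit_transform(X_selected)
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
scores = fa.fit_transform(X_std)        # monthly factor scores, i.e., Fm1 and Fm2
loadings = fa.components_.T             # 6 x 2 loading matrix, cf. Table 3
print(loadings.round(3))
```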
It can be seen from
Table 3 that
Fm1 is primarily composed of the PPIRM and the AERusd. This factor can be interpreted as the “Cost and Foreign Exchange Factor”, capturing macroeconomic pressures stemming from production costs and external exchange rate risks.
Fm2 is mainly influenced by the CSI, RSGG, CFAI, and IOP. This factor reflects the expansion of domestic demand, as well as production and investment activity, and can thus be characterized as the “Economic Growth Factor”.
(2) SSEC Index Technical Indicators: Two types of technical indicators were selected in this paper. The first comprised fundamental trading indicators, including the opening price, the highest price, the lowest price, turnover, and five other characteristic indicators. The second comprised the main technical indicators, including the CCI, DMA, MACD, and 17 other indicators. The data sample period was from 1 April 2014 to 28 June 2024, and the data came from the Wind database. The specific indicators are shown in
Table 4.
To reduce the complexity of the volatility forecasting model, this paper performed principal component analysis (PCA) to reduce the dimensionality of the technical indicators. Based on the PCA results, the variance contribution rates of the first five principal components were 36.37%, 19.96%, 12.47%, 8.44%, and 6.61%, respectively, with a cumulative variance contribution rate of 83.85%. As this cumulative value exceeded 80% and the eigenvalues of these five components were all greater than 1, these components effectively retain most of the original information. Therefore, the first five principal components were retained and used as technical indicators of stock market trading in the subsequent volatility modeling and forecasting.
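A minimal sketch of this dimensionality-reduction step with scikit-learn follows. The synthetic indicator matrix and its dimensions are placeholders for the actual daily technical-indicator data; only the retention of five components and the inspection of explained variance correspond to the procedure described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative only: stand-in for the daily matrix of technical indicators.
rng = np.random.default_rng(0)
X_tech = rng.normal(size=(2500, 25))

X_std = StandardScaler().fit_transform(X_tech)
pca = PCA(n_components=5)
Ft = pca.fit_transform(X_std)                      # daily component scores Ft1-Ft5
print(pca.explained_variance_ratio_)               # cf. 36.37%, 19.96%, 12.47%, ...
print(pca.explained_variance_ratio_.sum())          # cumulative, cf. 83.85%
```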
As shown in
Table 5, the principal component loading matrix yields five stock market trading technical indicators.
Ft1 represents the overall market trend and price levels, with high loadings on variables such as price indicators (HIGH, LOW, and OPEN), moving averages (MA and EXPMA), trend indicators (BBI, DMA, and MACD), and several oscillator indicators. These components are considered the most significant for capturing the general direction of market movements.
Ft2 reflects momentum and short-term fluctuations, showing strong positive loadings on price change rates (CHANGE and PCT_CHANGE) and several oscillators (BIAS, RSI, SOBV, and CCI), while exhibiting negative loadings on trend and moving average indicators.
Ft3 primarily captures trading activity, with its loadings almost exclusively concentrated on volume and turnover-related measures.
Ft4 and
Ft5, although contributing less variance compared to the first three components, capture more subtle and complex relationships among indicators. These components may reflect specific market patterns or signal combinations derived from technical indicators.
(3) Economic Policy Uncertainty Indicator: In recent years, economic policy uncertainty in China has risen significantly, driven by various international factors, including the global financial crisis, interest rate hikes by the U.S. Federal Reserve, and ongoing trade tensions between China and the United States. As a macro-level risk factor, incorporating economic policy uncertainty into volatility forecasting models can enhance the models’ ability to capture shifts in market sentiment and risk appetite induced by changes in the policy environment. This, to some extent, helps address the limitations of models that rely solely on historical price information.
In this study, the Economic Policy Uncertainty (EPU) Index developed by Davis and his collaborators was employed as a proxy for China’s economic policy uncertainty [
36]. The index is constructed using natural language processing techniques, wherein a semantic screening algorithm identifies news articles related to economic policy fluctuations from
People’s Daily and
Guang Ming Daily. These are then standardized and compiled into a monthly time series. The sample period considered in this study spanned from February 2012 to June 2024. The relevant data were obtained from the China Economic Policy Uncertainty website (
http://www.policyuncertainty.com) (accessed on 14 August 2024).
(4) Jump Component: The jump component of realized volatility refers to the discontinuous part of total volatility caused by sudden price changes, such as market shocks or extreme events. It exhibits temporal dependence and is particularly effective in capturing abrupt price movements triggered by unexpected news or extreme market conditions, making it a valuable source of information for forecasting stock market volatility. This component is typically analyzed alongside the continuous component, which reflects volatility from regular trading activities. Together, they constitute the total realized volatility.
In this study, the jump component of the realized volatility for the Shanghai Composite Index was calculated using the bipower variation method. The sample period for the jump component spanned from 1 April 2014 to 28 June 2024. The detailed calculation method is presented as follows:
The realized continuous volatility can be defined as follows:
The modified Z test statistic of jump volatility is expressed as follows:
Given a significance level α, a jump is identified when the test statistic $Z_t$ exceeds the critical value $\Phi_{1-\alpha}^{-1}$, where $\Phi_{1-\alpha}^{-1}$ denotes the upper α quantile of the standard normal distribution. In this case, the indicator function for a jump takes the value 1; otherwise, it is set to 0, indicating no jump. The jump component is then constructed as follows:
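Because the corresponding equations are not reproduced above, the construction is sketched here in the standard Barndorff-Nielsen and Shephard form, with the Huang and Tauchen ratio statistic assumed for the modified Z test; the exact variant used in the paper may differ:
$$ BV_t = \frac{\pi}{2}\sum_{j=2}^{48}\left|r_{t,j}\right|\left|r_{t,j-1}\right|, \qquad Z_t = \frac{(RV_t - BV_t)/RV_t}{\sqrt{\left(\frac{\pi^2}{4} + \pi - 5\right)\frac{1}{48}\max\!\left(1, \frac{TQ_t}{BV_t^2}\right)}}, $$
$$ J_t = I\!\left(Z_t > \Phi_{1-\alpha}^{-1}\right)\left(RV_t - BV_t\right), \qquad C_t = RV_t - J_t, $$
where $BV_t$ is the realized bipower variation, $TQ_t$ the realized tripower quarticity, $J_t$ the jump component, and $C_t$ the continuous component.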
Table 6 gives the descriptive statistics of the SSEC Index returns and other predictors. From
Table 6, it can be seen that the distribution of SSEC Index returns is significantly left-skewed and leptokurtic. The Jarque–Bera statistics indicate that none of the variables follows a normal distribution. The ADF test statistics reject the null hypothesis of a unit root at the 1% significance level, confirming stationarity across all variables. Thus, further econometric modeling can be carried out directly.
3.2. Estimation of Multifactor GARCH-MIDAS Model
To incorporate macro low-frequency data in the forecasting of stock market volatility, full-sample multifactor GARCH-MIDAS model estimation was first carried out, in which the macro factors,
Fm1 and
Fm2, as well as monthly EPU, were used as the monthly low-frequency economic variables. The daily returns of the SSEC were used as the high-frequency variables to construct the multifactor GARCH-MIDAS model. The parameter estimates and corresponding statistics are reported in
Table 7. In the model estimation, the lag parameter
K = 24 was chosen for the low-frequency variables; that is, macroeconomic effects on market volatility were assumed to operate with a lag of up to 24 months. The parameter estimates show that all parameters were statistically significant at the 1% level except for the mean parameter
μ, which was not significant. The estimates of
α and
β were significantly non-zero at 0.0815 and 0.9010, respectively, and their sum was close to 1, indicating that the short-term GARCH component is highly persistent yet stationary and well fitted.
Regarding the long-term component, the model incorporates three low-frequency variables: the EPU Index and two macroeconomic factors (Fm1 and Fm2). The parameter θ represents the long-term influence of these variables on the volatility of the SSEC Index. Specifically, θ1, associated with the EPU Index, was estimated at 0.6050 and was significantly positive. This indicates that rising policy uncertainty significantly increases market volatility. In an environment of uncertain policy outlooks and frequent shifts in economic signals, investor risk aversion tends to increase, thereby amplifying volatility. This finding confirms that incorporating EPU as a long-term explanatory variable enhances the model’s ability to reflect market risks in the face of major policy changes and unexpected events. Parameters θ2 and θ3 capture the effects of the macroeconomic fundamentals Fm1 and Fm2, respectively. The estimate for θ2 was 0.5700 and was significantly positive, suggesting that increases in Fm1—comprising the PPIRM and AERusd—lead to heightened volatility. This reflects that rising production costs and currency depreciation increase uncertainty in corporate profitability and inflation pressure, thereby intensifying stock market fluctuations. In contrast, θ3 was estimated at –0.7150 and was significantly negative, implying that Fm2, interpreted as the economic growth factor, contributes to market stability. Stronger macroeconomic performance tends to reduce volatility in the SSEC Index. Finally, ω denotes the optimal weighting parameter. The estimation results show that ω exceeded 1 for all predictors, indicating that the influence of lagged observations on long-term volatility diminishes over time. In other words, more recent information exerts a stronger effect on volatility, which is consistent with the actual dynamics of economic behavior.
Based on the estimation results of the multifactor GARCH-MIDAS model, we can obtain the short-term component (gt) and the long-term component (τt), which are multiplicatively combined to obtain conditional volatility (ht).
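For reference, the specification estimated here can be written in the standard GARCH-MIDAS form of Engle, Ghysels, and Sohn, extended to three low-frequency drivers; this sketch should be read as the assumed form, with the exact details as given in the model section of the paper:
$$ r_{i,t} = \mu + \sqrt{\tau_t\, g_{i,t}}\,\varepsilon_{i,t}, \qquad g_{i,t} = (1 - \alpha - \beta) + \alpha\,\frac{(r_{i-1,t} - \mu)^2}{\tau_t} + \beta\, g_{i-1,t}, $$
$$ \log \tau_t = m + \theta_1 \sum_{k=1}^{K} \varphi_k(\omega_1)\,\mathrm{EPU}_{t-k} + \theta_2 \sum_{k=1}^{K} \varphi_k(\omega_2)\,Fm1_{t-k} + \theta_3 \sum_{k=1}^{K} \varphi_k(\omega_3)\,Fm2_{t-k}, $$
with $K = 24$, Beta lag weights $\varphi_k(\omega) = (1 - k/K)^{\omega - 1} \big/ \sum_{j=1}^{K} (1 - j/K)^{\omega - 1}$, and conditional volatility $h_{i,t} = \tau_t\, g_{i,t}$.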
3.3. Forecasting Results
Based on the previous analysis and indicator construction, seven input features were selected for volatility forecasting with the CNN-BiLSTM-Attention deep learning model: the five technical indicator components (Ft1–Ft5) derived from principal component analysis, the conditional volatility component (ht) incorporating macroeconomic and economic policy uncertainty information, and the jump component (Jump). These features collectively form the predictive input matrix for volatility forecasting. To further validate the superiority of the proposed CNN-BiLSTM-Attention model in forecasting stock market volatility, this study adopted the HAR-RV model as a benchmark and compared the proposed model with traditional machine learning models (SVR, Random Forest, and XGBoost) as well as other deep learning models (LSTM, BiLSTM, CNN-LSTM, and CNN-BiLSTM). The forecasting performance of the CNN-BiLSTM-Attention model was comprehensively evaluated against these alternatives.
Given the multifactor GARCH-MIDAS model's lag parameter specification, the conditional volatility series available for forecasting begins on 1 April 2014. Correspondingly, the CNN-BiLSTM-Attention model's input data spanned from 1 April 2014 to 28 June 2024. Following standard data partitioning protocols, we allocated 80% of observations to the training set and 20% to the test set. To mitigate overfitting, 20% of the training subset was reserved for validation, with parameter calibration guided by validation performance. Finally, to eliminate the effects of differing scales across features, feature scaling of the sample set was required.
To normalize the original data, we apply min–max scaling, mapping the values to the range [0, 1]. This normalization helps prevent overfitting and enhances model accuracy. The min–max scaling is implemented as follows:
$$ X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}, $$
where $X'$ is the normalized value, X is the original value, and Xmax and Xmin are the maximum and minimum values of the corresponding feature.
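A brief sketch of the partitioning and scaling steps follows. The synthetic arrays are placeholders, and fitting the scaler on the training portion only (to avoid look-ahead leakage) is an implementation assumption rather than a detail stated in the paper.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative only: stand-ins for the seven daily predictors and the RV target.
rng = np.random.default_rng(0)
features = rng.normal(size=(2500, 7))
target = rng.gamma(shape=2.0, scale=0.1, size=2500)

# Chronological split: 80% training (of which 20% is held out for validation), 20% test.
n = len(features)
n_train = int(n * 0.8 * 0.8)          # 64% of the sample
n_val = int(n * 0.8 * 0.2)            # 16% of the sample

# Fit the min-max scaler on the training portion only, then apply it to all data.
scaler = MinMaxScaler(feature_range=(0, 1)).fit(features[:n_train])
features_scaled = scaler.transform(features)

train = features_scaled[:n_train]
val = features_scaled[n_train:n_train + n_val]
test = features_scaled[n_train + n_val:]
```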
In this paper, the main hyperparameters of the CNN-BiLSTM-Attention model were chosen by jointly considering prediction accuracy and training time. The CNN layer extracts local features using 64 filters and a convolution kernel of size 3, with a ReLU activation function to enhance nonlinear expressive ability. The two BiLSTM hidden layers contain 32 and 16 neurons, respectively, and capture both forward and backward information in the time series. A dropout layer with a rate of 0.2 was included to avoid overfitting. The attention layer improves the model's focus on features at key time steps by learning weighting coefficients for each time step of the input sequence. The model was trained using the Adam optimizer, and the hyperparameters were optimized using a grid search strategy. The selected training configuration included a batch size of 32 and 60 epochs. Under this configuration, the model demonstrated good fitting and generalization capability while maintaining high stability, which is particularly important for volatility forecasting.
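To make the architecture concrete, a minimal Keras sketch with the stated hyperparameters (64 filters, kernel size 3, ReLU, BiLSTM layers with 32 and 16 units, dropout of 0.2, Adam optimizer, batch size 32, 60 epochs) is given below. The attention mechanism is realized here as a simple learned softmax weighting over time steps, and details such as the padding choice and layer ordering are assumptions; the paper's exact implementation may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(window=5, n_features=7):
    inputs = layers.Input(shape=(window, n_features))
    # CNN layer: local feature extraction (64 filters, kernel size 3, ReLU).
    x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
    # BiLSTM layers (32 and 16 units) capture forward and backward dependencies.
    x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(16, return_sequences=True))(x)
    x = layers.Dropout(0.2)(x)
    # Attention: learn a softmax weight per time step and take the weighted sum.
    scores = layers.Dense(1, activation="tanh")(x)        # (batch, window, 1)
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Dot(axes=1)([weights, x])            # (batch, 1, 2 * 16)
    context = layers.Flatten()(context)
    outputs = layers.Dense(1)(context)                    # next-day volatility
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_model()
model.summary()
# model.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=32, epochs=60)
```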
After selecting appropriate parameters, the seven-dimensional time series was transformed into a supervised learning dataset of input–output pairs. The feature values of the past five days were used as inputs to predict the volatility of the SSEC Index on the subsequent trading day (see the sketch below). The resulting predictions are presented in
Figure 5.
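A small sketch of the windowing step described above follows; the synthetic arrays are placeholders for the scaled predictors and target.

```python
import numpy as np

def make_windows(features, target, window=5):
    """Use the past `window` days of features to predict the next day's volatility."""
    X, y = [], []
    for i in range(window, len(features)):
        X.append(features[i - window:i])    # shape (window, n_features)
        y.append(target[i])                 # next trading day's realized volatility
    return np.asarray(X), np.asarray(y)

# Illustrative only: stand-ins for the scaled predictors and target series.
rng = np.random.default_rng(0)
features_scaled = rng.random((2500, 7))
target_scaled = rng.random(2500)
X_all, y_all = make_windows(features_scaled, target_scaled, window=5)
print(X_all.shape, y_all.shape)             # (2495, 5, 7) (2495,)
```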
In general, the model identifies turning points and fluctuating trends well, and its predictions align closely with the actual volatility series.
3.4. Out-of-Sample Forecasting Performance Evaluation
The out-of-sample predictive performance is more critical than the in-sample performance, as market participants prioritize a model’s ability to forecast future outcomes over its capacity to analyze historical data. In this section, we evaluate the out-of-sample forecasting performance of the nine models.
3.4.1. Out-of-Sample Forecasting Loss Function Values
The results of the forecasting loss function evaluation of different models on the test dataset are shown in
Table 8. The results presented in the table indicate that the predictive performance of the benchmark HAR-RV model is relatively limited. With a mean squared error (MSE) of 0.4367, it performs worse than all other models across the evaluated loss functions, as reflected in consistently higher error metrics. Traditional machine learning models such as SVR and XGBoost improve forecasting accuracy to a certain extent by leveraging nonlinear modeling techniques. Both models outperform the HAR-RV benchmark in terms of prediction error indicators.
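For clarity, the loss functions reported in Table 8 are assumed here to take their conventional forms over the n test observations:
$$ \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(RV_t - \widehat{RV}_t\right)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|RV_t - \widehat{RV}_t\right|, $$
$$ \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{RV_t - \widehat{RV}_t}{RV_t}\right|. $$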
The deep learning model LSTM achieves an MSE of 0.2947, which is slightly higher than that of Random Forest and XGBoost. This suggests that although LSTM effectively captures temporal dependencies, its prediction accuracy remains marginally inferior to some traditional machine learning approaches. In contrast, the BiLSTM model improves out-of-sample forecasting performance by processing input sequences in both forward and backward directions, thereby reducing prediction errors.
Further enhancements were observed in the CNN-LSTM and CNN-BiLSTM models, which integrate convolutional layers to extract local temporal features, leading to improved forecasting accuracy and lower MSE values. The proposed CNN-BiLSTM-Attention model further advances performance by dynamically assigning attention weights to key time steps. It achieved the lowest loss values among all the models except for MAPE, with MSE, RMSE, MAE, and MAPE recorded at 0.1913, 0.4373, 0.2405, and 56.7565%, respectively. These results clearly demonstrate the superior capability of the CNN-BiLSTM-Attention model in stock market volatility forecasting.
3.4.2. Out-of-Sample R2 Results
Table 9 displays the out-of-sample
R2 statistics for several forecasting models evaluated relative to the HAR-RV benchmark model. As shown in
Table 9, the out-of-sample
R2 values of all the models are positive, indicating that both traditional machine learning and deep learning models exhibit improved predictive performance relative to the benchmark HAR-RV model. Except for LSTM, all deep learning models outperform traditional machine learning models in terms of
R2oos. Notably, the CNN-BiLSTM-Attention model achieved the highest
R2oos value of 0.5619, corresponding to a 56.19% reduction in squared forecast error relative to the benchmark model. These results highlight the superior forecasting capability of the CNN-BiLSTM-Attention model among the evaluated models.
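For reference, the out-of-sample R2 relative to the HAR-RV benchmark is assumed here to follow the conventional Campbell–Thompson form:
$$ R^2_{oos} = 1 - \frac{\sum_{t=1}^{n}\left(RV_t - \widehat{RV}_t\right)^2}{\sum_{t=1}^{n}\left(RV_t - \widehat{RV}_t^{\,\mathrm{HAR\text{-}RV}}\right)^2}. $$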
3.4.3. Robustness Test
- (1)
DoC test
As illustrated in
Table 10, the results of the DoC test indicate that all forecasting models rejected the null hypothesis of no directional predictability at the 1% significance level. This confirms that each model is effective in forecasting the direction of volatility movements. The benchmark model, HAR-RV, achieved a DoC ratio of 0.5538, indicating a directional prediction success rate of 55.38%. In contrast, the CNN-BiLSTM-Attention model recorded a DoC ratio of 0.6981, representing a success rate of 69.81%—the highest among all the models evaluated. This corresponds to an improvement of 14.43 percentage points over the benchmark model. Moreover, the CNN-BiLSTM-Attention model outperformed traditional machine learning models, including SVR, Random Forest, and XGBoost, in directional accuracy. These results demonstrate the CNN-BiLSTM-Attention model’s superior capability in capturing the directional dynamics of volatility, further affirming its robustness in predictive performance.
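The DoC ratio is assumed here to be the share of test days on which the predicted and realized directions of volatility change coincide, i.e.,
$$ \mathrm{DoC} = \frac{1}{n}\sum_{t=1}^{n} I\!\left[\operatorname{sign}\!\left(\widehat{RV}_{t+1} - RV_t\right) = \operatorname{sign}\!\left(RV_{t+1} - RV_t\right)\right], $$
with significance evaluated against the null hypothesis of no directional predictability.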
- (2)
MCS test
To evaluate the model’s forecasting performance more objectively, we performed an MCS test on the prediction results of various models. The MCS test-derived
p-values are presented in
Table 11. As illustrated in
Table 11, the SVR model exhibited strong performance under the HMSE loss function, but its effectiveness declined under the other loss metrics. In contrast, the CNN-BiLSTM-Attention model achieved a
p-value of 1 in the MCS test across multiple loss functions, including MSE, MAE, QLIKE, and HMAE, indicating statistically superior predictive performance. These results underscore the model’s robustness in forecasting the volatility of the SSEC Index relative to alternative approaches. This conclusion is further supported by consistent findings from the out-of-sample
R2 and DoC tests.
- (3)
Different forecasting windows
The selection of the forecasting window can affect prediction accuracy [
37]. Therefore, to evaluate the robustness of our proposed model, we conducted a sensitivity analysis by altering the training–test split. The training set size was reduced from the baseline 80% of the full sample to 70% and then 60%, with the test set correspondingly constituting 30% and 40% of the full sample.
As demonstrated in
Table 12, the performance of the models remained consistent with the previous findings when the length of the forecasting window was varied. The deep learning models, particularly the CNN-BiLSTM-Attention model, consistently exhibited superior predictive accuracy, highlighting their robustness across different forecasting windows.
- (4)
Replacement sample data
To further assess the robustness and generalizability of the previous findings, which were based on the Shanghai Composite Index, this study replaced the target series with the CSI 300 Index. This allowed for a more comprehensive evaluation of the model’s forecasting performance in a different market context. The 5 min high-frequency data of the CSI 300 Index, along with its associated technical indicators, were sourced from the Wind database. The sample period remained consistent with that of the Shanghai Composite Index, spanning from 1 April 2014 to 28 June 2024. The data were divided into training and testing sets in an 8:2 ratio.
Table 13 presents the out-of-sample forecasting results based on the CSI 300 Index. The results indicate that the forecasting accuracy is comparable to that observed for the Shanghai Composite Index. Compared with the benchmark HAR-RV model, all alternative models exhibited positive
R2oos values, confirming that both traditional machine learning and deep learning models incorporating multiple predictors can enhance the volatility forecasting accuracy for the CSI 300 Index.
Specifically, the CNN-BiLSTM-Attention model achieved MSE, RMSE, MAE, and MAPE values of 0.1798, 0.4240, 0.2315, and 54.6521%, respectively, which remained the lowest among all competing models. These results confirm that the CNN-BiLSTM-Attention model, which integrates a range of predictive factors, continues to demonstrate superior forecasting performance in the context of the CSI 300 Index, thereby validating its robustness and effectiveness in modeling stock market volatility.