Volatility Forecasting Based on Cyclical Two-Component Model: Evidence from Chinese Futures Markets and Sector Stocks

This article aims to study the schemes of forecasting the volatilities of Chinese futures markets and sector stocks. An improved method based on the cyclical two-component model (CTCM) introduced by Harris et al. in 2011 is provided. The performance of CTCM is compared with the benchmark model: Heterogeneous Autoregressive model of Realized Volatility type (HAR-RV type). The impact of open interest for futures markets is included in the HAR-RV type model. We employ 3 different evaluation rules to determine the most efficient models when the results of different evaluation rules are inconsistent. The empirical results show that CTCM is more accurate than the HAR-RV type in both estimation and forecasting. The results also show that the realized range-based tripower volatility (RTV) is the most efficient estimator for both Chinese futures markets and sector stocks.


Introduction
The performance of financial markets is a keen signal of economic development worldwide [1]. In this respect, analyzing and forecasting the trend of financial markets is crucial. In recent decades, volatility, as the second-moment structure of price, has been of the greatest concern to mathematicians and economists because of its influence on asset pricing, risk management, portfolio construction, and derivative pricing models such as the Black-Scholes-Merton model. Better measurement and modelling of volatility therefore contribute to more accurate pricing for academics and market investors, and help interested parties such as managers, investors, and policy makers adopt financial measures such as raising or reducing capital or adjusting investment allocation.
Various factors affect the effectiveness of a volatility model [2]. One is the fitness of the proxy adopted for the unobserved volatility. Traditional volatility proxies based on the squared demeaned return are unbiased estimators of the latent integrated variance because the integrated volatility is, by construction, the expectation of the squared demeaned return. Nevertheless, proxies constructed from the squared return employ only a single measurement of the price each period and contain no information about the intra-period trajectory of the price, which causes inefficiency. Efficiency can be improved by using intraday data, such as 5-min high-frequency data, and academics have proposed several volatility estimators along these lines. Another important factor that influences the effectiveness of a volatility model is the specification of the process that governs volatility dynamics. Academics have built various kinds of models that concentrate on different characteristics, such as long memory [3], cycling jumps [4], autoregression [5], etc.
However, several volatility models fail to forecast immature markets because of their sharp jumps and falls. The Chinese market is still an emerging financial environment but plays a key role in global economic development. China's Gross Domestic Product has been the second largest since 2010. By the end of 2017, the total market value of the Shanghai and Shenzhen stock markets had reached more than 8 trillion U.S. dollars, ranking as the second largest in the world [6]. In March 2018, Chinese crude oil futures came onto the market, which increased the influence of Chinese futures markets. At the same time, an extreme market fluctuation appeared in 2015: the Shanghai Composite Index fell rapidly by 32% in the 17 trading days after June 12, with market capitalization down by a third [6]. Therefore, investigating China's financial markets is valuable for obtaining a model that works effectively in immature markets.
This article aims to forecast the volatilities of Chinese futures markets and sector stocks based on the cyclical model. As a comparison, the Heterogeneous Autoregressive model of Realized Volatility (HAR-RV) type [5], originated by Corsi in 2009, and its improved models [7], introduced by Christensen et al. in 2012, are chosen as benchmarks. To find a better volatility proxy, 3 realized range-based estimators are introduced. The empirical data sample is a set of 3 years of 5-min high-frequency data downloaded from the WIND database. The futures in the sample are: silver, aluminum, and copper traded on the Shanghai Futures Exchange (SHFE); iron ore, coke, coking coal, and soybean meal traded on the Dalian Commodity Exchange (DCE); and rapeseed meal traded on the Zhengzhou Commodity Exchange (CZCE). The sector stocks in the sample are the energy index, raw materials index, medical hygiene index, and financial real estate index listed on the Shanghai Stock Exchange.
The volatility problem originated from the random walk. In the 1980s, it was generally accepted that the price of a common stock followed a random walk, and volatility thus became a key variable to calculate. In 1980, Parkinson [8] provided a method to estimate volatility, called the realized range (volatility) (RRV), which was shown to be 2-5 times better than the squared return, the traditional volatility measure. This method uses the scaled difference of the intraday highest and lowest prices of a stock to measure volatility. Recently, RRV has attracted attention again. Some literature [9] showed that it is not only significantly more efficient than the squared return, but also more robust to market microstructure noise than realized volatility, another volatility estimator provided by Andersen and Bollerslev in 1998 [10].
In 2012, Christensen and Podolskij [7] introduced a complete realized range-based multipower variance (RMV) approach based on multipower variance measures. Rather than employing the variance, RMV calculates the volatility estimator from absolute returns in ranges. RMV provides considerably efficient results even when the high-frequency data provide sparse information. It gives an asymptotically efficient estimation that is also robust to jumps, and it performs well in the presence of market microstructure noise. Evidence [7] shows that higher-order RMV usually estimates better; however, the third order is efficient enough in practice. Therefore, this paper employs realized range-based bipower volatility (RBV) and realized range-based tripower volatility (RTV) to model Chinese data.
Besides the development of various volatility estimators, academics have also paid attention to the volatility models themselves. Muller et al. [11] documented heterogeneity across investors in 1993. In 1997, Andersen and Bollerslev [12] claimed that the volatility process also contained multiple components. Corsi [5] concentrated on the heterogeneity that originates from differences in trading frequency and introduced the HAR-RV model in 2009. Corsi separated traders into 3 main groups: daily traders, weekly traders, and monthly traders. The HAR-RV model is a simple AR-type model that considers volatility components over different time horizons and has good estimation and forecasting performance. In 2012, Christensen and Podolskij [7] provided a realized range-based multipower variation approach, which eliminated the bias by constructing a hybrid range-based estimator. Replacing the weekly and monthly RV by RBV or RTV, the resulting HAR-RRV-RMV model showed better efficiency than the traditional HAR-RV model.
Some academics suggested that volatility could be divided into two components, governed by processes with long-term and short-term dynamics [4]. In 1998, Engle and Lee [13] originated a component GARCH model, which separated volatility into a constant long-run trend component and a temporary short-run component that is mean reverting towards the long-run trend. Empirical evidence [2,9] revealed that the two-factor model performed better than a one-factor model. The forecasting horizon of the two-factor model is up to 1 year [2]. Harris et al. [4] provided a cyclical two-component model to estimate and forecast volatility over both short and long horizons. The long-term trend volatility was estimated by a non-parametric filter, while the short-term component was modelled by a stationary AR process based on the long-term component. The results gave reliable estimation and forecasting compared with the one-factor and two-factor range-based EGARCH models and the range-based FIEGARCH model [2] introduced by Brandt and Jones in 2006.
Motivated by Tianlun [14], who tried to find a reasonable volatility forecasting method for Chinese individual stocks by comparing 6 volatility estimators applied in 3 models, this paper employs 3 estimators in 2 models to estimate and forecast the volatility of Chinese futures markets and sector stocks. The estimators are RRV, RBV, and RTV, while the models cover the HAR-RV type and 3 kinds of cyclical two-component models (CTCM).
The contributions of this paper are summarized in 3 main points. First, this is the first paper to comprehensively study the volatility models of Chinese futures markets and sector stocks. The volatility estimators and volatility models used are popular and have shown good predictability in recent years. This paper also considers the impact of open interest (OI), which is important but less studied.
Secondly, this paper finds a common model to estimate Chinese futures and sector stocks, regardless of the distributions and fluctuations of the products. The improved CTCM introduced in this paper shows robust estimation results.
Finally, this article employs different evaluation rules and provides an idea to determine the most efficient model when the evaluation rules are not consistent.
The remainder of this paper is organized in 4 parts. Section 2 is the methodology; Section 3 is the empirical results; Section 4 is the discussion; and Section 5 concludes the study.

Data
This paper employs data downloaded from the WIND database, comprising around 1000 days of 5-min intraday data for the period from 14 December 2015 to 11 August 2020, 4.75 years in total. The data contain the date, time, opening price, closing price, highest price, lowest price and open interest (for futures only), which are used to calculate the volatility estimators and construct the volatility models. Futures trade from 9:00 a.m. to 11:30 a.m., 1:30 p.m. to 3:00 p.m. and 9:00 p.m. to 2:30 a.m. the next day, while sector stocks trade from 9:30 a.m. to 11:30 a.m. and 1:00 p.m. to 3:00 p.m. If an observation is missing at some time, it is replaced by the most recent available observation.
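The missing-data rule described above is a forward fill. A minimal sketch in Python follows, assuming missing observations are marked with `None`; the function name `forward_fill` is hypothetical and not part of the original study.

```python
from typing import List, Optional

def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing entry (None) with the most recent observed value."""
    filled = []
    last = None  # stays None until the first observation is seen
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled
```

Entries that are missing before the first observation stay missing, since there is no earlier value to copy.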
The efficiency of a model depends on sample size, so most articles use 10 years of high-frequency data to investigate the effectiveness of models and set the forecast period to 1 year. Due to practical limits, the database provides 5-min data covering only the latest 3 years in China. Our data start from 14 December 2015 and end on 11 August 2020, around 1000 days in total. For forecasting, this paper uses 124 days of data as the out-of-sample forecasting period and the remaining days as the estimation period to construct the models. There are two reasons for this choice. First, futures contracts usually have a half-year holding period. Secondly, most papers employ 9 years of data to construct the model and 1 year of data to forecast, based on evidence that the volatility estimators have 1-year predictability, so the predictive proportion is 1 over 10. Hence, we choose a ratio of forecasting to training data of around 1 to 9.
Before constructing models, the summary statistics of the volatility estimators for each product (Tables 1-3) are shown below, containing the minimum, maximum, median, mean, variance, kurtosis, and skewness. These numbers describe the distribution and the fluctuation of the estimators, which may be a cause of poor performance of the estimation or the model.
Table 2. Summary statistics of RBV for each product.
Table 3. Summary statistics of RTV for each product.

Volatility Estimators
This part introduces the equations of the volatility estimators RRV, RBV and RTV. Let $H_t$ denote the highest price at intraday time $t$, $L_t$ denote the lowest price at the same time $t$, and $T$ denote the number of intraday observations. The Lambda function is defined as
$$\lambda(r) = \frac{4}{\sqrt{\pi}}\left(1 - \frac{4}{2^{r}}\right) 2^{r/2}\,\Gamma\!\left(\frac{r+1}{2}\right)\zeta(r-1),$$
where $\Gamma(\cdot)$ is the Gamma function and $\zeta(\cdot)$ is the Zeta function. In particular, $\lambda(2) = 4\log 2$.
Then $RRV_i$, the volatility estimator for the $i$-th day, is defined as
$$RRV_i = \frac{1}{\lambda(2)}\sum_{t=1}^{T}\left(\log H_t - \log L_t\right)^2,$$
where $t = 1, 2, 3, \ldots, T$ is the order of the intraday price.
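$RBV_i$, the volatility estimator for the $i$-th day, is defined as
$$RBV_i = \frac{1}{\lambda(1)^2}\sum_{t=2}^{T}\left|\log H_t - \log L_t\right|\left|\log H_{t-1} - \log L_{t-1}\right|,$$
where $t = 1, 2, 3, \ldots, T$ is the order of the intraday price.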
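$RTV_i$, the volatility estimator for the $i$-th day, is defined as
$$RTV_i = \frac{1}{\lambda(2/3)^3}\sum_{t=3}^{T}\prod_{j=0}^{2}\left|\log H_{t-j} - \log L_{t-j}\right|^{2/3},$$
where $t = 1, 2, 3, \ldots, T$ is the order of the intraday price.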
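As an illustration of how such estimators can be computed from one day of 5-min bars, the sketch below implements RRV and RBV in Python. It assumes the standard range-based forms: the sum of squared log ranges scaled by $\lambda(2) = 4\log 2$ for RRV, and the sum of products of adjacent absolute log ranges scaled by $\lambda(1)^2$, with $\lambda(1) = \sqrt{8/\pi}$, for RBV, without any finite-sample correction. The function names are hypothetical. RTV is omitted because its scaling constant would require evaluating the Zeta function, which the Python standard library does not provide.

```python
import math

LAMBDA_1 = math.sqrt(8.0 / math.pi)  # lambda(1): assumed first moment of the Brownian range
LAMBDA_2 = 4.0 * math.log(2.0)       # lambda(2) = 4 log 2, as stated in the text

def log_ranges(highs, lows):
    """Log range log(H_t) - log(L_t) of each intraday interval."""
    return [math.log(h) - math.log(l) for h, l in zip(highs, lows)]

def rrv(highs, lows):
    """Realized range-based variance for one day (assumed standard form)."""
    s = log_ranges(highs, lows)
    return sum(x * x for x in s) / LAMBDA_2

def rbv(highs, lows):
    """Realized range-based bipower variation for one day (assumed standard form)."""
    s = log_ranges(highs, lows)
    total = sum(abs(s[t]) * abs(s[t - 1]) for t in range(1, len(s)))
    return total / LAMBDA_1 ** 2
```

Each call takes the list of interval highs and the list of interval lows for a single trading day and returns that day's estimator.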

Volatility Models
This part introduces the volatility models including HAR-RV type and CTCM type.

HAR-RV Type Models
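The basic HAR-RV type model is HAR-RV. This model only considers the relation between the estimator and its past moving averages. Let $\hat{\sigma}_t$ denote the volatility estimator and $\bar{\sigma}_{n,t}$ denote the $n$-day moving average at time $t$, which is calculated by
$$\bar{\sigma}_{n,t} = (\hat{\sigma}_t + \hat{\sigma}_{t-1} + \cdots + \hat{\sigma}_{t-n+1})/n.$$
Note that $n = 5$ represents weekly data and $n = 22$ represents monthly data. Then the HAR-RV model is defined by
$$\hat{\sigma}_t = \beta + \beta_d\,\hat{\sigma}_{t-1} + \beta_w\,\bar{\sigma}_{5,t-1} + \beta_m\,\bar{\sigma}_{22,t-1} + \varepsilon_t,$$
where $\beta$ is the constant term, $\beta_d$, $\beta_w$, $\beta_m$ are the coefficients of the daily, weekly and monthly terms, respectively, and $\varepsilon_t$ is the residual at time $t$.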
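A rough sketch of how the HAR-RV regressors can be assembled and fitted by ordinary least squares is given below. It assumes that the target at day t is regressed on the daily value and the 5-day and 22-day moving averages at day t-1; the helper names `moving_average`, `har_design` and `ols` are hypothetical, and the small Gauss-Jordan solver stands in for whatever regression routine is actually used.

```python
def moving_average(x, n, t):
    """Mean of x[t], x[t-1], ..., x[t-n+1]; requires t >= n - 1."""
    return sum(x[t - n + 1:t + 1]) / n

def har_design(x):
    """Rows [1, daily, weekly, monthly] built at t-1, paired with the target x[t]."""
    rows, targets = [], []
    for t in range(22, len(x)):
        rows.append([1.0, x[t - 1],
                     moving_average(x, 5, t - 1),
                     moving_average(x, 22, t - 1)])
        targets.append(x[t])
    return rows, targets

def ols(rows, targets):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]
```

Calling `ols(*har_design(series))` on a daily estimator series would return the constant, daily, weekly and monthly coefficients in that order, provided the regressors are not collinear.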
Besides the basic type, following [7], this paper also considers the HAR-RRV-RBV and HAR-RRV-RTV models. To eliminate the bias, the HAR-RRV-RBV model replaces $RRV_t$ by the hybrid estimator $\lambda(2)RRV_t + (1 - \lambda(2))RBV_t$ and uses $\overline{RBV}_{n,t}$, the $n$-day moving average of RBV at time $t$, in the regression.
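The bias-corrected quantity in the HAR-RRV-RBV model is a weighted combination of the two estimators. A one-function sketch in Python, with the hypothetical name `hybrid_rrv`, is:

```python
import math

LAMBDA_2 = 4.0 * math.log(2.0)  # lambda(2) = 4 log 2, as stated in the text

def hybrid_rrv(rrv_t, rbv_t):
    """Hybrid estimator lambda(2) * RRV_t + (1 - lambda(2)) * RBV_t."""
    return LAMBDA_2 * rrv_t + (1.0 - LAMBDA_2) * rbv_t
```

Because the two weights sum to one, the hybrid returns the common value whenever RRV and RBV coincide.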

The HAR-RRV-RTV model is constructed in the same way with RTV in place of RBV, where $\overline{RTV}_{n,t}$ is the $n$-day moving average of RTV at time $t$.
Ripple and Moosa [15] argued that the futures markets differ from the stock markets in many respects and that OI provides additional information due to the complex relationship between open interest and trading volume. This paper adds the daily difference of the logarithm of OI as a variable in the HAR-RV type models for futures data. To investigate whether OI influences Chinese futures prices, the above models are chosen as benchmarks, and the HAR-RV-OI models are defined by adding this term to each of them. In both the basic HAR-RV model and the HAR-RV-OI model, the assumption is that there are linear relationships between the volatility estimator and its moving averages or open interest. Besides the performance, this paper also considers the p-values of the models' coefficients. Since the HAR-RV type model employs moving averages, its forecasts tend to a horizontal line.
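The open-interest regressor described above, the daily difference of the logarithm of OI, can be computed as in the following sketch. The function name `log_oi_changes` is hypothetical, and the input is assumed to be one positive OI value per trading day.

```python
import math

def log_oi_changes(open_interest):
    """Daily differences log(OI_t) - log(OI_{t-1}); one element shorter than the input."""
    return [math.log(b) - math.log(a)
            for a, b in zip(open_interest, open_interest[1:])]
```

The resulting series starts on the second day, so it has to be aligned with the volatility series before it is added as a regressor.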

Cyclical Two-Component Models
Let $L_t$ denote the long-run trend component of the square root of the volatility estimator, and $S_t$ denote the short-run component. Then, by definition, the square root of the volatility estimator $\sigma_t$ is the sum of the two, $\sigma_t = L_t + S_t$. Constructing a CTCM type model follows 3 steps.
First, calculate the long-run component $L_t$. Following [4], this paper applies the low-pass filter of Hodrick and Prescott [16] to the price, and then uses the filtered price in the volatility estimator to calculate $L_t$. Harris et al. [4] applied the filter to the price rather than to the volatility estimator because intraday prices are more likely to satisfy the assumptions of non-parametric filters and thus provide reasonable estimates of the underlying long-run trend in volatility. This step is implemented with the MATLAB built-in function hpfilter().
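For readers without MATLAB, the trend produced by a Hodrick-Prescott filter can be sketched in plain Python. The trend minimizes the squared deviation from the series plus a penalty `lamb` times the squared second differences of the trend, which leads to a banded linear system. The function name `hp_trend` and the default `lamb = 1600.0` are placeholders; the paper does not report the smoothing parameter it used, and this sketch is not the MATLAB implementation.

```python
def hp_trend(y, lamb=1600.0):
    """Trend component of the Hodrick-Prescott filter (banded linear solve)."""
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    # A = I + lamb * D'D, where D is the (n-2) x n second-difference matrix
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lamb * coef[p] * coef[q]
    b = [float(v) for v in y]
    # Forward elimination; A is symmetric positive definite with bandwidth 2
    for col in range(n):
        for r in range(col + 1, min(col + 3, n)):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, min(col + 3, n)):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    # Back substitution on the banded upper-triangular system
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * trend[j] for j in range(i + 1, min(i + 3, n)))
        trend[i] = s / a[i][i]
    return trend
```

A series that is exactly linear has zero second differences, so the filter returns it unchanged; this is a convenient sanity check for any implementation.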
Secondly, estimate an AR(1) model for $S_t = \sigma_t - L_t$, i.e.,
$$S_t = \beta + \alpha S_{t-1} + \varepsilon_t,$$
where $\beta$ is a non-zero constant. This differs from the assumption of [4], which has no constant term and only a zero-mean random error. The change is made for the following reasons. First, the figures show that the short-run component is near zero, but its mean is non-zero; because the magnitude of the volatility estimator is itself small, a value near zero cannot be treated as zero. Second, in an AR(1) model the constant term is usually non-zero. A zero constant term means that $S_t$ is zero whenever $S_{t-1}$ is zero, and that today's short-run component is simply a multiple or a fraction of the previous day's short-run component. That does not satisfy the definition, which implies that the short-run component is a mean-reverting process.
Finally, given the $\hat{\alpha}$ estimated in the last step, the $n$-step-ahead forecast of the square root of the volatility estimator in CTCM is the sum of the forecast long-run component $L_{F,t+n}$ and the forecast short-run component, where $L_{F,t+n} = L_t$ for any $n \ge 1$, for convenience. This follows [4], which claimed that the long-term component was a random walk. This paper considers daily ($n = 1$), weekly ($n = 5$), and monthly ($n = 22$) CTCM forecasting, giving 3 kinds of CTCM models in total.
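The second and third steps can be sketched as follows. The short-run component is fitted by least squares as an AR(1) with a constant, and the n-step-ahead forecast is obtained by iterating the fitted recursion n times and adding the last long-run value. The iterated form of the forecast is an assumption made for illustration, and the names `fit_ar1` and `ctcm_forecast` are hypothetical.

```python
def fit_ar1(s):
    """Least-squares fit of S_t = beta + alpha * S_{t-1} + error; returns (beta, alpha)."""
    x, y = s[:-1], s[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    alpha = sxy / sxx
    beta = my - alpha * mx
    return beta, alpha

def ctcm_forecast(long_t, short_t, beta, alpha, n):
    """n-step-ahead forecast: long-run component held at L_t plus iterated AR(1)."""
    s = short_t
    for _ in range(n):
        s = beta + alpha * s
    return long_t + s
```

Setting `n` to 1, 5 or 22 corresponds to the daily, weekly and monthly horizons considered in the paper.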
As a comparison, we construct a CTCM type model with a random walk term $e_t$,
$$\sigma_t = L_t + E[S_t] + e_t,$$
where $L_t$ is the long-term component calculated in the same way as above, $E[S_t]$ is the mean of the short-term component, and $e_t$ is white noise with zero mean and $Var(e_t) = Var(S_t)$.

Estimation and Forecasting Methods
This paper employs in-sample estimation method to evaluate whether a model captures the characteristics of all data efficiently or not. Usually, effective estimation is essential to obtain quality forecasting results.
During forecasting, the data are divided into 2 groups. The first group is used to calculate the parameter of the model, and the second group is used to evaluate the performance of the model. We employ out of sample forecasting method with rolling windows.
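A rolling-window out-of-sample exercise of this kind can be sketched generically. The function below re-estimates a model on the most recent `window` observations and produces a one-step-ahead forecast for each remaining day. The names `rolling_forecasts`, `fit` and `predict` are hypothetical placeholders for whichever model is being evaluated.

```python
def rolling_forecasts(series, window, fit, predict):
    """One-step-ahead forecasts; the model is refitted on each rolling window."""
    forecasts = []
    for end in range(window, len(series)):
        train = series[end - window:end]   # the most recent `window` observations
        params = fit(train)                # re-estimate on this window
        forecasts.append(predict(params, train))
    return forecasts
```

With the 124-day out-of-sample period used in this paper, `window` would be the sample length minus 124, so that exactly 124 forecasts are produced.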

Evaluation Rules
We employ the Mincer-Zarnowitz regression test to evaluate the performance of estimation, and use mean absolute percent error (MAPE), root mean square error (RMSE), and Theil's U decomposition (Theil's U) to evaluate the forecasting results. In addition, we use the modified Diebold-Mariano test (MDM-test) to diagnose whether the most efficient model has similar performance to other models.

Mincer-Zarnowitz Regression Test
Since CTCM is a non-linear regression model, the goodness of fit $R^2$ is not a proper evaluation rule. The Mincer-Zarnowitz regression test, which constructs a linear relationship between the estimation results and the original values, is a good replacement here.
Let $\hat{\sigma}_t$ denote the $t$-th day's volatility estimator calculated from the intraday data, and $\sigma^E_t$ denote the estimation result for the $t$-th day. Then the Mincer-Zarnowitz regression is defined as
$$\hat{\sigma}_t = \alpha_1 + \alpha_2\,\sigma^E_t + \varepsilon_t,$$
where $\alpha_1$ and $\alpha_2$ are the coefficients, and $\varepsilon_t$ is white noise with zero mean. To evaluate the performance of model estimation, the goodness-of-fit $R^2$ of the Mincer-Zarnowitz regression is used. The model with the highest MZ-$R^2$ gives the best estimation for the product.
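Since the Mincer-Zarnowitz regression has a single regressor and an intercept, its coefficients and $R^2$ have closed forms. A short sketch, with the hypothetical function name `mincer_zarnowitz`, is:

```python
def mincer_zarnowitz(actual, estimated):
    """Regress actual on estimated with an intercept; returns (alpha1, alpha2, r2)."""
    n = len(actual)
    mx, my = sum(estimated) / n, sum(actual) / n
    sxx = sum((x - mx) ** 2 for x in estimated)
    syy = sum((y - my) ** 2 for y in actual)
    sxy = sum((x - mx) * (y - my) for x, y in zip(estimated, actual))
    alpha2 = sxy / sxx
    alpha1 = my - alpha2 * mx
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation equals R^2 in simple regression
    return alpha1, alpha2, r2
```

An estimate that tracks the actual series perfectly up to a linear transformation gives an $R^2$ of 1, which is why the statistic ranges between 0 and 1.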

Root Mean Square Error
The RMSE describes the difference between the forecast values and the real data. The model with the smallest RMSE produces forecasts nearest to the actual values, so we regard it as the most efficient model.
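Let $\hat{\sigma}_t$ denote the volatility estimator calculated from the data and $\sigma^F_t$ denote the forecasting result. Then the RMSE is defined as
$$RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{\sigma}_t - \sigma^F_t\right)^2},$$
where $N$ is the number of forecasts.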
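In code, the RMSE amounts to a few lines. The sketch below uses the hypothetical function name `rmse` and assumes two equally long lists of actual and forecast values.

```python
import math

def rmse(actual, forecast):
    """Root mean square error between actual values and forecasts."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
```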

Mean Absolute Percent Error
When the volatility estimators are all small, the smallest RMSE alone is not evidence that a model has the most accurate forecasting results, so other evaluation rules are used in practice. MAPE describes the percentage difference between the forecasting results and the original values. Suppose the tolerance error is α%: only a model with MAPE lower than α% can be used in practice, and the forecasting results of such a model are, on average, within α% of the actual values.
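Let $\hat{\sigma}_t$ denote the volatility estimator calculated from the data, and $\sigma^F_t$ denote the forecasting result. Then the MAPE is defined as
$$MAPE = \frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{\hat{\sigma}_t - \sigma^F_t}{\hat{\sigma}_t}\right|,$$
where $N$ is the number of forecasts. This paper follows the rule that a model with MAPE lower than 35% is feasible in practice, and the model with the lowest MAPE gives the best forecast for the product.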
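The MAPE and the 35% feasibility rule can be expressed as in the sketch below. The function names `mape` and `is_feasible` are hypothetical, and the actual values are assumed to be non-zero.

```python
def mape(actual, forecast):
    """Mean absolute percent error, in percent; actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n

def is_feasible(actual, forecast, tolerance=35.0):
    """Feasibility rule used in this paper: MAPE below the tolerance (35% here)."""
    return mape(actual, forecast) < tolerance
```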

Theil's U Decomposition
Theil's U statistic is a relatively accurate measure that compares the forecast results with the results of forecasting with minimal historical data.
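Several variants of Theil's U exist. The sketch below uses one common form, often called U2, which compares relative forecast errors with those of a naive forecast that simply repeats the previous actual value. Whether this matches the exact variant used in the paper is an assumption, and the function name `theils_u` is hypothetical.

```python
import math

def theils_u(actual, forecast):
    """Theil's U (assumed U2 form): forecast errors relative to a no-change forecast."""
    num = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    return math.sqrt(num / den)
```

Under this form, a value of exactly 1 means the model does no better than repeating yesterday's value, and a perfect forecast gives 0.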
Let $\hat{\sigma}_t$ denote the volatility estimator calculated from the data and $\sigma^F_t$ denote the forecasting result. Theil's U is computed from these two series. If the U value is less than 1, the forecasting method is better than guessing. For the modified Diebold-Mariano test, $h$ is the order of the out-of-sample forecast.
In our experiments, $h$ is 1. If the test statistic is less than −1.96 or greater than 1.96, we reject the null hypothesis of equal forecasting performance and conclude that the models have different forecasting performance.
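A sketch of such a test is given below, assuming that the MDM-test refers to the Diebold-Mariano statistic with the small-sample correction of Harvey, Leybourne and Newbold, and that forecasts are compared under squared-error loss. Both are assumptions, and the function name `modified_dm` is hypothetical.

```python
import math

def modified_dm(errors_a, errors_b, h=1):
    """Modified Diebold-Mariano statistic for two series of forecast errors."""
    # Loss differential under squared-error loss (assumed loss function)
    d = [ea * ea - eb * eb for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(k):
        return sum((d[t] - mean_d) * (d[t - k] - mean_d) for t in range(k, n)) / n

    # Variance of the mean loss differential, using lags 1 to h - 1
    var_mean = (autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))) / n
    dm = mean_d / math.sqrt(var_mean)
    # Small-sample correction factor
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dm
```

With `h = 1`, as in this paper, the correction factor reduces to the square root of (n - 1)/n. The returned statistic is then compared with −1.96 and 1.96.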

Results
This paper aims to use the cyclical two-component model (CTCM) with an appropriate volatility estimator to describe and forecast the Chinese futures markets as well as sector stocks. 3 kinds of CTCM type models are provided and tested. There are 21 CTCM type models (see Table 4). The experiment has 3 parts. The first uses all the data to test the fitness of the models. The results are represented by MZ-$R^2$, which ranges from 0 to 1; a higher value means a better estimation. The second is out-of-sample forecasting to check the forecasting fitness of the models. The results are represented by MAPE, RMSE, and Theil's U statistics, for which lower values mean better performance. The last is the summary of the experiment results.

Estimation Results
The numerical results of estimation are listed in the following table (Table 4). The results are between 0 and 1, and the model with the highest value is the most efficient one for the product. With the exception of product I, most products are best estimated by the improved daily cyclical two-component model with realized range-based tripower volatility (CTCM-RTV-D).
For space saving, figures of the best estimation for some products are shown below (Figures 1-3). The blue line is the actual value, while the orange one is the estimated value. We can find that all the estimation lines are near the actual line.

Out of Sample Forecasting Results
The non-negative results of the out-of-sample forecasts are listed in the following tables. The first table (Table 5) is based on the MAPE evaluation rule. By the evaluation rule of this paper, the models can be employed in practice if and only if the MAPEs are less than 35%. Thus, all the most efficient models above satisfy the rule, except for Ag(SHFE).
For space saving, figures of the best forecast for some products are shown below (Figures 4-6). The blue line is the actual value, while the orange one is the forecasted value.
The second table (Table 6) is based on the RMSE evaluation rule. For space saving, figures of the best forecast for some products are shown below (Figures 7-9). The blue line is the actual value, while the orange one is the forecasted value.
The third table (Table 7) is based on Theil's U statistics. The model with the lowest value is the most efficient one for the product. Based on Theil's U statistics, Ag, Cu, Fe, J, and JM are best forecasted by the HAR-RV type model. Al and M are best forecasted by the improved daily cyclical two-component model with realized range-based tripower volatility (CTCM-RTV-D). However, ENG, MTR, MDC and FINRE are best forecasted by Harris' daily cyclical two-component model with realized range-based tripower volatility (TWO-RTV-D).
For space saving, figures of the best forecast for some products are shown below (Figures 10-12). The blue line is the actual value, while the orange one is the forecast value.

Summary of Results
The following table (Table 8) just lists the name of the models which are most efficient.

Discussion
The empirical results show that the daily improved CTCM with RTV is the best model to describe the trend for most products, except for I(DCE).
Based on MAPE and RMSE evaluation rules, all CTCM type models have better forecasting results than the HAR-RV type models in out of sample forecast, and RTV is the best estimator for all products in out of sample forecast.
According to Theil's U statistics, HAR-RV type models do well for Ag(SHFE), Cu(SHFE), I(DCE), J(DCE), and JM(DCE), and any of the 3 volatility estimators, RRV, RBV and RTV, can be the most efficient estimator.
However, no model consistently satisfies all 3 evaluation rules. We use the modified Diebold-Mariano test to check whether the forecasting performances of the above most efficient models are similar or not. The experiment results show that, for most products, the CTCM-RTV-D models have similar performance to the TWO-RTV-D models, except for Al(SHFE). The most efficient models of Ag(SHFE), Al(SHFE), Cu(SHFE), J(DCE) and JM(DCE) all have different forecasting performance. The most efficient models of RM(CZCE) all have similar forecasting performance.
One reason the Theil's U statistics show the HAR-RV type model to be more efficient than the CTCM type model is that a linear model easily obtains a smaller Theil's U value. In this article, the HAR-RV type model is linear, while the CTCM type model is non-linear. Therefore, even if both the MAPE and the RMSE of the CTCM type model are smaller, the HAR-RV type model may be more efficient according to Theil's U statistics. We do not suggest employing Theil's U statistics when comparing a linear model with a non-linear model. The MDM results also indicate that the improved CTCM introduced in this paper does not forecast better than the CTCM provided by Harris et al. [4]. Moreover, open interest has a little influence on the performance of HAR-RV type models in the Chinese futures market, so the HAR-RV-OI type models introduced in this paper could be an alternative way to improve HAR-RV type models.

Conclusions
This paper introduces an improved cyclical two-component model (CTCM) to describe and forecast Chinese futures markets and sector stocks. The main idea is that volatility is driven by a long-run trend component and a short-run mean-reverting component. The experiment shows that the improved CTCM is efficient enough to estimate the trend of products in Chinese futures markets and sector stocks, and is also more efficient than HAR-RV type models.
Moreover, this paper shows that an appropriate volatility estimator improves the performance of models. The RTV employed in CTCM type models is better than the RRV used in [4]. The empirical results also show that open interest in Chinese futures markets has little influence on the trend of volatility.