Model-free time-aggregated predictions for econometric datasets

This article explores the existing normalizing and variance-stabilizing (NoVaS) method for predicting squared log-returns of financial data. First, we examine the robustness of the existing NoVaS method for long-term time-aggregated predictions. We then develop a more parsimonious variant of the existing method. With systematic justification and extensive data analysis, our new method shows better performance than the current NoVaS and standard GARCH(1,1) methods on both short- and long-term time-aggregated predictions.


Introduction
Accurate and robust volatility forecasting is a central focus in financial econometrics. This type of forecasting is crucial for practitioners and traders making decisions in risk management, asset allocation, pricing of derivative instruments, strategic fiscal-policy decisions, etc. Standard approaches to volatility forecasting are typically built on applying GARCH-type models to predict squared financial log-returns. Building on the Model-free Prediction Principle, first proposed by Politis (2003), a model-free volatility prediction method, NoVaS, has been proposed recently for efficient forecasting without the assumption of normality. Previous studies have shown that the NoVaS method possesses better pseudo-out-of-sample (POOS) forecasting performance than GARCH-type models on forecasting squared log-returns (Gulay and Emec, 2018; Chen and Politis, 2019).
However, to the best of our knowledge, such methods have not been evaluated for time-aggregated prediction. Time-aggregated prediction here means predicting Y_{n+1} + · · · + Y_{n+h} after observing {Y_t}_{t=1}^n. Such predictions remain crucial for strategic decisions implemented by commodity or service providers (Chudỳ et al., 2020; Karmakar et al., 2020), trust funds, pension management, insurance companies, and portfolio management of specific derivatives (Kitsul and Wright, 2013) and assets (Bansal et al., 2016). In this paper we focus on forecasting squared log-returns of different econometric datasets. A time-aggregated forecast provides some confidence about the general trend in the near future, perhaps the entire next week or month ahead, and is arguably more meaningful than understanding what might happen at any single step ahead (predicting Y_{n+h} for one value of h). Indeed, the quality of forecasts for econometric data has been evaluated through such time-aggregated metrics before (Starica, 2003; Fryzlewicz et al., 2008). Apart from exploring the capabilities of the existing NoVaS method for time-aggregated forecasting, we also attempt to improve it further by proposing a more parsimonious model. We substantiate our proposal with extensive simulations and data analysis.

The existing NoVaS method
The NoVaS method is an application of the Model-free Prediction Principle. The main idea is to apply an invertible transformation H that maps the non-i.i.d. series {Y_t} into i.i.d. components {ǫ_t}. Y_{t+1} is then predicted by inversely transforming the prediction of ǫ_{t+1} (Politis, 2015). The starting point for building the transformation of the existing NoVaS method is the ARCH model (Engle, 1982). Politis (2003) then made some adjustments to determine the final form of H as:

    W_t = Y_t / \sqrt{α s_{t-1}^2 + a_0 Y_t^2 + \sum_{i=1}^{p} a_i Y_{t-i}^2},  t = p+1, …, n.  (2.1)

In Eq. (2.1), {Y_t}_{t=1}^n is the log-return series in this article; {W_t}_{t=p+1}^n is the transformed series that we want to make i.i.d.; s_{t-1}^2 is the sample variance of {Y_1, …, Y_{t-1}}; and α is a fixed scale-invariant constant. To obtain a qualified transformation function, Eq. (2.2) is required to stabilize the variance:

    α + a_0 + \sum_{i=1}^{p} a_i = 1,  with α ≥ 0 and a_i ≥ 0 for all i.  (2.2)
Then, α and a_0, …, a_p are finally determined by minimizing |Kurtosis(W_t) − 3|. This method is model-free in the sense that we do not assume any particular distribution for the innovations {W_t}, except for matching their kurtosis to 3. Once H is found, H^{-1} is obtained immediately. For example, the H^{-1} corresponding to Eq. (2.1) is:

    Y_t = W_t \sqrt{ (α s_{t-1}^2 + \sum_{i=1}^{p} a_i Y_{t-i}^2) / (1 − a_0 W_t^2) }.  (2.3)

To predict Y²_{n+1}, Politis (2015) defined two types of optimal predictors, under the L1 (mean absolute deviation) and L2 (mean squared error) criteria, after observing the historical information set F_n = {Y_t, 1 ≤ t ≤ n}: the conditional median and the conditional mean of Y²_{n+1} given F_n, respectively (Eq. (2.4)), evaluated through Eq. (2.3) by setting t to n + 1. During the optimization process, different forms for the unknown parameters in Eq. (2.2) give rise to the various NoVaS methods. Chen (2018) pointed out that the Generalized Exponential NoVaS (GE-NoVaS) method, whose coefficients decay exponentially as

    a_i = c' e^{-ci},  i = 0, 1, …, p,  with c' chosen so that Eq. (2.2) holds,  (2.5)

is superior to other NoVaS-type methods.
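The transformation and the kurtosis-matching step above can be sketched in a few lines of numpy. This is a minimal illustration under our own naming, not the authors' implementation: the paper optimizes over α as well and may use a different search; here we fix α and grid-search only the decay rate c of Eq. (2.5).

```python
import numpy as np

def ge_novas_weights(alpha, c, p):
    """Exponentially decaying coefficients a_0..a_p (the Eq. (2.5) scheme),
    normalized so that alpha + sum(a_i) = 1 (Eq. (2.2))."""
    raw = np.exp(-c * np.arange(p + 1))
    return (1.0 - alpha) * raw / raw.sum()

def novas_transform(y, alpha, a):
    """Eq. (2.1): W_t = Y_t / sqrt(alpha * s_{t-1}^2 + sum_{i=0}^p a_i * Y_{t-i}^2)."""
    y = np.asarray(y, dtype=float)
    p = len(a) - 1
    w = np.empty(len(y) - p)
    for t in range(p, len(y)):
        s2 = y[:t].var()                 # sample variance of Y_1..Y_{t-1}
        lags = y[t::-1][: p + 1]         # [Y_t, Y_{t-1}, ..., Y_{t-p}]
        w[t - p] = y[t] / np.sqrt(alpha * s2 + np.dot(a, lags ** 2))
    return w

def excess_kurtosis(w):
    w = w - w.mean()
    return (w ** 4).mean() / (w ** 2).mean() ** 2 - 3.0

def fit_ge_novas(y, alpha, p=10, c_grid=np.linspace(0.01, 3.0, 60)):
    """Pick the decay rate c that minimizes |Kurtosis(W_t) - 3| on a grid."""
    best = min(c_grid, key=lambda c: abs(excess_kurtosis(
        novas_transform(y, alpha, ge_novas_weights(alpha, c, p)))))
    return best, ge_novas_weights(alpha, best, p)
```

Note that the construction guarantees |W_t| ≤ 1/√a_0, since the denominator in Eq. (2.1) is at least a_0·Y_t²; this is the bound invoked later when a trimmed normal distribution is used for simulation.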

A new method with fewer parameters
However, during our investigation we found that the GE-NoVaS method sometimes returns extremely large predictions under the L2 criterion. To avoid this issue, we propose removing the a_0 term. H and H^{-1} of the GE-NoVaS-without-a_0 method can be rewritten as:

    W_t = Y_t / \sqrt{α s_{t-1}^2 + \sum_{i=1}^{p} a_i Y_{t-i}^2};  Y_t = W_t \sqrt{α s_{t-1}^2 + \sum_{i=1}^{p} a_i Y_{t-i}^2}.  (2.6)

Note that even without the a_0 term, the causal prediction rule is still satisfied. It is easy to obtain the analytical form of the one-step-ahead Y_{n+1}:

    Y_{n+1} = W_{n+1} \sqrt{α s_n^2 + \sum_{i=1}^{p} a_i Y_{n+1-i}^2}.  (2.7)

More specifically, when the first-step GE-NoVaS-without-a_0 prediction is performed, {W*_{n+1}} are generated M times (M = 5000 in this article) from a standard normal distribution by the Monte Carlo method, or by bootstrapping from the empirical distribution F̂_w (see footnote [3]). Plugging these {W*_{n+1,m}}_{m=1}^M into Eq. (2.7) yields M pseudo-predictions {Ŷ*_{n+1,m}}_{m=1}^M. Following the strategy implied by Eq. (2.4), we choose the L1- and L2-risk optimal predictors Ŷ²_{n+1} as the sample median and mean of {Ŷ*²_{n+1,1}, …, Ŷ*²_{n+1,M}}, respectively. More generally, we can predict g(Y_{n+h}) by adopting the sample median or mean of {g(Ŷ*_{n+h,1}), …, g(Ŷ*_{n+h,M})}. Similarly, the two-step-ahead Y_{n+2} can be expressed as:

    Y_{n+2} = W_{n+2} \sqrt{α s_{n+1}^2 + a_1 Y_{n+1}^2 + \sum_{i=2}^{p} a_i Y_{n+2-i}^2}.  (2.8)

When a prediction of Y_{n+2} is required, M pairs {W*_{n+1}, W*_{n+2}} are again generated by bootstrapping from the empirical distribution or by the Monte Carlo method from the standard normal distribution. Y²_{n+1} in Eq. (2.8) is replaced by the predicted value Ŷ²_{n+1} derived from running the first-step GE-NoVaS-without-a_0 prediction with the simulated {W*_{n+1,m}}_{m=1}^M under the L1 or L2 criterion. Subsequently, we choose the L1- and L2-risk optimal predictors of Y_{n+2} as the sample median and mean of {Ŷ*_{n+2,1}, …, Ŷ*_{n+2,M}}. Finally, iterating the process described above, we can accomplish multi-step-ahead NoVaS predictions; Y_{n+h}, h ≥ 3, is expressed analogously and computed iteratively.
The L1- and L2-risk optimal predictors of Y_{n+h} are computed as the sample median and mean of {Ŷ*_{n+h,1}, …, Ŷ*_{n+h,M}}. In short, Y_{n+h} is determined by the simulated innovations together with the observed history; since F_n is the observed information set, we can simplify the expression of Y_{n+h} as:

    Y_{n+h} = f(W_{n+1}, …, W_{n+h} | F_n).  (2.11)

For the GE-NoVaS method, we can still build the same relationship between Y_{n+h} and {W_{n+1}, …, W_{n+h}}; however, the simulated {W*_{n+1,m}, …, W*_{n+h,m}}_{m=1}^M used to obtain the GE-NoVaS prediction of Y_{n+h} should be generated by bootstrapping from the empirical distribution or by the Monte Carlo method from a trimmed standard normal distribution (see footnote [4]). Algorithm 1 summarizes the h-step-ahead time-aggregated prediction using the GE-NoVaS-without-a_0 method; the algorithm for GE-NoVaS can be written out similarly.
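The iterative multi-step scheme above can be sketched as follows. This is a simplified illustration under our own naming: rather than plugging the point prediction Ŷ²_{n+1} into each subsequent step as the paper describes, it simulates whole bootstrap paths (a common simplification with the same L1/L2 read-out), using only the without-a_0 recursion of Eq. (2.7).

```python
import numpy as np

def novas_w(y, alpha, a):
    """Transformed series W_t under the GE-NoVaS-without-a_0 form of H
    (Eq. 2.6, no a_0 term); a = (a_1, ..., a_p)."""
    y = np.asarray(y, dtype=float)
    p = len(a)
    w = []
    for t in range(p, len(y)):
        s2 = y[:t].var()
        denom = alpha * s2 + np.dot(a, y[t - 1::-1][:p] ** 2)  # a_1*Y_{t-1}^2 + ...
        w.append(y[t] / np.sqrt(denom))
    return np.array(w)

def aggregated_prediction(y, alpha, a, h, M=2000, rng=None):
    """Bootstrap h-step-ahead paths and return the L1 (median) and L2 (mean)
    predictors of the time-aggregated target Y^2_{n+1} + ... + Y^2_{n+h}."""
    rng = rng or np.random.default_rng()
    p = len(a)
    w_emp = novas_w(y, alpha, a)          # empirical distribution of W_t
    agg = np.empty(M)
    for m in range(M):
        path = list(y)
        total = 0.0
        for _ in range(h):
            w_star = rng.choice(w_emp)    # bootstrap W* from F^_w
            s2 = np.var(path)
            denom = alpha * s2 + np.dot(a, np.array(path[-1:-p - 1:-1]) ** 2)
            y_next = w_star * np.sqrt(denom)   # Eq. (2.7) recursion
            total += y_next ** 2
            path.append(y_next)
        agg[m] = total
    return np.median(agg), np.mean(agg)   # L1 and L2 optimal predictors
```

Drawing W* from a standard normal instead of `w_emp` gives the Monte Carlo variant; for GE-NoVaS (with a_0) the draws would have to respect the bound |W| ≤ 1/√a_0, hence the trimmed normal.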

The potential instability of the GE-NoVaS method
Algorithm 1 (h-step-ahead time-aggregated prediction with the GE-NoVaS-without-a_0 method):

Step 1: Determine the optimal parameters α, a_1, …, a_p on the observed series by minimizing |Kurtosis(W_t) − 3|.
Step 2: Derive the analytic form of Eq. (2.11) using α, a_1, …, a_p from the first step.
Step 3: Generate {W*_{n+1}, …, W*_{n+h}} M times from a standard normal distribution or the empirical distribution F̂_w, and plug them into Eq. (2.11) to obtain {Ŷ*_{n+h,1}, …, Ŷ*_{n+h,M}}.
Step 4: Calculate the optimal predictor of g(Y_{n+h}) by taking the sample mean (under the L2 risk criterion) or the sample median (under the L1 risk criterion) of the set {g(Ŷ*_{n+h,1}), …, g(Ŷ*_{n+h,M})}.

Next, we provide an illustration comparing the GE-NoVaS and GE-NoVaS-without-a_0 methods on predicting the volatility of the Microsoft Corporation (MSFT) daily closing price from January 8, 1998 to December 31, 1999, and show an interesting finding: the long-term time-aggregated predictions of the GE-NoVaS method are unstable under the L2 criterion. Based on the finding of Awartani and Corradi (2005), squared log-returns can be used as a proxy for volatility to render a correct ranking of different GARCH models in terms of a quadratic loss function. The log-return series {Y_t} is computed as:

    Y_t = \log(X_t / X_{t-1}),  (2.13)

where {X_t} is the corresponding MSFT daily closing price series. To achieve a comprehensive comparison, we use a sliding window of 250 log-returns to do POOS 1-step, 5-step and 30-step (long-term) ahead time-aggregated predictions under the L2 criterion. We roll this window through the whole dataset, i.e., we use {Y_1, …, Y_250} to predict Y²_251, {Y²_251, …, Y²_255} and {Y²_251, …, Y²_280}; then use {Y_2, …, Y_251} to predict Y²_252, {Y²_252, …, Y²_256} and {Y²_252, …, Y²_281}, for the 1-step, 5-step and 30-step aggregated predictions respectively, and so on. Assuming there are N log-return data points in total, we can define all 1-step, 5-step and 30-step ahead time-aggregated predictions {Ŷ²_{k,1}}, {Ŷ²_{i,5}} and {Ŷ²_{j,30}} as:

    Ŷ²_{k,1} = Ŷ²_{k+1};  Ŷ²_{i,5} = \sum_{m=1}^{5} Ŷ²_{i+m};  Ŷ²_{j,30} = \sum_{m=1}^{30} Ŷ²_{j+m}.  (2.14)

In Eq. (2.14), Ŷ²_{k+1}, Ŷ²_{i+m} and Ŷ²_{j+m} are single-step predictions of squared log-returns from the two NoVaS-type methods. To obtain "prediction errors" for the two methods, we calculate the "loss" by comparing the aggregated prediction results with the realized aggregated values:

    Loss_h = \sum_{p} \left( Ŷ²_{p,h} − \sum_{m=1}^{h} Y²_{p+m} \right)^2,  (2.15)

where {Y²_{p+m}} are the realized squared log-returns. To show the potential instability of the GE-NoVaS method under the L2 criterion, we take α to be 0.5 to build a toy example; in the full algorithm, α takes an optimal value from the discrete set {0.1, …, 0.8} based on prediction performance. From Fig. 1 we can clearly see that the GE-NoVaS-without-a_0 method better captures the true features, while the GE-NoVaS method returns unstable results for 30-step-ahead time-aggregated predictions.

[3] F̂_w is calculated from Eq. (2.1), i.e., it is the empirical distribution of the transformed series {W_t}_{t=p+1}^n corresponding to {Y_t}_{t=1}^n.
[4] The trimmed distribution is used because |W_t| ≤ 1/\sqrt{a_0} from Eq. (2.1).
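The rolling POOS evaluation above can be sketched as follows. The helper `forecaster` is a hypothetical callable standing in for any of the methods compared (a NoVaS variant or GARCH(1,1)): given a window of log-returns and a horizon h, it returns h single-step predictions of squared returns, which are then aggregated and scored as in Eqs. (2.14)-(2.15).

```python
import numpy as np

def log_returns(prices):
    """Eq. (2.13): Y_t = log(X_t / X_{t-1})."""
    x = np.asarray(prices, dtype=float)
    return np.log(x[1:] / x[:-1])

def aggregated_loss(y, forecaster, h, window=250):
    """Roll a fixed window through {Y_t}. At each origin, sum the h
    single-step squared-return predictions (Eq. 2.14) and compare with the
    realized aggregate, accumulating squared errors (Eq. 2.15)."""
    y = np.asarray(y, dtype=float)
    loss = 0.0
    n_origins = len(y) - window - h + 1
    for start in range(n_origins):
        hist = y[start : start + window]
        preds = forecaster(hist, h)      # h single-step predictions of Y^2
        agg_pred = float(np.sum(preds))
        realized = float(np.sum(y[start + window : start + window + h] ** 2))
        loss += (agg_pred - realized) ** 2
    return loss
```

For example, a naive benchmark that predicts the window's sample variance at every step is `lambda hist, h: np.full(h, np.var(hist))`; comparing each method's loss to such a benchmark gives the relative performance figures reported later in Table 1.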

Data analysis and results
To validate our method through extensive data analysis, we deploy POOS predictions using the two NoVaS methods and the standard GARCH(1,1) method on simulated and real-world data. All results are collated in Table 1. To control for the dependence of prediction performance on the length of the dataset, we build datasets with two fixed lengths (250 or 500) to mimic 1-year or 2-year data, respectively. Correspondingly, we choose the window size for our rollover forecasting analysis to be 100 or 250 for the 1-year or 2-year datasets.
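For reference, the GARCH(1,1) benchmark's multi-step forecasts follow the standard variance recursion; a minimal sketch is below, with illustrative (not estimated) parameters, and parameter estimation omitted. The k-step-ahead conditional variance satisfies E[σ²_{n+k}] = ω + (α+β)·E[σ²_{n+k−1}], so the aggregated forecast is just the sum of these terms.

```python
import numpy as np

def garch11_aggregated_forecast(y, omega, alpha, beta, h):
    """Aggregated h-step variance forecast for a GARCH(1,1):
    sigma^2_{t+1} = omega + alpha * y_t^2 + beta * sigma^2_t, and for k >= 2
    E[sigma^2_{n+k}] = omega + (alpha + beta) * E[sigma^2_{n+k-1}]."""
    y = np.asarray(y, dtype=float)
    s2 = np.var(y)                        # initialize at the sample variance
    for r in y:                           # filter variances through the sample
        s2 = omega + alpha * r ** 2 + beta * s2
    # s2 is now the one-step-ahead forecast sigma^2_{n+1}
    total, fk = 0.0, s2
    for _ in range(h):
        total += fk
        fk = omega + (alpha + beta) * fk  # push the forecast one step further
    return total
```

In practice the paper estimates (ω, α, β) by quasi-maximum likelihood on each rolling window; the recursion above is what turns those estimates into the h-step aggregated predictions being compared.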

Simulation study
We use simulation Models 1-4 from Chen and Politis (2019) to mimic four 1-year datasets. Recall that any one NoVaS method can generate an L1 or L2 predictor, and {W*} can be drawn from a standard normal distribution or the empirical distribution; thus each NoVaS method has four variants. We take the best-performing result among the four variants of a given NoVaS method as its final prediction. Finally, we apply Eq. (2.15) to measure the performance of the different methods, as in Section 2.3.

A few real datasets
We also present a variety of real-world datasets of different sizes and intrinsic behavior:
• 2-year period data: 2018∼2019 stock price data.
• 1-year period data: 2019 stock price and index data.
• 1-year period volatile data due to the pandemic: 11.2019∼10.2020 stock price, currency and index data.
We consider these three types of real-world data to challenge our new method and to explore the existing method in different regimes. We also deliberately pay more attention to short and volatile data, since these are harder to handle. Eq. (2.13) is used throughout to compute the log-return series of the different datasets.
Result analysis: In the last three blocks of Table 1, no optimal result comes from the GARCH(1,1) method. When the target data is short and volatile, GARCH(1,1) gives very poor results for 30-step-ahead time-aggregated predictions, as in the volatile Djones, CADJPY and IBM cases. Between the two NoVaS methods, the GE-NoVaS-without-a_0 method outperforms the GE-NoVaS method across all three types of real-world data. More specifically, our new method improves on the existing GE-NoVaS method by around 70% and 30% when forecasting 30-step-ahead time-aggregated volatile Djones and CADJPY data, respectively. We should also note that the GE-NoVaS method is again beaten by the GARCH(1,1) model on 30-step-ahead aggregated predictions of the 2018∼2019 BAC data, whereas the GE-NoVaS-without-a_0 method remains stable. See Appendix A for more results.

Statistical significance
However, one may suspect that the victory of our new method is specific to these samples. We therefore challenge this superiority by testing its statistical significance. Since the GE-NoVaS-without-a_0 method is nested within the GE-NoVaS method (taking a_0 = 0 in the larger model), we deploy the CW-test (Clark and West, 2007) to check that the removing-a_0 idea is also statistically reasonable; see the P-value column of Table 1 for the test results. These CW-test results imply that the null hypothesis should not be rejected in almost all cases at the 5% level of significance, which supports the equivalence of the new method to the existing one.
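The CW-test statistic can be sketched as follows. This is the simple i.i.d.-errors version of Clark and West's (2007) MSPE-adjusted statistic for nested models; for multi-step forecasts, as used here, HAC (e.g. Newey-West) standard errors would normally replace the plain standard deviation.

```python
import numpy as np

def clark_west(y, pred_small, pred_large):
    """Clark-West (2007) MSPE-adjusted t-statistic for nested forecasts.
    'small' is the parsimonious model nested in 'large'. Under the null of
    equal MSPE the statistic is roughly standard normal; values above ~1.645
    favour the larger model at the one-sided 5% level."""
    y, ps, pl = (np.asarray(v, dtype=float) for v in (y, pred_small, pred_large))
    # adjusted loss differential: removes the noise penalty of the larger model
    f = (y - ps) ** 2 - ((y - pl) ** 2 - (ps - pl) ** 2)
    n = len(f)
    return np.sqrt(n) * f.mean() / f.std(ddof=1)
```

In our setting the "small" model is GE-NoVaS-without-a_0 and the "large" model is GE-NoVaS, so failing to reject the null supports the parsimonious variant.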

Summary
We summarize our findings as follows:
• Both the existing GE-NoVaS method and the new GE-NoVaS-without-a_0 method provide substantial improvement for time-aggregated prediction, which hints at the stability of NoVaS-type methods for long-horizon inference.

Note on Table 1: the values presented in the GE-NoVaS and GE-NoVaS-without-a_0 columns are relative performance compared with the 'standard' GARCH(1,1) method, and the P-value column reports the CW-test. The null hypothesis of the CW-test is that the parsimonious and larger models have equal mean squared prediction error (MSPE); the alternative is that the larger model has a smaller MSPE.
• Our new method performs better than the GE-NoVaS method, especially for shorter samples or more volatile data. This is significant given that GARCH-type models are difficult to estimate in shorter samples.
• We provide a statistical hypothesis test that supports our model, advocating a more parsimonious fit, especially for long-term time-aggregated predictions.

Discussion
In this article, we explored the GE-NoVaS method for short- and long-term time-aggregated predictions and proposed a new variant that is based on a more parsimonious model, has better empirical performance, and is statistically justified. We hope these empirical findings open up avenues for exploring other specific transformation structures to improve existing forecasting frameworks.

Appendix A
Additional results are presented in Table A.4. It is clear that both NoVaS-type methods still outperform the GARCH(1,1) model for short- and long-term time-aggregated forecasting. Although the GE-NoVaS method attains optimal performance in some cases, the GE-NoVaS-without-a_0 method still gives almost the same, only slightly worse, results there. Interestingly, the GE-NoVaS-without-a_0 method brings significant improvement over the GE-NoVaS method for 30-step-ahead predictions. This again hints at the greater robustness of our new method for long-term aggregated predictions.