GARCH Model Test Using High-Frequency Data

Abstract: This work studies parameter testing for the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model. Based on the daily GARCH model, and using the parameter estimator obtained from intraday high-frequency data, adjusted Likelihood Ratio and Wald test statistics are provided. The asymptotic distributions of the two adjusted test statistics are derived, and a way to select the optimal sampling frequency is also discussed. Simulation studies show that the proposed test statistics have better size and power than the traditional ones (which do not use intraday high-frequency data). An empirical study is given to illustrate the potential applications of the proposed tests. The results indicate that the approach improves on the traditional tests and that it can be extended to other GARCH-type models.


Introduction
Volatility modeling is an important tool for policymaking, asset pricing, investment analysis, and risk management [1]. Since the seminal work by Engle [2] and Bollerslev [3], the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model has become a popular tool to model volatility in economic and financial time series. Over the past decades, the model has been extended in many directions [4][5][6][7][8][9][10][11][12][13]. From this literature, we can see that most studies on the GARCH model adopt the standard form based on daily close-to-close returns. Let $r_n$ be the log-return of day $n$; the standard GARCH(1,1) model [3] can be formulated as:
$$r_n = \sigma_n Z_n, \qquad \sigma_n^2 = \kappa + \alpha r_{n-1}^2 + \beta \sigma_{n-1}^2,$$
where the sequence of innovations $\{Z_n\}$ is i.i.d. with zero mean and unit variance, the scale factor $\sigma_n$ is the conditional standard deviation of $r_n$ given the information at time $n-1$ (so that $\sigma_n^2$ is the conditional variance), and $\kappa$, $\alpha$, and $\beta$ are the model parameters, assumed to satisfy $\kappa > 0$, $\alpha \ge 0$, $\beta \ge 0$, and $\alpha + \beta < 1$. A common approach to estimating these parameters is the Quasi Maximum Likelihood Estimate (QMLE); see [14] for a comprehensive review. Many empirical studies have shown that the GARCH model is particularly suitable for modeling financial time series because it describes their main characteristics well, namely the time variability of the variance and thick tails; see [14] and the references therein.
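To make the recursion concrete, the following is a minimal simulation sketch of the standard GARCH(1,1) model, $r_n = \sigma_n Z_n$ with $\sigma_n^2 = \kappa + \alpha r_{n-1}^2 + \beta \sigma_{n-1}^2$. The function name `simulate_garch11`, the Gaussian innovations, and the choice to start the recursion at the unconditional variance $\kappa/(1-\alpha-\beta)$ are illustrative assumptions, not part of the original study.

```python
import math
import random


def simulate_garch11(n, kappa, alpha, beta, seed=0):
    """Simulate n daily returns from a GARCH(1,1) process.

    Returns (returns, variances), where variances[i] is the conditional
    variance sigma_n^2 used to generate returns[i].
    """
    if not (kappa > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need kappa > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    rng = random.Random(seed)
    # Assumption: start at the unconditional variance kappa / (1 - alpha - beta).
    var = kappa / (1.0 - alpha - beta)
    returns, variances = [], []
    for _ in range(n):
        # Assumption: Gaussian Z_n; the model only requires mean 0, variance 1.
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(var)
        # GARCH(1,1) update for the next day's conditional variance.
        var = kappa + alpha * r * r + beta * var
    return returns, variances
```

For example, `simulate_garch11(1000, 0.1, 0.1, 0.8)` would produce one sample path of length 1000; other innovation distributions with zero mean and unit variance could be substituted for the Gaussian draw.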
In recent years, high-frequency data from financial markets have become more accessible than before. Intraday high-frequency data, as a kind of financial big data, record trading data at a sampling frequency of hours, minutes, or seconds within a day and are sampled at equal time intervals. Such data are characterized by a high degree of irregularity [1]. Research on high-frequency financial data dates back at least to the 1980s, and most of it concerns realized volatility or intraday volatility. For example, Wood et al. [15] and Harris [16] found a distinct U-shaped pattern in the intraday return volatility curve. For other studies, one can refer to [17][18][19][20][21][22][23].
Intraday high-frequency data contain more information, and it is natural to introduce these data into the study of GARCH-type models. Unfortunately, the studies of Andersen and Bollerslev show that the GARCH model is inappropriate for directly fitting intraday high-frequency data because of their nonstationarity [24,25]. Namely, although one can estimate the parameters of a GARCH process from intraday returns sampled at, say, a 5 min time unit, the resulting model does not make sense. Consequently, indirect use of intraday high-frequency data could be more practical in studying GARCH model estimation. A case in point is Visser's work [26], where the information in intraday high-frequency data is first transferred to a daily volatility proxy, and this proxy is then adopted to improve daily GARCH model estimation. For further extensions, see [26][27][28][29][30][31][32].
As we all know, in addition to estimation, model testing is another important issue in practical modeling. For GARCH model testing, many results have been obtained; see [33][34][35][36][37][38][39]. However, the available results on GARCH model testing are mostly limited to low-frequency data. To the best of our knowledge, few of them have introduced intraday high-frequency data into a daily GARCH model test. Previous studies have shown that high-frequency data can be used to improve daily GARCH model estimation. A natural question is whether intraday high-frequency data could also be used to improve the power of daily GARCH model tests. Inspired by this, this paper studies the Wald test and the Likelihood Ratio (LR) test of the daily GARCH model by introducing high-frequency data.
The main contribution of this paper is the construction of a new test approach for the daily GARCH model that adopts intraday high-frequency data. The proposed test statistics are based on the QMLE of the daily GARCH model obtained with intraday high-frequency data, and their limiting distributions are proved to be chi-square. The proposed test statistics are found to have better size and power performance than the traditional ones (which do not use high-frequency data). Both the theoretical and the simulation results support the novelty and the advantage of the proposed tests.
Based on these theoretical results, we will further study hypothesis testing for the daily GARCH models (3) and (4). The tests we consider are the Wald test and the LR test.

The Test of the GARCH Model
We consider the following null and alternative hypotheses: $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. Let $\hat\theta$ be the QMLE of models (3) and (4) under the alternative and $\theta_0 = (\tau_0, \gamma_0, \beta_0)^T$ be the hypothesized value. Generally, the Wald test is based upon the difference between $\theta_0$ and $\hat\theta$, and the LR test is based upon the difference between $L_N(\theta_0)$ and $L_N(\hat\theta)$. Although the statistics of the two tests differ in construction, they are both reasonable measures of the distance between $H_0$ and $H_1$.

The Wald Test
It is well known that the construction of the Wald test statistic depends on the asymptotic distribution of the parameter estimator. Suppose that conditions (A1) to (A5) and the QML regularity conditions in Appendix A hold; then the QMLE $\hat\theta$ of GARCH models (3) and (4) has a limiting normal distribution with asymptotic covariance matrix $V_0$; see [8,40].
From the above discussion, we have $V(\hat\theta) = V_0 + o_p(1)$. That is to say, $V_0$ can be consistently estimated by $V(\hat\theta)$. Therefore, the Wald test statistic $T_W$ in (12) can be constructed from the difference $\hat\theta - \theta_0$ and the estimate $V(\hat\theta)$. Under the null hypothesis, the statistic $T_W$ has a limiting $\chi^2(3)$ distribution, where "3" is the dimension of the parameter $\theta_0$.
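As an illustration of how such a statistic could be computed, the sketch below evaluates the usual Wald quadratic form $N(\hat\theta - \theta_0)^T V(\hat\theta)^{-1} (\hat\theta - \theta_0)$ and compares it with the 95% quantile of $\chi^2(3)$. The $\sqrt{N}$ normalization, the function names `solve_linear` and `wald_statistic`, and the argument `n_obs` are assumptions made for this example; if $V(\hat\theta)$ were already scaled by $1/N$, `n_obs` would be set to 1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def wald_statistic(theta_hat, theta_0, v_hat, n_obs):
    """Assumed form: n_obs * d^T V^{-1} d with d = theta_hat - theta_0."""
    diff = [a - b for a, b in zip(theta_hat, theta_0)]
    w = solve_linear(v_hat, diff)  # V^{-1} d without forming the inverse
    return n_obs * sum(d * x for d, x in zip(diff, w))


# 95% quantile of the chi-square distribution with 3 degrees of freedom.
CHI2_3_CRIT_95 = 7.8147
```

The null hypothesis would be rejected at the 5% level when `wald_statistic(...) > CHI2_3_CRIT_95`. Solving the linear system instead of inverting $V(\hat\theta)$ explicitly is a numerical design choice, not something prescribed by the text.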

The LR Test
The LR test statistic is constructed based on the difference between the maxima of the likelihood under the null and alternative hypotheses. According to the test procedure in [41], under general conditions, the LR test statistic $T_{LR}$ of GARCH models (3) and (4) can be established as in (13). By Taylor's expansion as well as the stationarity and ergodicity properties of time series [40], we can prove the theorem below, which shows that the Wald test and the LR test of the GARCH model are equivalent.
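The following sketch shows one way an LR-type statistic could be evaluated numerically. It is written for the standard $(\kappa, \alpha, \beta)$ parameterization of GARCH(1,1) rather than for models (3) and (4), uses the Gaussian quasi log-likelihood with the additive constant dropped, and takes the statistic as $2[L_N(\hat\theta) - L_N(\theta_0)]$. The function names, the sign convention, and the initialization of the variance recursion are all assumptions for illustration.

```python
import math


def garch11_quasi_loglik(returns, kappa, alpha, beta):
    """Gaussian quasi log-likelihood of GARCH(1,1), constant term dropped."""
    # Assumption: initialise the conditional variance with the sample
    # second moment of the returns (one common choice among several).
    h = sum(r * r for r in returns) / len(returns)
    if h <= 0:
        raise ValueError("returns must not all be zero")
    total = 0.0
    for r in returns:
        total += -0.5 * (math.log(h) + r * r / h)
        h = kappa + alpha * r * r + beta * h  # next day's conditional variance
    return total


def lr_statistic(returns, theta_hat, theta_0):
    """Assumed form: 2 * (L_N(theta_hat) - L_N(theta_0)).

    theta_hat and theta_0 are (kappa, alpha, beta) tuples; theta_hat is
    supplied by the caller, for example from a separate QMLE routine.
    """
    return 2.0 * (garch11_quasi_loglik(returns, *theta_hat)
                  - garch11_quasi_loglik(returns, *theta_0))
```

When `theta_hat` is the maximizer of the quasi log-likelihood, the statistic is nonnegative by construction; the sketch does not perform the maximization itself.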

Theorem 1.
Under assumptions (A1)-(A5) and the QML regularity conditions in Appendix A, both statistics $T_W$ and $T_{LR}$ of the GARCH model have a limiting $\chi^2(k)$ distribution under the null, with $k$ being the number of unknown parameters, and they are asymptotically equivalent.
The detailed proofs are relegated to Appendix B.

The Test of GARCH Model with High-Frequency Data
Previous studies have shown that high-frequency data can be adopted to improve the estimation accuracy of GARCH-type models; see [21,26]. An intuitive idea is to replace $\hat\theta$ in (12) and (13) with a more precise estimator, with the expectation that the test statistics will then perform better. Therefore, in this section, we first give a short introduction to GARCH model parameter estimation with high-frequency data based on (3) and (4).

A Brief Review of GARCH Model Estimation Using High-Frequency Data
To utilize high-frequency data in the daily GARCH models (3) and (4), for each trading day $n$, Visser introduced a continuous log-return process $R_n(\cdot)$ to describe the intraday price movements [26]. To simplify the notation, we normalize the trading day to the unit time interval and adjust Equation (3) by replacing the variable $Z_n$ with a process $\Psi_n(\cdot)$. This yields a scale model framework in which the scale factor $\nu_n\tau$ is latent. The processes $R_n(\cdot)$ are observable, with information set $\mathcal{F}_n$ generated by $\{R_n, R_{n-1}, \ldots\}$. Across days $n$, the standard processes $\Psi_n(\cdot)$ are assumed to be i.i.d., and their sample paths are right-continuous with left limits (càdlàg). When $u = 1$, we have $E\Psi_n^2(1) = 1$, and the sequence of random variables $(Z_n)$ is i.i.d., as implied by the i.i.d. assumption on $(\Psi_n)$. It is seen that models (14) and (15) take the intraday information $R_n(u)$ into account and keep the same parameters as the daily GARCH models (3) and (4). To further estimate the parameters in (14) and (15), we need to use the intraday information to construct a volatility proxy, denoted by $H_n$.
The volatility proxy is a nonnegative function of the intraday process $R_n(\cdot)$ and has the property of positive homogeneity: $H(cR) = cH(R)$ for all $c \ge 0$. One commonly used volatility proxy is the square root of the realized variance, $H_n = \sqrt{\sum_k r_{n,k}^2}$, where $r_{n,k}$ denotes the return over the $k$-th intraday interval of day $n$. Due to the positive homogeneity of $H_n$, it follows that $H_n = H(\nu_n\tau\Psi_n) = \nu_n\tau H(\Psi_n)$. From this we can derive the volatility proxy model (18) and (19), which retains the structural features of the daily GARCH model. The proxy $H_n$ is computable and has the same frequency as the daily return $r_n$. We further introduce an ancillary random variable $y_n$, constructed from $H_n$ and a random variable $\delta_n$ that is independent of models (18) and (19) and takes only the values $\{-1, 1\}$, each with probability 0.5. Denote by $L_{HN}(\theta)$ the quasi log-likelihood function of the random variables $y_n$, and let $h_{H,n}(\theta) = \sigma_{H,n}^2(\theta) = \nu_n^2\tau_H^2$. Define $\hat\theta_H$ to be the QMLE of models (18) and (19), obtained from the optimization of $L_{HN}(\theta)$ in (22). Similar to the daily GARCH models (3) and (4), under some regularity conditions, the QMLE $\hat\theta_H$ of models (18) and (19) is consistent and asymptotically normal; its asymptotic covariance matrix involves $\mathrm{Var}(Z_{H,n}^2)$ and a matrix $G_H(\theta_0)^{-1}$ that depends only on the derivatives of $\nu_n^2$ with respect to $\theta$; see Theorem 2.1 in [26]. Consequently, as mentioned in [26], if a volatility proxy $H_n$ has a small value of $\mathrm{Var}(Z_{H,n}^2)$, then the corresponding estimator $\hat\theta_H$ will have a small asymptotic variance and hence will be more precise.
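The proxy construction can be sketched in a few lines. The example below computes the square root of the realized variance from a list of intraday returns, which makes the positive homogeneity property easy to check numerically, and builds an ancillary series under the assumption that it takes the form $y_n = \delta_n H_n$ with a random sign $\delta_n$. The function names and the exact form of $y_n$ are assumptions for illustration.

```python
import math
import random


def realized_vol(intraday_returns):
    """Square root of the realized variance: sqrt(sum_k r_{n,k}^2)."""
    return math.sqrt(sum(r * r for r in intraday_returns))


def ancillary_series(proxies, seed=0):
    """Attach an independent random sign to each daily proxy value.

    Assumption: y_n = delta_n * H_n, where delta_n is -1 or +1 with
    probability 0.5 each and is drawn independently of the proxies.
    """
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) * h for h in proxies]
```

Scaling every intraday return by a constant $c \ge 0$ scales `realized_vol` by the same constant, which is the homogeneity property used to obtain $H_n = \nu_n\tau H(\Psi_n)$. Under the assumed form, $y_n^2 = H_n^2$, so the random sign only symmetrizes the series.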
Unfortunately, for the QMLE $\hat\theta_H$ in (22), no detailed procedure is given to choose the optimal volatility proxy $H_n$. To make the estimator more applicable, in this paper we provide a way to choose the optimal volatility proxy, which can also be applied to choose the sampling frequency of the intraday high-frequency data.
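Let $W_{H,n} = H(\Psi_n(u))$. Then $\mu_H^2 \equiv E H^2(\Psi_n) = E W_{H,n}^2$ and $Z_{H,n} = W_{H,n}/\sqrt{E W_{H,n}^2}$. Since $E Z_{H,n}^2 = 1$, it follows that $\mathrm{Var}(Z_{H,n}^2) = E W_{H,n}^4/(E W_{H,n}^2)^2 - 1$. Note that $H_n = H(\nu_n\tau\Psi_n) = \nu_n\tau H(\Psi_n) = \nu_n\tau W_{H,n}$ and that $W_{H,n}$ is independent of $\nu_n$. Hence the moments of $H_n$ factor into moments of $\nu_n\tau$ and moments of $W_{H,n}$, and by dividing the two resulting equations it is not difficult to obtain: smaller $m_H^4$ $\Longleftrightarrow$ smaller $E W_{H,n}^4/(E W_{H,n}^2)^2$ $\Longleftrightarrow$ smaller $\mathrm{Var}(Z_{H,n}^2)$.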
Consequently, in practice, we can choose the optimal proxy $H_n$ according to its value of $m_H^4$, which can easily be estimated by the related sample means. For a given volatility proxy $H_n$, its calculation depends on the sampling frequency of the intraday data. The above criterion therefore gives a way to choose the optimal sampling frequency, namely the one with the smallest value of $m_H^4$.
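A possible implementation of this selection rule is sketched below. It assumes that the criterion is the moment ratio $E H_n^4/(E H_n^2)^2$ of the daily proxy values, estimated by sample means; this specific form, and the names `moment_ratio` and `choose_proxy`, are assumptions, chosen because the ratio differs across proxies only through $E W_{H,n}^4/(E W_{H,n}^2)^2$ when $W_{H,n}$ is independent of $\nu_n$.

```python
def moment_ratio(proxy_values):
    """Sample version of E[H^4] / (E[H^2])^2 for one proxy series."""
    n = len(proxy_values)
    m2 = sum(h ** 2 for h in proxy_values) / n
    m4 = sum(h ** 4 for h in proxy_values) / n
    return m4 / (m2 * m2)


def choose_proxy(candidates):
    """Pick the candidate with the smallest moment ratio.

    candidates maps a label (for example a sampling frequency) to the
    list of daily proxy values computed at that frequency.
    """
    ratios = {name: moment_ratio(vals) for name, vals in candidates.items()}
    best = min(ratios, key=ratios.get)
    return best, ratios
```

In use, `candidates` could hold the daily realized volatilities computed from 1-min, 5-min, and 10-min returns over the same sample of days, and the label returned by `choose_proxy` would indicate the preferred sampling frequency.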

The Test of GARCH Model Using High-Frequency Data
In this section, we introduce the intraday high-frequency data into the test statistics in (12) and (13) by replacing $\hat\theta$ with $\hat\theta_H$. The related theoretical results are discussed.

The Wald Test
After $\hat\theta$ is replaced by $\hat\theta_H$ in Formula (12), we denote the new statistic by $T_W^{\bullet}$. Since the convergence of $\hat\theta_H$ is established in [26], the key point is to prove the convergence of $V(\hat\theta_H)$. Note that $\hat\theta_H \xrightarrow{a.s.} \theta_0$. Similarly to the proof of convergence for $V(\hat\theta)$ (see Formula (10)), it is not difficult to obtain $V(\hat\theta_H) \xrightarrow{a.s.} V(\theta_0)$. Thus $T_W^{\bullet}$ still has a limiting $\chi^2(3)$ distribution.

The LR Test
Likewise, the estimator $\hat\theta$ is replaced by $\hat\theta_H$ in Formula (13), and the new statistic is denoted by $T_{LR}^{\bullet}$. According to the following theorem, the LR test $T_{LR}^{\bullet}$ is equivalent to the Wald test $T_W^{\bullet}$ and hence has a limiting $\chi^2(3)$ distribution.

Theorem 2.
Under assumptions (A1)-(A5) and the QML regularity conditions in Appendix A, both the Wald and the LR test statistics $T_W^{\bullet}$ and $T_{LR}^{\bullet}$ are asymptotically distributed as $\chi^2(k)$ under the null, with $k$ being the number of unknown parameters, and they are asymptotically equivalent.
The detailed proofs are also relegated to Appendix B.

Simulation Study
In this section, we carry out Monte Carlo experiments to assess the finite-sample performance of the test statistics $T_W^{\bullet}$, $T_{LR}^{\bullet}$, $T_W$, and $T_{LR}$. In these experiments, all size and power results are calculated from 1000 independent replications.
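Empirical size and power in such experiments are rejection frequencies across replications. The sketch below shows this bookkeeping together with the binomial standard error of a rejection frequency, which gives a rough sense of Monte Carlo accuracy; the function names are illustrative and are not taken from the original experiments.

```python
import math


def rejection_rate(statistics, critical_value):
    """Share of replications in which the statistic exceeds the critical value."""
    rejections = sum(1 for t in statistics if t > critical_value)
    return rejections / len(statistics)


def mc_standard_error(rate, replications):
    """Binomial standard error of an estimated rejection frequency."""
    return math.sqrt(rate * (1.0 - rate) / replications)
```

With 1000 replications and a true rejection probability of 0.05, `mc_standard_error(0.05, 1000)` is roughly 0.007, so empirical sizes within about 0.014 of the nominal level are compatible with it at two standard errors.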
The calculation of the statistics $T_W$ and $T_{LR}$ is straightforward because it relies only on the daily GARCH models (3) and (4), whereas the computation of $T_W^{\bullet}$ and $T_{LR}^{\bullet}$ is based on the estimator $\hat\theta_H$ using high-frequency data. Note that when the volatility proxy $H_n$ is set to $|r_n|$, namely the absolute daily return, $T_W^{\bullet}$ and $T_{LR}^{\bullet}$ reduce to $T_W$ and $T_{LR}$. Following [26], throughout the simulation, the parameter $\tau$ involved in models (3), (4), (18), and (19) is fixed to 1.
To estimate models (18) and (19), we consider volatility proxies at three frequencies, namely 1 min ($RV^{(1)}$), 5 min ($RV^{(5)}$), and 10 min ($RV^{(10)}$). The 1-min proxy $RV^{(1)}$ is the square root of the sum of the squared 1-min returns within the day; the 5-min and 10-min volatility proxies are computed similarly. Given a significance level $\alpha = 0.05$ and sample sizes $N = 800$, 1000, 1200, 1500, and 2000, we calculate the empirical sizes with null hypothesis parameter $(\gamma_0, \beta_0) = (0.35, 0.5)$. For the power, we first fix $\gamma = 0.35$ and compute the power while changing the value of $\beta$. Then, we fix $\beta = 0.5$ and change $\gamma$ to observe the change in the power.
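The three proxies can be obtained from the same 1-min returns by aggregation. The sketch below sums consecutive blocks of `k` one-minute log-returns and then takes the square root of the sum of squares; the function names and the handling of a shorter trailing block (it is kept as is) are assumptions for illustration.

```python
import math


def aggregate_returns(one_min_returns, k):
    """Sum consecutive blocks of k one-minute log-returns.

    Assumption: a trailing block with fewer than k returns is kept.
    """
    return [sum(one_min_returns[i:i + k])
            for i in range(0, len(one_min_returns), k)]


def rv_proxy(one_min_returns, k):
    """Realized-volatility proxy at a k-minute sampling frequency."""
    blocks = aggregate_returns(one_min_returns, k)
    return math.sqrt(sum(r * r for r in blocks))
```

Calling `rv_proxy(returns, 1)`, `rv_proxy(returns, 5)`, and `rv_proxy(returns, 10)` would give the three proxies for one day. When `k` equals the number of 1-min returns in the day, the proxy collapses to the absolute daily return, which corresponds to the case $H_n = |r_n|$.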
The sizes and powers of the Wald test and the LR test calculated by $\hat\theta_H$ and $\hat\theta$ are reported in Tables 1-3. Figure 1 charts the trend of the power in Table 2, and Figure 2 charts the trend of the power in Table 3.
It can be seen that the empirical sizes of the Wald test and the LR test calculated by $\hat\theta_H$ are uniformly closer to the nominal size than those calculated by $\hat\theta$ in this experiment. For the power, the Wald test and the LR test calculated by $\hat\theta_H$ do not show an advantage at small sample sizes, but the advantage becomes more pronounced as the sample size increases. These results indicate that using high-frequency data can significantly improve the test performance for the GARCH model.
Note to the power table (columns $r_n$, $RV^{(1)}$, $RV^{(5)}$, $RV^{(10)}$): The null hypothesis is $(\gamma_0, \beta_0) = (0.35, 0.5)$. The power in this table is calculated by changing the value of $\gamma$ and fixing $\beta = 0.5$.

Empirical Study
In this section, we apply the proposed tests to analyze the CSI 300 index with a sampling frequency of 1 min. The data set covers the period from 15 March 2017 to 29 May 2020 (796 days, 241 observations in each day), containing 191,836 observations. For each trading day $n$, we compute the intraday log-return process $R_n(u)$, $0 \le u \le 1$, from the $n$-th intraday price sequence $P_n(u)$. In practice, it is of much interest to know whether the GARCH effect exists, and this is the test problem considered in this empirical study. As in the simulation study, the realized volatility is chosen as the volatility proxy, namely the 1-min proxy $RV^{(1)}$, the 5-min proxy $RV^{(5)}$, and the 10-min proxy $RV^{(10)}$. The parameter $\tau$ is also fixed to 1.
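One way to obtain such a process from intraday prices is sketched below. It assumes that the log-return path is measured relative to the first price of the day, $R_n(u_j) = \log P_n(u_j) - \log P_n(u_0)$; the reference price used in the original study may differ (for example, the previous day's close), so this choice and the function names are assumptions.

```python
import math


def intraday_log_returns(prices):
    """Cumulative log-return path relative to the first price of the day.

    Assumption: R_n(u_j) = log P_n(u_j) - log P_n(u_0).
    """
    base = math.log(prices[0])
    return [math.log(p) - base for p in prices]


def interval_returns(path):
    """Returns over consecutive intraday intervals, from the cumulative path."""
    return [b - a for a, b in zip(path, path[1:])]
```

With 241 price observations in a day, `interval_returns` yields 240 one-minute returns, and their sum equals the last value of the cumulative path. The stated sample size is consistent with the design, since 796 days times 241 observations gives 191,836.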
The intraday log-return process $R_n(\cdot)$ and the daily log-return process $r_n$ are plotted in Figures 3 and 4. The Wald test and LR test statistics calculated by $\hat\theta_H$ and $\hat\theta$ are reported in Table 4. At the nominal significance level of 5%, the critical value is $\chi^2_{0.95}(2) = 5.9915$. We can see from Table 4 that both test statistics are significant for all four estimators. In addition, the Wald and LR test statistics computed by $\hat\theta_H$ are larger than those computed by $\hat\theta$. This indicates that the GARCH model is suitable for the data studied in this paper.
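The critical value can be checked directly, because the $\chi^2(2)$ distribution is an exponential distribution with mean 2, so its upper-tail quantile at level $\alpha$ is $-2\ln\alpha$. The short sketch below reproduces the value and the resulting decision rule; the function names are illustrative.

```python
import math


def chi2_2_critical_value(level):
    """Upper-tail critical value of chi-square(2): P(X > c) = level."""
    return -2.0 * math.log(level)


def reject_null(statistic, level=0.05):
    """Reject the null when the statistic exceeds the critical value."""
    return statistic > chi2_2_critical_value(level)
```

At the 5% level, `chi2_2_critical_value(0.05)` agrees with the quoted value 5.9915 to four decimal places.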

Conclusions
In this article, we focused on testing the daily GARCH model by taking intraday high-frequency data into account. Based on daily GARCH model estimation using intraday high-frequency data, adjusted Wald and LR tests for the daily GARCH model were provided. Under some regularity conditions, the two test statistics were shown to be asymptotically equivalent, with a limiting $\chi^2(k)$ distribution, $k$ being the number of unknown parameters. Simulation and empirical studies suggested that the proposed test statistics have better size and power performance than the traditional ones (which do not use high-frequency data).
For the traditional daily GARCH model test, one could also use the Lagrange multiplier test. For the adjusted test statistics proposed in this paper, however, the Lagrange multiplier method is not applicable. This is because the Lagrange multiplier test only uses the parameter estimator under the null hypothesis [41], while our adjusted tests require the parameter values under both the null and the alternative hypotheses.
The idea of this paper can easily be extended to study tests of other GARCH-type models, such as the GJR-GARCH model, the threshold GARCH model, and the periodic GARCH model. Moreover, one can also apply the idea to nonstationary GARCH models and heavy-tailed GARCH models.

Acknowledgments: The thorough revision and constructive comments of the editor and anonymous reviewers are gratefully acknowledged.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Let $\Theta$ denote the parameter space. The assumptions of Theorems 1 and 2 are: n is a non-degenerate random variable; (A5) $E Z_n^4 < \infty$.

The QML regularity conditions B
Before restating the conditions, we introduce the definition of the Uniform Weak Law of Large Numbers (UWLLN) (see [43]) as follows.
A sequence of random functions $g_n(y_n, \theta)$ satisfies the UWLLN if its sample average converges in probability to its expectation uniformly over $\theta \in \Theta$. The QML regularity conditions (refer to [44]) are: (B1) $\Theta$ is compact, has nonempty interior, and $\theta_0 \in \mathrm{int}\,\Theta$; (B2) the variance functions $h_n$ are measurable functions of the data for all $\theta \in \Theta$ and are twice continuously differentiable with respect to $\theta$ on $\mathrm{int}\,\Theta$; (B3) $(l_n(\theta))$ satisfies the UWLLN and $\theta_0$ is the identifiably unique maximizer (see [45]) of the expected quasi log-likelihood. In addition, the Hessian $(l''_n(\theta))$ satisfies the UWLLN, with a condition on the expected Hessian, and the outer product $(l'_n(\theta_0) l'_n(\theta_0)^T)$ satisfies the UWLLN, with a condition on the expected outer product.

Appendix B
Appendix B.1. Proof of Theorem 1
Since the LR test statistic is constructed as in (13), Taylor's expansion yields Formula (A1), which can be written in terms of an intermediate point $\zeta$ with $|\zeta - \theta_0| \le |\hat\theta - \theta_0|$. The first and second derivatives of the quasi log-likelihood function of models (3) and (4) are denoted by $l'_n(\theta)$ and $l''_n(\theta)$. From Propositions 3.12, 6.1, and 6.2 of [40], one can see that $l'_n(\theta)$ and $l''_n(\theta)$ are stationary and ergodic sequences of random elements.