Parsimonious Heterogeneous ARCH Models for High Frequency Modeling

: In this work we study a variant of the GARCH model when we consider the arrival of heterogeneous information in high-frequency data. This model is known as HARCH( n ). We modify the HARCH( n ) model when taking into consideration some market components that we consider important to the modeling process. This model, called parsimonious HARCH( m,p ), takes into account the heterogeneous information present in the ﬁnancial market and the long memory of volatility. Some theoretical properties of this model are studied. We used maximum likelihood and Griddy-Gibbs sampling to estimate the parameters of the proposed model and apply it to model the Euro-Dollar exchange rate series.


Introduction
High frequency data are those measured in small time intervals. This kind of data is important to study the micro structure of financial markets and also because their use is becoming feasible due to the increase of computational power and data storage.
Perhaps the most popular model used to estimate the volatility in a financial time series is the GARCH(1,1) model; see Engle (1982), Bollerslev (1986): with α 0 > 0, α 1 ≥ 0, β 1 ≥ 0, α 1 + β 1 < 1. When we use high frequency data in conjunction with GARCH models, these need to be modified to incorporate the financial market micro structure. For example, we need to incorporate heterogeneous characteristics that appear when there are many traders working in a financial market trading with different time horizons.
The HARCH(n) model was introduced by Müller et al. (1997) to try to solve this problem. In fact, this model incorporates heterogeneous characteristics of high frequency financial time series and it is given by where c 0 > 0, c n > 0, c j ≥ 0 ∀j = 1, . . . , n − 1 and ε t are identically and independent distributed (i.i.d.) random variables with zero expectation and unit variance. However, this model has a high computational cost to fit when compared with GARCH models, due to the long memory of volatility, so the number of parameters to be estimated is usually large.
We propose a new model known as the parsimonious heterogeneous autoregressive conditional heteroscedastic model, in short-form PHARCH, as an extension of the HARCH model. Specifically, we call a PHARCH(m,p), with aggregations of different sizes a 1 , . . . , a m , where m is the number of the market components, the model given by where ε t ∼ i.i.d. (0, 1), C 0 > 0, C j ≥ 0, ∀j = 1, . . . , m − 1, C m > 0, b j ≥ 0, j = 1, . . . , p. HARCH models are important because they take account the natural behavior of the traders in the market. However they have some problems, mainly because they need to include several aggregations, so the number of parameters to estimate is large, because of the large memory feature of financial time series. Parsimonious HARCH includes only the most important aggregations in its structure, which makes the model more realistic. We can see some simulations in Figure 1, where the characteristics of clustering and volatility are better represented in PHARCH processes than in ARCH or HARCH processes. The organization of the paper is as follows. In Section 2 we provide some background information on Markov chains and give the necessary and sufficient conditions for the PHARCH model to be stationary. In Section 3 we obtain forecasts for the proposed model, and in Section 4 we introduce the data that will be used for illustrative purposes. The actual application is given in Section 5, and we close the paper with some conclusions in Section 6.

Background
In this section we provide briefly some background on Markov chains and results on stationarity of PHARCH models.

1.
A function f : Ω → IR is called the smallest semi-continuous function if lim inf y→x f (y) ≥ f (x), x ∈ Ω. If P(·, A) is the smallest semi-continuous function for any open set A ∈ B(Ω), we say that (the chain) X is a weak Feller chain.

2.
A chain X is called ϕ-irreducible if there exists a measure ϕ on B(Ω) such that, for all x, whenever ϕ(A) > 0, we have, The measure ψ is called maximal with respect to ϕ, and we write ψ > ϕ, if ψ(A) = 0 implies ϕ(A) = 0, for all A ∈ B(Ω). If X is ϕ-irreducible, then there exists a probability measure ψ, maximal, such that X is ψ-irreducible.

4.
Let d = {d(n)} a distribution or a probability measure on Z + , and consider the Markov chain X d with transition kernel If there exits a transition kernel T satisfying If X is a Markov chain for which there exits a (sample) distribution d such that K d has a continuous component T, with T(x, Ω) > 0, ∀x, then X is called a T-chain.

6.
A measure π over B(Ω), σ-finite, with the property is called an invariant measure.
The following two lemmas will be useful. See Meyn and Tweedie (1996) for the proofs and further details. We denote by I A (·) the indicator function of A.
Lemma 1. Suppose that X is a weak Feller chain. Let C ∈ B(Ω) be a compact set and V a positive function. If then there exists an invariant measure, finite on compact sets of Ω.

Lemma 2.
Suppose that X is a weak Feller chain. Let C ∈ B(Ω) be a compact set, and V a positive function that is finite at some with b a constant, b < ∞, then there exists an invariant probability measure π on B(Ω).
Lemma 3. Suppose that X is a ψ-irreducible aperiodic chain. Then the following conditions are equivalent: 1.

Stationarity of PHARCH(m,p) Models
We first give a necessary condition for Model (3) to be stationary. We know that E r t−i r t−j = 0, ∀i = j, and if r t is stationary, we must have To prove a sufficient condition it will be necessary to represent the PHARCH(m,p) as a Markov process. We use the definitions given in the previous section, so the process whose elements follow Equation (3), is also a T-chain.
The proofs of the following results are based on Dacorogna et al. (1996), and they are given in the Appendix A.
Proposition 1. The Markov Chain X t that represents a PHARCH(m,p) process is a T-chain.
Proposition 2. The Markov Chain that represents a PHARCH(m,p) process is recurrent with an invariant probability measure (stationary distribution), and its second moments are finite if the condition given in (4) is satisfied.
Note that if ε t ∼ t(v) (a Student's t distribution with ν degrees of freedom) in (3), then the necessary and sufficient condition becomes

Forecasting
In this section we make some considerations about forecasting and validation of the proposed model. Usually two data bases are used for testing tha forecasting ability of a model: one (in-sample), used for estimation, and the other (out-of-sample) used for comparing forecasts with true values. There is an extra complication in the case of volatility models: there is no unique definition of volatility. Andersen and Bollerslev (1998) show that if wrong estimates of volatility are used, evaluation of forecasting accuracy is compromised. We could use the realized volatility as a basis for comparison, or use some trading system. We could, for example, have a model for hourly returns and use the realized volatility computed from 15 min returns for comparisons. In general, we can compute v h,t = ∑ a h i=1 r 2 t−i , where a h is the aggregation factor (4, in the case of 15 min returns). Then use some measure based on s h =ṽ h,t − v h,t , for example, mean squared error, whereṽ h,t is the volatility predicted by the proposed model. See Taylor and Xu (1997), for example. Now consider Model (3). The forecast of volatility at origin t and horizon is given bŷ where X t = (r t , σ t , r t−1 , σ t−1 , ...), for l = 1, 2, . . . Since a 0 = 1 < a 1 < a 2 < . . . < a m < ∞, then we have three cases: If l is such that a s−1 < l < a s , s = 1, 2, . . . , m, then we have, and given the independence of ε t and E(ε t ) = 0, we have E r t−i r t−j = 0, ∀i = j; hence, where, for i = 1, . . . , p, we have thatσ 2 If l is such that l > a m , s = 1, 2, . . . , m, then it follows

High Frequency Data
In this section we further elaborate on high frequency data and introduce the series that will be analyzed later. High frequency data are very important in the financial environment, mainly because there exist large movements in short intervals of time. This aspect represents an interesting opportunity for trading. Furthermore, it is well known that volatilities in different frequencies have significant cross-correlation. We can even say that coarse volatility predicts fine volatility better than the inverse, as shown in Dacorogna et al. (2001).
As an example, take the tick by tick foreign exchange (FX) time series Euro-Dollar, from January First 1999 to December 31, 2002. Returns are calculated using bid and ask prices, as We discard Saturdays and Sundays, and we replace holidays with the means of the last ten observations of the returns for each respective hour and day. After cleaning the data (see Dacorogna et al. (2001), for details) we will consider equally spaced returns, with sampling interval ∆t = 15 min. This seems to be adequate, as many studies indicate. Figure 2 shows Euro-Dollar returns calculated as above. The length of this time series is 95,317. The figure shows that the absolute returns present a seasonal pattern. This is due to the fact that physical time does not follow, necessarily, the same pattern as the business time. This is a typical behavior of a financial time series and we will use a seasonal adjustment procedure similar to that of Martens et al. (2002). However, we will use absolute returns instead of squared returns; that is, we will compute the seasonal pattern as where r d,s,h is the return in the weekday d, week s and hour h, and s is the number of weeks from the beginning of the series. Therefore, S d,N s ,h is the rolling window mean of the absolute returns with the beginning fixed.
In Figure 3 we have the autocorrelation function of these returns and of squared returns. The seasonality pattern is no longer present.
FX data has some distinct characteristics, mainly because they are produced twenty four hours a day, seven days a week. In particular, Euro-Dollar is the most liquid FX in the world. However, there are periods where the activity is greater or smaller, causing seasonal patterns to occur, as seen above.
Let us analyze some facts about these returns that we will denote simply by r t . We can see in Figure 4 the histogram fitted with a non-parametric density kernel estimate, using unbiased cross-validation method to estimate the bandwidth. It shows fat tails and high kurtosis, namely, 121, while its skewness coefficient is −0.079, showing almost symmetry. A normality test (Jarque-Bera) rejects the hypothesis that these returns are normal.  The seasonally adjusted returns are then given bỹ We may assume for example that the errors of a GARCH model fitted to these returns follow a Student's t distribution or a generalized error distribution, which represents better the fat tails of the distribution.
Often the optimization of the likelihood function can be a very difficult task, due mainly to the flat behavior of likelihood function, as can be seen in Zumbach (2000). Bayesian methods are an alternative, and in the next section we will use the Griddy-Gibbs sampling to estimate the parameters of a PHARCH model. Figure 4 also shows that the Euro-Dollar series has some clusters of volatility. This is a typical behavior of financial time series. A problem is that we do not know how many clusters there are and what their sizes are. The reason for this is that the information arriving is different for each sampling frequency.
We can look these clusters as market components and they depend on the heterogeneity of the market. These market components are considered in our PHARCH model, as seen in Equation (3). Differently from GARCH-type models, PHARCH models have a variance equation with returns over intervals of different sizes. Therefore PHARCH models take into account the sign of the returns and not only their absolute value as GARCH models do. Two subsequent returns with similar sizes in the same direction will cause a higher impact on the variance than two subsequent returns with similar sizes but opposite signs.
Now we need to determine the number and the size of the market components for the Euro-Dollar FX series. Ruilova (2007) proposed some technical rules to determine these market components, and Dacorogna et al. (2001) proposed some empirical rules.
To help us to determine if the component sizes chosen are correct we can use the impact of the component.
We define the impact I i of the ith component as, Note that the stationary condition to PHARCH(m) models can be written in terms of these impacts; namely, We also notice that if we consider the Student's t distribution with ν degrees of freedom, the impact should be defined as As remarked above, the number of components in a financial series can vary depending how the returns are being traded in this market. That is, liquid series can have a structure with more components than a non-liquid series.

Application
Due to the complexity of the proposed model, the likelihod function may be flat in the neighbourhood of the maxima, so the optimization procedure using traditional procedures may fail. An alternative is to use Bayesian methods. Some references on the use of Bayesian procedures for the family of ARCH processes are Geweke (1989), Kleibergen and Dijk (1993), Geweke (1994) and Bauwens and Lubrano (1998).   It is well known that when the analytical expressions of the full conditional distributions are known we can use Gibbs sampling. However, if the conditional distributions are not known, we need to modify the algorithm or to use another algorithm, such as the Metropolis-Hastings one. Another alternative to solve this is to use the Griddy-Gibbs sampler of Ritter and Tanner (1992).
Griddy-Gibbs sampling can be used when the joint conditional distribution of at least one parameter does not have a distribution form known but it has an analytical expression that can be evaluated on a grid of points. For that, we evaluate the analytical expressions of the joint conditional distribution function, and by numerical integration we can generate random variables of this distribution; see Davis and Rabinowitz (1975).
A problem that appears when we use Griddy-Gibbs sampling is to determine a window and the number of points where we will evaluate numerically the desired function. An inadequate determination of this grid of points could cause errors in the parameters estimation. In general it seems suitable to have 50 points in the grid for a good evaluation.
An important fact is that aggregate returns have, generally, a magnitude greater than non-aggregate returns, so the components with larger aggregations have smaller values. For this reason, we can use the impacts defined above to study the contribution of each component to the model.
In order to establish the number of components in the PHARCH model, we will use information of the financial market behavior based on the behavior of the traders. So, we consider five components, as seen in the Table 1, corresponding to information arriving at the market at the rate of 15 min, 1 h, 1 day, 1 week and 1 month. This means that we need to estimate the parameters of a PHARCH(5) process with aggregations 1, 4, 96, 480 and 1920 as follows.
where C j ≥ 0, j = 1, . . . , 5, C 0 > 0 , and ε t ∼ t (0, 1, v). The number of parameters to estimate is seven because we considered ε t ∼ t (0, 1, v), v > 2. We use an autoregressive processes to filter the data and to take into account the information given by the acf function of the returns shown in the Figure 3.
We consider non-informative priors, that is, uniform distributions on the parametric space, as follows: C 0 , C 1 , C 2 , C 3 , C 4 , C 5 ∼ U(0, 1) and v ∼ U(3, ct), where U denotes the uniform distribution and ct is a large number; in particular, we used ct = 50.
Estimates using maximum likelihood (ML) are shown in Table 2, and the corresponding impacts are shown in Figure 5. The optimizer used to evaluate the impacts was simulated annealing; see Belisle (1992). Several problems were faced in the process of using ML because in some situations the optimizer did not converge. Sometimes we can solve this problem, using initial values near to optimum. But this may not be a normal situation in real cases. So the need for alternative procedures.
As we can see, the impact of the components decreases for larger aggregations. This is a natural result because intraday traders are those who dominate the market. Another fact is that the weekly component has a similar impact to the monthly component, meaning that both have similar weight contributions to predicting volatility. The results show that an impact can be significant even when the parameter is small. The above estimates will serve as a comparison with Griddy-Gibbs estimates.
In the Griddy-Gibbs sampling we use a moving window: we define a new window in each iteration as a function of the mean, mode and standard deviation.  Impact of the Components Figure 5. Impact of the components estimated by maximum likelihood. Table 3 shows the results of the estimation of a parsimonious HARCH(5) model for the Euro-Dollar, using this criterion of selection for a moving window for each parameter, where the conditional density will be computed. The number of points where this density was evaluated was 50.
We used the non conditional and conditional mean in each step of the iteration to calculate the estimate parameters. As expected, conditional method was faster than non-conditional, but the difference was very small.
We see that the values are practically the same by both methods (maximum likelihood and Griddy-Gibbs sampling).
In Figure 6 we can see the convergence of the parameters using Griddy-Gibbs for each iteration step. We can see the fast convergence of the parameters. Now, we compare HARCH modeling with GARCH modeling. In Figures 7-9 we present a residual analysis after the fitting of a GARCH model. In Figures 10-12 we have the corresponding graphs for the PHARCH(5) fitting. We see a slightly better fit of the PHARCH model. If we use the prediction mean squared error (PMSE) as a criterion for comparison, we obtain the values 15.58 and 15.20, for GARCH and PHARCH modeling, respectively, using the standardized residuals and 1000 values for the prediction period.

Conclusions
PHARCH models are good models for the analysis of high frequency data, since the financial market agents behave differently, incorporating heterogeneous information to the market microstructure. Nevertheless, their use still depends on solving some issues, mainly computational ones.
One big challenge in the analysis of high frequency data is dealing with large amounts of observations, and complex models bring computational difficulties, even with the recent technological breakthroughs in computing technology. Therefore, the first issue here is to develop techniques that help us to improve the computational algorithms. Maximum likelihood estimation may collapse, as we have described earlier. Techniques such as genetic algorithms and neural networks are viable optimization alternatives.
Another possibility is to use Bayesian techniques, such as the Griddy-Gibbs samples that we have used. The disadvantage of the Griddy-Gibbs sampler lies in its high computational load. From another viewpoint, more sophisticated volatility models might be developed, taking into account the arrival of information, for example, a stochastic volatility model or stochastic duration model; or we could adapt existing models such as CHARN (conditional heteroskedasticity nonlinear autoregressive) models to heterogeneity of information characteristics. Finally, extensions similar to those proposed to GARCH models could be studied for HARCH models.
A feature of the HARCH models is that the market components are chosen in a subjective way. In the analysis of the Euro-Dollar series, we considered five components, with different aggregations. A different number of components could be proposed, depending of the degree of information one has. This is clearly a matter for further studies.
One last remark is that the performance of the different estimation methods should be evaluated. This evaluation could be done using prediction capabilities, for example. Other possibility is calculating some measure of risk. Volatility models are often established with the purpose of computing the VaR (value at risk) or other risk measure or for establishing trading strategies. In this context, an evaluation of the performance of the proposed model and several estimation procedures should be interesting. A comparison of returns of different trading systems that use a proposed model will be of fundamental importance. Further details on these aspects can be found in Acar and Satchell (2002), Dunis et al. (2003), Ghysels and Jasiak (1994) and Park and Irwin (2005).
Other models for high frequency data use the realized volatility as a basis, instead of models such as ours and models of the ARCH family, which assume that volatility is a letent variable. Among the former, we mention the autoregressive fractionally integrated moving average (ARFIMA) models, the heterogeneous autoregressive model of realized volatilidade (HAR-RV) of Corsi (2009) and the mixed data sampling regression (MIDAS) proposed by Ghysels Santa-Clara and Valkanov (2002). A comparison of the PHARCH models with HAR and MIDAS would be useful, but due to the length of the present paper, this will be the object of future research.
Author Contributions: The authors study some theoretical properties of the PHARCH models and illustrate the theory with an application to real data. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Proof of Proposition 1. Let X t following (3) and (5). Assume that the innovations distribution has a non-zero absolutely continuous component, with a positive density on a Borel set with a non-empty interior. Examples of this are the normal and Student's t distribution.
Then X t can be written as X t = H(X t−1 , ε t ), where H is a non-linear continuous function for each ε t fixed. Then, using the Continuous Mapping Theorem we obtain the weak convergence of X t , namely, the conditional distribution X t given X t−1 = y k converges to the conditional distribution of X t given X t−1 = y if y k → y. So, the Markov Chain X t that represents the PHARCH(m,p) process is also a T-chain.