A Fast Algorithm for the Computation of HAC Covariance Matrix Estimators †

: This paper considers the algorithmic implementation of the heteroskedasticity and autocorrelation consistent (HAC) estimation problem for covariance matrices of parameter estimators. We introduce a new algorithm, mainly based on the fast Fourier transform, and show via computer simulation that our algorithm is up to 20 times faster than well-established alternative algorithms. The cumulative effect is substantial if the HAC estimation problem has to be solved repeatedly. Moreover, the bandwidth parameter has no impact on this performance. We provide a general description of the new algorithm as well as code for a reference implementation in R


Introduction
This paper considers the heteroskedasticity and autocorrelation consistent (HAC) estimation of covariance matrices.This estimation problem arises in the construction of large-sample tests for the parameters in linear and nonlinear models.The HAC estimator for the covariance matrix of parameter estimators applies to a variety of model frameworks and estimation methods.Some examples are ordinary least squares (OLS), maximum likelihood, generalized method of moments (GMM) or instrumental variables (see Andrews [1], and Zeileis [2]).It also corresponds to the so-called sandwich estimator in the context of quasi-maximum likelihood (QML) estimation technique (cf.White [3], Chapter 8.3).In this connection we combine two essential topics to applied econometrics: robust covariance matrix estimation and fast computation of covariance matrix estimators.
In the last decades, several techniques for HAC covariance matrix estimation have been proposed in the literature, e.g., Andrews [1], Newey and West [4], White [5], MacKinnon and White [6].These statistical methods go back to earlier literature, such as Jowett [7], Hannan [8], Brillinger [9].Nowadays, these methods are widely used in econometric analysis.Beyond that, researchers require covariance matrices not only for the purpose of some hypothesis tests, but also as stand-alone functions which can be used in various statistical methods.This is becoming more and more relevant as pointed out in Cribari-Neto and Zarkos [10].In spite of the vast econometric literature on this subject, little econometric software is available for the computation of HAC covariance matrix estimators.
The aim of this paper is to show that the computing time for HAC covariance matrix estimators can be decreased massively by using given information about the structure of the HAC covariance matrix estimators together with some matrix algebra.Particularly, we exploit the evaluation of a circulant matrix product, which can be efficiently calculated using the fast Fourier transform.The same calculation idea is employed by Wood and Chan [11] for the simulation of stationary Gaussian processes with prescribed covariance matrix as well as by Jensen and Nielsen [12] for the calculation of fractional differences.
We compare our new algorithm with two popular statistic algorithms: the algorithm by Roncalli [13], which is written for the statistical software GAUSS (cf.Aptech Systems [14]) and the algorithm by Zeileis [15] written for R (cf.R Development Core Team [16]), as well as with the lesser-known algorithm by Kyriakoulis [17] for MATLAB (cf.MathWorks [18]).According to our results, our new algorithm is up to ~20 times faster than the algorithms by Roncalli and Zeileis.The saved time can increase up to a few minutes for only one HAC estimation depending on the settings of the estimation problem.This is particularly relevant for the QML estimation of generalized autoregressive conditional heteroskedastic (GARCH) models (cf.Zivot [19]) and the estimation of stochastic volatility (SV) models via QML (cf.Ruiz [20]) or GMM (cf.Renault [21]) based on large financial datasets with high frequency sampling or multivariate structure.Another application area is the GMM estimation of multifractal volatility models (cf.Bacry et al. [22], Lux [23]), which proves to be a time consuming issue even in the univariate case with daily data.Thus reliable estimation results for the Multifractal Random Walk model require a data sample of size N > 2000, all the more the asymptotic normality of the GMM estimates can be reached only for sample sizes not less than ca.16,000 data points (cf.Bacry et al. [24]).The cumulative effect is substantial if the HAC estimation problem has to be solved repeatedly (e.g., in the case of an iterated GMM estimation or for the purpose of simulation and forecast studies.).
Moreover, our algorithm does not employ the bandwidth parameter explicitly.Its performance is independent of the value of the bandwidth parameter as contrasted with the algorithms by Roncalli and Zeileis.
The paper is organized as follows.In Section 2 we give an overview on the HAC estimation problem and some of its application fields and introduce the notation we use.Section 3 combines some matrix algebra results and the structure of HAC estimators in order to introduce the new algorithm.In Section 4 we discuss the alternative algorithms.Section 5 investigates the performance issues of our new algorithm as compared with the alternative algorithms.We replicate the HAC computation steps in Chaussé and Xu [25] for the estimation of a SV model with high frequency data as well as in Lux et al. [26] for the purpose of a forecast study and report the computing time.We show that the new algorithm outperforms the other algorithms in the majority of cases we analyse.There are some isolated cases where our algorithm performs more slowly.However the computational cost is below 1 millisecond, which should be irrelevant in practice.The R-Codes for the different HAC covariance matrix estimators are given in the Appendix.

The Estimation Problem
We consider (a t ) t∈Z a stationary ergodic q-dimensional stochastic process of mean zero and (Γ τ ) τ∈Z its autocovariance matrices We want to estimate the quantity where N denotes the number of given observations.S N can be also written as (cf.Smith [27]) This estimation problem can be solved in the limit for N → ∞, i.e., with f (0) the spectral density matrix of the process (a t ) in 0. The estimation of S is a nonparametric spectral estimation problem with the corresponding lag window spectral estimator Γ τ denotes the empirical autocovariance matrix of lag τ with 0 ≤ τ ≤ N − 1 and ω τ,N is a function of weights (cf.Newey and West [4]).S N is weakly consistent for given choice of ω τ,N and it is called a Heteroskedasticity and Autocorrelation Consistent (HAC) covariance matrix estimator for reasons to be explained below.
Let the bandwidth parameter b N control the number of nonzero weights, ω τ,N = 0 for τ > b N .Then we can also write Equation ( 5) as follows Note that the algorithm considered in this paper does not require the specification of a bandwidth parameter.In the following we suppress the index N for reasons of simplicity and write ω τ and b, respectively.

Application
This estimation problem can be applied to various econometric fields depending on the choice of (a t ).Its main interest resides in the construction of large-sample tests.Many parameter estimators θ N in nonlinear dynamic models satisfy with θ 0 the true parameter value to be estimated, M a non-random matrix and S given in (4).See Andrews [1] on the estimation of M. One can construct tests about the value of θ 0 based on the approximate distribution of θ N in large samples where S can be estimated by S N in (5).It is now obvious why we call S N a covariance matrix estimator.

The Case of the OLS Estimator
Consider the linear model In the case of homoskedastic and uncorrelated errors, E [uu ] = σ 2 I, the covariance matrix of θ simplifies to and it can be easily estimated by with s 2 an unbiased estimator for σ 2 .In the general case of heteroskedasticity and dependence of unknown forms of the error term u one can estimate the following asymptotic covariance matrix with This estimation can be performed using the HAC covariance matrix estimator (cf.formula ( 5)) based on the process (a t ) = (X t u t ).
The OLS estimator satisfies with Q = lim N→∞ 1 N X X a finite nonsingular matrix and respectively in large samples

The Case of the GMM Estimator
Consider the model-free GMM estimation of θ 0 using q moment conditions.In this case, the process (a t ) contains the q-dimensional deviation of the empirical moments The GMM estimator is given by with W some weighting matrix (Hall [28]).Under some regularity conditions the GMM estimator is weakly consistent and asymptotically normally distributed where M is a non-random matrix and = lim Again we employ the HAC covariance matrix estimator (cf.formula ( 5)) in order to estimate S.

The Algorithm
In this paper, we introduce a fast algorithm for the computation of the HAC covariance matrix estimator S N in (5).This is based on the equivalent representation (cf.Kyriakoulis [17]) with A ∈ R N×q and The matrix T(ω) denotes the symmetric N × N Toeplitz matrix with first column ω given by the weights For a more memory space-efficient computation of the matrix product T(ω)A we are using a special circulant matrix (cf.Van Loan [29]).Therefore, we embed the Toeplitz matrix T(ω) in a symmetric circulant matrix C(ω * ) ∈ R 2N×2N with Furthermore, we construct the 2N × q matrix A * A * = A 0 N×q (30) by adding a N × q matrix containing only zeros at the bottom of A.

•
The Toeplitz matrix T(ω) is given by the first N rows and first N columns of C(ω * ), i.e., Generally we denote with M a:b,c:d ∈ R m×n a sub-matrix of M containing the rows from a to b and the The necessary product T(ω)A is given by Thus, the fast evaluation of C(ω * )A * permits fast evaluation of T(ω)A.
The following theorem explains how the matrix product T(ω)A (cf.formula ( 26)) can be computed in a fast way by means of the discrete Fourier transform (DFT) and its inverse transform.It provides the basis for our new algorithm.
Theorem 1 (Circulant matrix and its eigenvalues and eigenvectors).Let C(c) ∈ R n×n be a circulant matrix with first column c = c 1 . . .c n and let VΛV * be the matrix decomposition of C(c), i.e., Thereby, λ k (k = 1, . . ., n) are the eigenvalues of C(c) and v k (k = 1, . . ., n) the corresponding eigenvectors.The matrix Λ is diagonal with Λ = diag(λ 1 , . . ., λ n ) and V is a matrix containing the eigenvectors, i.e., V = v 1 . . .v n .The matrix V * is the complex conjugate of V.Then, the following properties hold true: 1.The eigenvalues λ k are the discrete Fourier transform (DFT) of the column vector c, i.e., for k = 1, . . ., n. 2. The orthornomal left eigenvectors v k (k = 1, . . ., n) are given by ). 3. The product V * x, for any x ∈ R n , is given by the DFT of x. 4. The product y = Vx ∈ R n , for any x ∈ R n , is given by the inverse discrete Fourier Transform (IDFT) of x, i.e., for k = 1, . . ., n.
We now introduce the new algorithm (cf.Algorithm 1) for the computation of HAC covariance matrix estimators on the basis of Theorem 1.In the following we assume that the weights ω τ (τ = 1, . . ., N − 1) are known.
Algorithm 1.The algorithm is given in five steps while step three is subdivided into three steps.

Construct the matrix A
using A from Equation (27).3.For all j ∈ {1, . . ., q} compute the columns of the matrix C(ω * )A * .These columns are written as C(ω * )A * j = VΛV * A * j while A * j is the j-th column of A * .This computation is done in three steps: (a) Determine V * A * j given by the DFT of A * j .(b) Multiply for all i ∈ {1, . . ., 2N} the i-th entry of the vector V * A * j with the eigenvalue λ i , in order to construct ΛV * A * j .(c) Determine C(ω * )A * j = VΛV * A * j given by the IDFT of ΛV * A * j .4. Select the upper N × q block of C(ω * )A * .This upper block results in T(ω)A, i.e.,: 5. Determine S N = 1 N A T(ω)A.

Alternative Algorithms
In this paper, we compare our new algorithm with three alternative algorithms currently used.The first algorithm that we consider was developed by Roncalli [13] and can be found in the time series library TSM (Time Series and Wavelets for Finance) for the statistical software GAUSS (cf.Aptech Systems [14]).Here the computation of HAC estimators S N is implemented by means of a for-loop according to expression (7) combined with an ingenious matrix product, which enables the fast computation of the autocovariance matrices Γ τ : Algorithm 2 (Roncalli).

Determine S N = L.
The second algorithm by Zeileis [15] is part of the "sandwich" package for the statistical software R (cf.R Development Core Team [16]).This algorithm is similar to Roncalli's algorithm except for the calculation procedure for Γ τ in Step 2. His procedure is less efficient than Roncalli's as it requires the sequential updating of two matrices instead of one.Algorithm 3 (Zeileis).
Finally, the algorithm by Kyriakoulis [17] for MATLAB (cf.MathWorks [18]) excels in terms of its elegance and simplicity.This algorithm avoids the resource-intensive recursive summations employed above using instead expression (26) and is the basis for the new algorithm introduced in the previous section.It consists of only two steps: Algorithm 4 (Kyriakoulis).

Construct the symmetric Toeplitz matrix T (ω) with the first column
(41) A big drawback of this algorithm is the memory space-inefficient handling of the N × N matrix T (ω).On account of this the program runs out of memory and fails to compute S N for series longer than N = 10,000 data points.This problem could be solved within the scope of our algorithm, which does not employ the matrix T (ω) explicitly.

Comparing Different Algorithms for the Computation of HAC Covariance Matrix Estimators
In this section we present the gains in absolute and relative computing times that are achieved by our new algorithm compared with the three alternative algorithms discussed in the previous section.
All four algorithms were programmed and run in R (cf.R Development Core Team [16]) for reasons of comparability.

•
We used the "fftwtools"-package of R for the fft-function.The four algorithms run a little bit faster when using the "compiler"-package of R, but relative computing times are nearly the same.
We used the following hard-and software: We measured the computing time for different values of b, N and q.The matrix A was randomly generated for every set of (b, N, q), using a normally distributed random number generator with mean 0 and standard deviation 10.Nonetheless neither the distribution nor the parameters of the random number generator influenced the results significantly.All four algorithms were applied to the same matrix A (given b, N and q).After each application all variables in R except for A, b, N and q were deleted.
We used the weights ω τ for the quadratic spectral kernel function, since this kernel function is probably most frequently used in the literature (cf.Zeileis [15]).An overview of different weights can be found in Andrews [1].The results of the computing times are given in Table 1 in absolute time (in milliseconds) and in Table 2 in relative time (compared with our new algorithm).
Obviously, in Table 1 one can see that our new algorithm has the advantage that the bandwidth b has no impact on its performance, while it has for the Roncalli and Zeileis algorithms.The saved time can increase up to a few minutes for only one HAC estimation depending on the settings of the estimation problem.
Figure 1 shows a plot of absolute computation times of our new algorithm against the algorithms of Roncalli [13] and Zeileis [15] for different bandwidths b (N = 10 6 and q = 10).One can see again that our new algorithm is independent of the bandwidth b while the algorithms of Roncalli [13] and Zeileis [15] are not.This encouraging performance opens up new possibilities of using large bandwidths in combination with large datasets.We leave this issue for future exploration.Table 2. Relative computing times (compared to our new algorithm) for different values of b, N and q (Note: R runs out of memory in the "blank"-cases.).9.72 12.36 6.82 10.13 12.74 q q q q q q q q q q 20 40 60 Absolute computation times of our new algorithm against the algorithms of Roncalli [13] and Zeileis [15] as a function of the bandwidth b.The parameter set was N = 10 6 and q = 10.

N
Let us exemplify how Table 2 needs to be read.For example consider the case b = 60, N = 10 5 and q = 30.Then the algorithm proposed by Roncalli [13] needs 8.19 times more computation time compared with our new algorithm and the algorithm proposed by Zeileis [15] needs 8.51 times more computation time compared with our new algorithm.
Table 2 can be summarized as follows: • Compared with the algorithm proposed by Roncalli [13] our new algorithm is between 1.83 times and 15.03 times faster.

•
Compared with the algorithm proposed by Zeileis [15] our new algorithm is between 2.04 and 15.82 times faster.

•
Compared with the algorithm proposed by Kyriakoulis [17] our new algorithm is between 7.68 and 140.40 times faster, while the algorithm proposed by Kyriakoulis [17] runs out of memory for N > 10 4 .
Overall, our new algorithm is faster than any of the compared algorithms.The time saved while using the new algorithm can increase considerably, especially if the HAC estimation problem has to be solved repeatedly.For example, the iterated GMM estimation procedure requires an update of the estimated covariance matrix in each step.If we consider 50 estimation steps, then our new algorithm can save up to ~95 min compared with the algorithms proposed by Roncalli [13] or Zeileis [15] (b = 100, N = 10 6 and q = 30).Even in the case of a shorter dataset (N = 10 5 ) we would still save up to ~9 minutes as compared with the alternative algorithms (b = 100, N = 10 5 and q = 30).
Only in the case of a small N (N ∈ {100, 500}) combined with a small bandwidth (b = 30) and only few moment conditions (q = 10) does our algorithm perform more slowly.This is in accordance with the performance pattern in Jensen and Nielsen [12].
However, the computational cost is below 1 millisecond, which should be irrelevant in practice.Our algorithm reaches its highest performance approximately for N between 500 and 1000.After N = 1000 the performance of our new algorithm reduces, but still remains better than Roncalli [13] or Zeileis [15].At the same time, the absolute computing times increase significantly, which leads to a considerable difference in computational speed between the competing algorithms.This is also illustrated below.q=10, b=30 q=10, b=60 q=10, b=100 q=20, b=30 q=20, b=60 q=20, b=100 q=30, b=30 q=30, b=60 q=30, b=100 Figure 2. Relative computing times of Roncalli [13] and Zeileis [15] compared with our new algorithm for different parameter constellations (parameter N reaches from 10 2 to 10 6 ).The green line is at "y = 1".The x-axis is logarithmic.Reading example: In the comparison of "Zeileis vs. NEW" with the parameter set N = 10 3 , q = 30 and b = 100 (blue dotted line) the NEW algorithm is about 20 times faster compared to the algorithm proposed by Zeileis [15].
We replicate the HAC estimation problem in two empirical applications and report the computing time for the three competing algorithms.Remark 3. We used the "fftwtools"-package of R for the fft-function.Additionally, the time series was padded with zeros such that the total length of the series was a power of two.
The first empirical application is the estimation of a generalized asymmetric SV model with realized volatility (GASV-RV) in Chaussé and Xu [25] based on high frequency financial data.The estimation sample spans 5 years (2003-2008) and a total of N = 1,456,650 observations.The authors consider four different GMM estimation procedures, each of them using the HAC covariance matrix estimation and various sets of moment conditions with q = 36 moments at most.We replicated this estimation problem and estimated only the corresponding HAC covariance matrices based on randomly generated data, so that our new algorithm can directly be compared with the other ones.According to our results in Table 3 the computation time is significant even for one single HAC estimation due to the large N. Altogether (four estimations, six assets), our new algorithm can save up to ~26 min as compared with Roncalli [13] or Zeileis [15].It is important to note that Chaussé and Xu [25] use one-step GMM.The estimation problem would be all the more time consuming for iterated estimations.Table 3. Computation times and gain in time in minutes for our new algorithm compared to the ones proposed by Roncalli [13] and Zeileis [15] for the empirical applications in Chaussé and Xu [25] and Lux et al. [26].The empirical application in Chaussé and Xu [25] is a comparative study between the GASV-RV model and the GARCH model with realized volatility of Hansen et al. [33].The original dataset in Hansen et al. [33] comprises 29 assets over a time period of 6 years (N = 1,747,980).However Chaussé and Xu [25] restricted their analysis to only 6 assets and 5 years, respectively, most likely due to the enormous expenditure of time (see Table 3 for the computation times based on the original dataset).

Time in Minutes
The second empirical application is the forecast study in Lux et al. [26].The authors consider three forecast problems with different out-of-sample periods: the "full" sample (July 2005-April 2009), the "tranquil" sample (July 2005-July 2007) and the "turbulent" sample (July 2007-April 2009) including the financial crisis.From an estimation point of view, the "tranquil" sample scenario is redundant, since the relevant estimation results can be simply borrowed from the "full" sample problem.On account of this, we replicated the estimation problem only for the "full" sample and the "turbulent" sample problem and estimated only the corresponding HAC covariance matrices based on randomly generated data.We assumed recursive estimations with rolling time window after each forecast.Consider S&P 500 over the period roughly from 1983 to 2009.Then the "turbulent" sample forecast problem requires 454 estimations with sample sizes from N = 6067 to N = 6520 whereas the "full" sample problem requires 949 estimations with sample sizes from N = 5572 to N = 6520.In each estimation step three models (the Binomial Markov-switching multifractal (MSM) model, the Log-normal MSM model and the Log-normal MSM model with realized volatility) were considered together with the iterated GMM procedure (approx.30 iterations with q = 9 and b = 30).The gain in time as well as the overall computation time for the case of S&P 500 is given in Table 3.This time saving cumulates rapidly when considering a number of five assets, as in Lux et al. [26].

Figure 1 .
Figure1.Absolute computation times of our new algorithm against the algorithms of Roncalli[13] and Zeileis[15] as a function of the bandwidth b.The parameter set was N = 10 6 and q = 10.

Table 1 .
Absolute computing time (in milliseconds) for different values of b, N and q (Note: R runs out of memory in the "blank"-cases.).