Theoretical and empirical differences between diagonal and full BEKK for risk management

The purpose of the paper is to explore the relative biases in the estimation of the Full BEKK model as compared with the Diagonal BEKK model, which is used as a theoretical and empirical benchmark. Chang and McAleer (2017) show that univariate GARCH is not a special case of multivariate GARCH, specifically the Full BEKK model, and demonstrate that Full BEKK, which, in practice, is estimated almost exclusively, has no underlying stochastic process, regularity conditions, or asymptotic properties. Diagonal BEKK (DBEKK) does not suffer from these limitations, and hence provides a suitable benchmark. We use simulated financial returns series to contrast estimates of the conditional variances and covariances from DBEKK and Full BEKK. The results of non-parametric tests suggest evidence of considerable bias in the Full BEKK estimates. The results of quantile regression analysis show there is a systematic relationship between the two sets of estimates as we move across the quantiles. Estimates of conditional variances from Full BEKK, relative to those from DBEKK, are lower in the left tail and higher in the right tail. The BEKK model is a commonly applied multivariate volatility model, frequently used in modelling and forecasting volatilities in financial applications. Our results suggest that it is subject to considerable bias, and this should be considered by potential users.


Introduction
Conditional volatility models are the most widely estimated univariate and multivariate models of time-varying volatility (or dynamic risk) applied to financial data, in the high frequency data domains that are measured in days, hours and minutes. The stochastic processes, regularity conditions and asymptotic properties of these popular univariate conditional volatility models, such as GARCH (see Engle [1] and Bollerslev [2]) and GJR (see Glosten and Runkle [3]), are well established in the

Data
The original data series comprised ten years of daily price-adjusted data, from 5 March 2007 to 3 March 2017, for Google, IBM and Microsoft, which were downloaded from Yahoo Finance. Descriptive statistics for the three original series, each comprising 2518 observations, are shown in Table 1. The statistics show that the three series display characteristics that are typical of financial return series, namely skewness and excess kurtosis. Plots of the original sample of daily adjusted returns are shown in Figure 1. The QQplots of the series shown in Figure 2 reveal that they have fat tails and do not conform to a normal distribution.
The original three series are stationary, as confirmed by the Augmented Dickey-Fuller tests using constant and trend, and also display significant ARCH effects. The results of these tests are shown in Table 2.

Univariate Conditional Volatility Models
Chang and McAleer [14] show that Full BEKK has no underlying stochastic process, regularity conditions, or asymptotic properties. They point out that, in the development of GARCH, the conditional mean of financial returns for commodity i, in a financial portfolio of m assets, can be developed as follows:

$$y_{it} = E(y_{it} \mid I_{t-1}) + \varepsilon_{it}, \quad i = 1, 2, \ldots, m. \tag{1}$$

In Equation (1) above, the returns, $y_{it}$, represent the log difference of financial asset prices, $I_{t-1}$ is the information set for all prices at time t − 1, $E(y_{it} \mid I_{t-1})$ is the conditional expected returns, and $\varepsilon_{it}$ is a conditionally heteroscedastic error term. The conditional volatility specifications are based on the stochastic specification presumed to underlie the return shocks, $\varepsilon_{it}$. Chang and McAleer [14] consider the random coefficient autoregressive process underlying the returns shocks, $\varepsilon_{it}$, as shown below:

$$\varepsilon_{it} = \phi_{it}\,\varepsilon_{it-1} + \eta_{it}, \tag{2}$$

where $\phi_{it} \sim iid(0, \alpha_i)$, $\alpha_i \geq 0$, $\eta_{it} \sim iid(0, \omega_i)$, $\omega_i > 0$, $\eta_{it} = \varepsilon_{it} / \sqrt{h_{it}}$ is the standardised residual, and $h_{it}$ is the conditional volatility of asset i. Tsay [15] suggested the following formulation for the conditional volatility of asset i as an ARCH process:

$$h_{it} = E(\varepsilon_{it}^2 \mid I_{t-1}) = \omega_i + \alpha_i\,\varepsilon_{it-1}^2. \tag{3}$$

A lagged dependent variable, $h_{it-1}$, is typically added to Equation (3) to improve the empirical fit:

$$h_{it} = \omega_i + \alpha_i\,\varepsilon_{it-1}^2 + \beta_i\,h_{it-1}. \tag{4}$$

The specification in Equation (2) suggests that $\alpha_i$ and $\omega_i$ should be positive because they are the unconditional variances of two different stochastic processes. Equation (4) is a GARCH(1,1) model for asset i (see Bollerslev [2]). The stability condition requires that $\beta_i \in (-1, 1)$. Given that the stochastic process in Equation (2) follows a random coefficient autoregressive process, under normality (non-normality) of the random errors, the maximum likelihood estimators (quasi-maximum likelihood estimators, QMLE) of the parameters will be consistent and asymptotically normal.
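As a minimal sketch of the GARCH(1,1) model referred to as Equation (4), the following Python code filters a series of return shocks through the recursion h_t = ω + α ε²_{t−1} + β h_{t−1}. The function names `garch_filter` and `simulate_garch`, the parameter values, the Gaussian innovations, and the choice to start the recursion at the unconditional variance ω/(1 − α − β) are illustrative assumptions, not taken from the paper.

```python
import random


def garch_filter(shocks, omega, alpha, beta):
    """Return conditional variances h_t = omega + alpha*e_{t-1}^2 + beta*h_{t-1}."""
    if omega <= 0 or alpha < 0 or not (-1 < beta < 1):
        raise ValueError("need omega > 0, alpha >= 0 and |beta| < 1")
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for the assumed start value")
    # Assumption: initialise the recursion at the unconditional variance.
    h = [omega / (1.0 - alpha - beta)]
    for e_prev in shocks[:-1]:
        h.append(omega + alpha * e_prev ** 2 + beta * h[-1])
    return h


def simulate_garch(n, omega, alpha, beta, seed=0):
    """Simulate shocks e_t = sqrt(h_t) * z_t with Gaussian z_t (illustrative)."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)
    shocks = []
    for _ in range(n):
        e = (h ** 0.5) * rng.gauss(0.0, 1.0)
        shocks.append(e)
        h = omega + alpha * e ** 2 + beta * h
    return shocks


# Hypothetical parameter values, chosen only for illustration.
shocks = simulate_garch(1000, omega=0.05, alpha=0.08, beta=0.90)
variances = garch_filter(shocks, omega=0.05, alpha=0.08, beta=0.90)
```

Each element of `variances` depends only on the asset's own lagged shock and own lagged variance, which is the univariate structure that the multivariate models are meant to nest.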

Multivariate Conditional Volatility Models
The multivariate extension of the univariate ARCH and GARCH models is given in Baba et al. [7] and Engle and Kroner [8]. The multivariate extension of Equation (1) can remain unchanged by assuming that the three components are now m × 1 vectors, where m is the number of financial assets. The relationship between the returns shocks and the standardised residuals, in the multivariate case, can be written as

$$\varepsilon_t = D_t^{1/2}\,\eta_t,$$

where $D_t = \mathrm{diag}(h_{1t}, h_{2t}, \ldots, h_{mt})$ is a diagonal matrix of the univariate conditional volatilities. Chang and McAleer [14] consider a vector random coefficient autoregressive process of order one as:

$$\varepsilon_t = \Phi_t\,\varepsilon_{t-1} + \eta_t, \tag{5}$$

where $\varepsilon_t$ and $\eta_t$ are m × 1 vectors, and $\Phi_t$ is an m × m matrix of random coefficients, with $\Phi_t \sim iid(0, A)$ and $\eta_t \sim iid(0, CC')$. In the case where A is a diagonal matrix, with $a_{ii} > 0$ for all i = 1, ..., m, and $|b_{jj}| < 1$ for all j = 1, ..., m, so that A has dimension m × m, McAleer and Lieberman [9] showed that the multivariate extension of GARCH(1,1) from Equation (5) is given as the Diagonal BEKK model, namely:

$$Q_t = CC' + A\,\varepsilon_{t-1}\varepsilon_{t-1}'\,A' + B\,Q_{t-1}\,B', \tag{6}$$

where $Q_t$ is the conditional covariance matrix of $\varepsilon_t$, and A and B are both diagonal matrices. It is essential for the matrix multiplication of $\varepsilon_{t-1}\varepsilon_{t-1}'$ by A that A is diagonal and positive definite, given that the former is an m × m matrix. If this is not the case, Equation (6) could not be derived from the vector random coefficient autoregressive process in Equation (5). It was shown in McAleer and Lieberman [9] that the QMLE of the parameters of the DBEKK model are consistent and asymptotically normal, so that standard statistical inference for testing hypotheses is valid. However, Chang and McAleer [14] demonstrate that this is not the case for the Full BEKK model. They consider element i of Equation (5), which can be written as:

$$\varepsilon_{it} = \sum_{j=1}^{m} \phi_{ijt}\,\varepsilon_{jt-1} + \eta_{it}, \quad i = 1, \ldots, m, \tag{7}$$

which is not equivalent to Equation (2) unless $\phi_{ijt} = 0$, $\forall j \neq i$. Chang and McAleer [14] point out that Equation (7) is not a random coefficient autoregressive process because of the presence of another m − 1 random coefficients. Furthermore, Equation (7) is not invertible because the random processes cannot be connected to the data, which requires m equations, as shown in Equation (5).
This means that the stochastic process underlying univariate ARCH is not a special case of that underlying multivariate ARCH, unless $\phi_{ijt} = 0$, $\forall j \neq i$.
As a result, Chang and McAleer [14] suggest that the case of a Full BEKK model, namely where there are no restrictions on the off-diagonal elements in $\Phi_t$, and hence no restrictions on the off-diagonal elements in A, is not possible if univariate ARCH is to be a special case of its multivariate counterpart. This suggests that Full BEKK does not exist, except by assumption.
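The contrast between the diagonal and full specifications discussed above can be sketched in a few lines of Python. The code assumes the standard BEKK(1,1) recursion Q_t = CC′ + A ε_{t−1} ε′_{t−1} A′ + B Q_{t−1} B′; the helper names (`matmul`, `transpose`, `bekk_step`) and all numerical values are hypothetical and serve only to illustrate the algebra, not to reproduce any estimate in the paper.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def bekk_step(Q_prev, e_prev, C, A, B):
    """One update Q_t = CC' + A e e' A' + B Q_{t-1} B' (assumed BEKK(1,1) form)."""
    outer = [[ei * ej for ej in e_prev] for ei in e_prev]  # e e'
    terms = [matmul(C, transpose(C)),
             matmul(matmul(A, outer), transpose(A)),
             matmul(matmul(B, Q_prev), transpose(B))]
    m = len(Q_prev)
    return [[sum(T[i][j] for T in terms) for j in range(m)] for i in range(m)]


# Hypothetical values for two assets, chosen only for illustration.
C = [[0.3, 0.0], [0.1, 0.2]]
B = [[0.9, 0.0], [0.0, 0.95]]
A_diag = [[0.3, 0.0], [0.0, 0.2]]    # Diagonal BEKK
A_full = [[0.3, 0.1], [0.05, 0.2]]   # Full BEKK: off-diagonal terms allowed
Q_prev = [[1.0, 0.2], [0.2, 1.5]]
e_prev = [1.0, -2.0]

Q_diag = bekk_step(Q_prev, e_prev, C, A_diag, B)
Q_full = bekk_step(Q_prev, e_prev, C, A_full, B)

# Under DBEKK the first variance reduces to a univariate GARCH(1,1) update
# in asset 1's own shock and own lagged variance.
own_only = 0.3 ** 2 + 0.3 ** 2 * e_prev[0] ** 2 + 0.9 ** 2 * Q_prev[0][0]
```

With the diagonal A, `Q_diag[0][0]` coincides with `own_only`, so the univariate recursion is recovered. With the full A, the first variance also responds to the second asset's shock, which is the feature that has no counterpart in the univariate process.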
Given the above result, plus the fact that Full BEKK is frequently estimated in practice and is incorporated in many commercial econometric statistical packages, the focus in this paper is to explore whether there is any evidence of bias in the coefficients estimated in Full BEKK, as compared with DBEKK. We use DBEKK as a benchmark because the mathematical and statistical conditions of DBEKK have been established.
We generate simulated financial return series and use them as inputs to estimate both DBEKK and Full BEKK, from which we can compare the estimates of the conditional variances and covariances. The null hypothesis is that the two sets of estimates should not differ systematically. The method of generating the simulated financial return series is discussed below.

Simulated Return Series
We use the three financial return series for Google, IBM, and Microsoft, and draw on code from several R packages to randomly resample the original time series in blocks of length five, so that we retain the autocorrelation structures and maintain the presence of ARCH effects. The timeSeries, boot and meboot packages prove to be of interest.
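The resampling idea can be sketched as a moving block bootstrap. The paper used R packages and does not spell out the exact procedure, so the Python function below is only an illustration under stated assumptions: overlapping blocks, a block length of five observations, and sampling with replacement until the original length is reached. The name `block_bootstrap` and the placeholder data are hypothetical.

```python
import random


def block_bootstrap(series, block_length=5, seed=None):
    """Resample a series by concatenating randomly chosen overlapping blocks."""
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    rng = random.Random(seed)
    resampled = []
    while len(resampled) < n:
        # Pick a random starting point and copy a contiguous block,
        # so that short-run dependence inside the block is preserved.
        start = rng.randrange(n - block_length + 1)
        resampled.extend(series[start:start + block_length])
    return resampled[:n]


# Placeholder data standing in for a return series (not the paper's data).
returns = [float(i) for i in range(100)]
simulated = block_bootstrap(returns, block_length=5, seed=1)
```

Because observations are moved in contiguous blocks, volatility clustering within each block survives the resampling, which is consistent with the ARCH effects reported for the simulated series.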
Plots of the simulated series based on the three stocks are shown in Figure 3, and the descriptive statistics of the simulated financial return series are shown in Table 3. The simulations have similar characteristics to the base series. IBM has the lowest excess kurtosis of the set of simulations, as would be expected, given that the IBM original return series had relatively low kurtosis in relation to the other two series. Augmented Dickey-Fuller tests confirmed that all the simulated series are stationary, and all displayed highly significant ARCH effects.
The simulated series have fat tails and are not Gaussian, as the QQplots for the first set of simulations of the three series shown in Figure 4 reveal.

Tests and Optimisation Procedures
Empirical estimation of the DBEKK and Full BEKK models was based on the Estima Rats econometric software. In the empirical analysis which follows, we report the results of fitting the multivariate GARCH models, DBEKK and Full BEKK, to the simulated financial return series. This type of estimation procedure involves seeking the solution to an unconstrained minimisation problem:

$$\hat{\theta} = \arg\min_{\theta} f(x, \theta), \tag{8}$$

where x are the data and θ is a vector of parameters to be chosen to minimise the objective function f(x, θ).
In the case of GARCH models, this will be a negative log-likelihood function. Typically, no closed-form expression is available for f(x, θ) or for its partial derivatives, and so the minimisation is usually carried out by numerical methods. Hurn et al. [16] point out that there are two broad approaches to the construction of a minimisation algorithm: methods that rely on function values, or algorithms that use the derivatives of the function.
If we use an algorithm based on gradients to minimise f(x, θ), then we are assuming that all first and second derivatives exist. The gradient vector, G(θ), and the Hessian matrix, H(θ), of the function, f(x, θ), can be defined as:

$$G(\theta) = \frac{\partial f(x, \theta)}{\partial \theta}, \qquad H(\theta) = \frac{\partial^2 f(x, \theta)}{\partial \theta\,\partial \theta'}.$$

The minima of the objective function occur at parameter values where the gradient is zero and the associated Hessian matrix is positive definite. The estimator, $\hat{\theta}$, of the parameter vector, θ, should satisfy the condition:

$$G(\hat{\theta}) = 0. \tag{9}$$

We can start with a guess, $\theta_k$, which is assumed to be near the optimal value $\hat{\theta}$ at which a minimum is attained. A Taylor series expansion of G(θ) about $\theta_k$ is given as:

$$G(\theta) = G(\theta_k) + H(\theta_k)(\theta - \theta_k) + \cdots \tag{10}$$

If we evaluate Equation (10) at $\hat{\theta}$, impose the condition in Equation (9), and ignore all terms of order two and above, it follows that:

$$G(\theta_k) + H(\theta_k)(\hat{\theta} - \theta_k) = 0.$$

Using this result, the next guess for $\hat{\theta}$ is:

$$\theta_{k+1} = \theta_k - H(\theta_k)^{-1} G(\theta_k). \tag{11}$$

This is referred to as a 'full Newton step', which would be taken close to the location of the minimum. However, further away from the minimum, this step may not be guaranteed to reduce the value of the function, so the usual convention is to use a smaller step:

$$\theta_{k+1} = \theta_k - \alpha_k H(\theta_k)^{-1} G(\theta_k), \tag{12}$$

where $\alpha_k$ is chosen to control the step size and to ensure that the function is reduced at each iteration. All gradient-based algorithms employ the general iterative scheme set out in Equation (12), and differ only in their approximation of the Hessian matrix at each iteration. T.M. Christensen and Lindsay [16] point out that a Newton-Raphson procedure computes the Hessian matrix directly, the Method of Scoring uses the Information matrix (the negative of the expected value of the Hessian matrix of the log-likelihood), and the BHHH algorithm (Berndt and Hausman [17]) approximates the Hessian by the outer product of the gradient vector. BHHH provides an approximation of the Hessian matrix that is guaranteed to be positive definite, and so is a popular choice in many econometric packages.
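The damped iteration described above, with the Hessian replaced by the BHHH outer product of per-observation gradients, can be illustrated on a deliberately simple problem. The sketch below is not the GARCH likelihood and not the routine implemented in the software used in the paper: it assumes an exponential sample with rate θ, so that the per-observation negative log-likelihood is −log θ + θ x_i, and it uses step halving to choose α_k. The function name `bhhh_minimise`, the tolerances, and the data are hypothetical.

```python
import math


def bhhh_minimise(x, theta0, tol=1e-6, max_iter=200):
    """Minimise the negative log-likelihood of an exponential sample by BHHH.

    Per-observation objective: f_i(theta) = -log(theta) + theta * x_i.
    """
    def objective(theta):
        return sum(-math.log(theta) + theta * xi for xi in x)

    theta = theta0
    for _ in range(max_iter):
        scores = [-1.0 / theta + xi for xi in x]      # per-observation gradients
        gradient = sum(scores)                        # G(theta_k)
        if abs(gradient) < tol:
            break
        hessian_approx = sum(s * s for s in scores)   # BHHH outer product
        direction = -gradient / hessian_approx
        alpha = 1.0                                   # try the full step first
        current = objective(theta)
        while alpha > 1e-10:
            candidate = theta + alpha * direction
            if candidate > 0 and objective(candidate) < current:
                break
            alpha /= 2.0                              # shrink the step
        else:
            break                                     # no improving step found
        theta = candidate
    return theta


# Hypothetical data; the closed-form estimate is 1 / mean(sample).
sample = [0.5, 1.0, 1.5, 2.0, 0.2, 0.8, 3.0, 1.0]
rate_hat = bhhh_minimise(sample, theta0=1.0)
```

For this sample the closed-form estimate is 1/1.25 = 0.8, which gives a direct check on the iteration. In this example the outer-product approximation is smaller than the true Hessian near the optimum, so the full step overshoots and the step-halving rule is what delivers convergence, which illustrates why starting values and step control can matter in practice.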
The estimation process used in this paper is BHHH, but there are several caveats. T.M. Christensen and Lindsay [16] point out that the treatment of constraints on parameters, choice of starting values, specification of termination criteria and analytical versus numerical gradients can materially alter the final output of a minimisation algorithm. A subsequent comment on this paper by McCullough [18] suggests that default options for a nonlinear solver are not likely to produce a correct answer, and that the answer produced by a nonlinear solver is not necessarily correct.
McCullough and Vinod [19] question the baseline accuracy of many commonly used econometric packages, and note that there is often a trade-off between computational speed and statistical accuracy. McCullough and Renfro [20] explore the interaction between benchmarks, software standards, and econometric theory, using the GARCH model as a case study, and caution against the uncritical use of standard econometric packages.
Despite these various issues, the paper adopts a consistent framework in the estimation methods adopted to compare DBEKK with Full BEKK. We rely upon the estimation procedures in Rats and the BHHH algorithm to fit the two models. These are then used to predict the daily conditional variances and covariances generated by the two models. We then use quantile regression to analyse the two sets of estimates using the DBEKK estimates as the baseline benchmark.

Quantile Regression
Quantile Regression is modelled as an extension of classical OLS (Koenker and Bassett [21]). In Quantile Regression, the conditional mean as estimated by OLS is extended to similar estimation of an ensemble of models of various conditional quantile functions for a data distribution. Therefore, Quantile Regression can better quantify the conditional distribution of (Y|X). The central special case is the median regression estimator that minimises a sum of absolute errors. Estimates of the remaining conditional quantile functions are obtained by minimizing an asymmetrically weighted sum of absolute errors, where the weights are functions of the quantile of interest. This makes Quantile Regression a robust technique, even in the presence of outliers. Taken together, the ensemble of estimated conditional quantile functions of (Y|X) offers a much more complete view of the effect of covariates on the location, scale and shape of the distribution of the response variable.
For parametric estimation in Quantile Regression, quantiles as proposed by Koenker and Bassett [21] can be defined through an optimisation problem. Just as, in OLS regression, the sample mean is defined as the solution to the problem of minimising the sum of squared residuals, the median (the 0.5 quantile) in Quantile Regression is defined as the solution to the problem of minimising the sum of absolute residuals. The symmetrical piecewise linear absolute value function assures the same number of observations above and below the median of the distribution.
The other quantile values can be obtained by minimizing a sum of asymmetrically weighted absolute residuals, giving different weights to positive and negative residuals. Solving the following:

$$\min_{\xi \in \mathbb{R}} \sum_{i=1}^{n} \rho_\tau(y_i - \xi),$$

where $\rho_\tau(\cdot)$ is the tilted absolute value function, $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$, gives the τth sample quantile. Taking the directional derivatives of the objective function with respect to ξ (from left to right) shows that this problem yields the sample quantile as its solution.
After defining the unconditional quantiles as an optimisation problem, it is easy to define the conditional quantiles similarly. Taking the least squares regression model as a base to proceed, for a random sample, $y_1, y_2, \ldots, y_n$, solving:

$$\min_{\mu \in \mathbb{R}} \sum_{i=1}^{n} (y_i - \mu)^2$$

gives the sample mean, as an estimate of the unconditional population mean, EY. Replacing the scalar, µ, by a parametric function µ(x, β), and then solving:

$$\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \bigl(y_i - \mu(x_i, \beta)\bigr)^2$$

gives an estimate of the conditional expectation function, E(Y|x). Proceeding in the same way for Quantile Regression, in order to obtain an estimate of the conditional median function, the scalar, ξ, in the first equation is replaced by the parametric function, $\xi(x_i, \beta)$, and τ is set to 1/2. The estimates of the other conditional quantile functions are obtained by replacing absolute values by $\rho_\tau(\cdot)$, and solving:

$$\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - \xi(x_i, \beta)\bigr).$$

The resulting minimization problem, when ξ(x, β) is formulated as a linear function of parameters, can be solved efficiently by linear programming methods. We use quantile regression to compare the relative behaviour of the conditional variances across the quantiles, as predicted by the two models.
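The optimisation view of a quantile can be checked numerically. The sketch below covers only the unconditional case: it evaluates the tilted absolute value function ρ_τ(u) = u(τ − 1{u < 0}) and searches over the observed data points for the value of ξ that minimises the summed loss. The function names and the data are hypothetical, and a brute-force search stands in for the linear programming methods mentioned above.

```python
def check_loss(u, tau):
    """Tilted absolute value function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(y, tau):
    """Return a data point that minimises sum_i rho_tau(y_i - xi).

    The objective is piecewise linear and convex in xi, so a minimiser can
    always be found among the observed values.
    """
    def total_loss(xi):
        return sum(check_loss(yi - xi, tau) for yi in y)

    return min(sorted(y), key=total_loss)


# Hypothetical data, chosen only for illustration.
y = [7, 2, 9, 4, 1, 8, 3, 6, 5]
median = sample_quantile(y, 0.5)
lower = sample_quantile(y, 0.25)
upper = sample_quantile(y, 0.95)
```

For τ = 0.5 positive and negative residuals are weighted equally and the minimiser is the median. For τ below or above 0.5 the asymmetric weights push the minimiser towards the lower or upper tail, which is the mechanism that lets quantile regression compare the two sets of conditional variances tail by tail.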

Empirical Results
We estimated both DBEKK and Full BEKK using the simulated financial return series. The estimates from DBEKK are used as the benchmark, given that it has established statistical regularity conditions. We decided to keep the comparison tests as simple as possible and first estimated a two-variable version of the DBEKK and Full BEKK models using the three sets of simulated return series in pairs. This was then followed by a single three-variable set of estimates, in order to verify that the same pattern of results exists. The null hypothesis is that Diagonal and Full BEKK are equivalent when the off-diagonal coefficients in Full BEKK are zero, so the asymptotic tests are statistically valid. We proceeded by estimating the coefficients for the conditional variances and the conditional covariances for the two models, and then used non-parametric sign tests on the differences between the two sets of estimates.
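The sign tests referred to above can be illustrated with a short Python sketch. The paper does not state which variant or software routine was used, so the code assumes the exact two-sided binomial version of the test, with tied pairs dropped. The function name `sign_test` and the paired values are hypothetical and are not estimates from the paper.

```python
import math


def sign_test(a, b):
    """Two-sided exact sign test of H0: the median of (a_i - b_i) is zero."""
    diffs = [ai - bi for ai, bi in zip(a, b) if ai != bi]   # ties are dropped
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2.0 * tail)


# Hypothetical paired estimates from two models (not the paper's data).
model_a = [1.2, 0.9, 1.1, 1.4, 1.3, 1.5, 0.8, 1.6, 1.7, 1.0]
model_b = [1.0] * 10
positives, n_used, p_value = sign_test(model_a, model_b)
```

The test uses only the direction of each paired difference, not its size, so it makes no distributional assumption about the estimates beyond independence of the pairs.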
The estimates of the constants, ARCH effects and conditional variances for the two models are shown in Table 4. DBEKK and Full BEKK fitted to the pairs of simulated series were highly significant, and all but three pairs of the fifty-four coefficients estimated in the models, and presented in Table 4, were significant at the 1% level. (The insignificant coefficients are marked with an asterisk (*) in Table 4.) The coefficients of the conditional covariances are shown in Table 5. The majority of these estimates are insignificant, so we concentrated our analysis on the conditional variances.
We then undertook a set of non-parametric sign tests on the values of the estimated coefficients, reported in Table 4. We ran the tests in a number of different formats, both on the full set of coefficients reported in Table 4, and the full set minus the three pairs of insignificant estimates. The results of the sign tests are reported in Tables 6 and 7, which suggest that there are no significant differences in the values of the coefficients for the constants, ARCH effects and the conditional variances estimated for the two variables. However, these tests treat the coefficients in isolation, and regard them as being independent, which is not the case when they are combined into a DBEKK or Full BEKK model.

Tables 6 and 7 (test statistics): Expected value = 612; Variance = 10,106; Z = 0.0547108; P(Z < 0.0547108) = 0.478184; Two-tailed p-value = 0.956369.

DBEKK and Full BEKK are multivariate GARCH models that are used for forecasting conditional volatility. The crucial issue for purposes of risk management is how the forecasts of conditional volatility derived from the two models compare. These are vital components for assessing risk, and might be used to compute the Value at Risk (VaR) of a portfolio of financial assets, for example.
The simulated financial return samples for the nine variables contain ten years of daily data, or 2518 data points. We filter these through the DBEKK and Full BEKK models, and obtain corresponding estimates of the conditional variance projections, for each simulated security, from the two models. These forecasts of the conditional variances are then compared using non-parametric sign tests. The results for each simulated security are shown in Table 8.
The sign tests in Table 8 are based on the null hypothesis that the median difference in the conditional variances produced by the two models, DBEKK and Full BEKK, for the simulated securities, is zero. The null hypothesis is strongly rejected in all cases, and the differences are highly significant. We also ran sign tests, not reported, based on the null hypothesis that there was no difference in the conditional variances predicted by the two models. These results also strongly rejected the null hypothesis in all cases.
While it is valuable to know that the two models produce different predictions of the conditional variances, it is also of interest to know whether or not there are systematic differences in the predictions of the conditional variances. We explore this issue by means of quantile regression. The advantage of quantile regression is that we can explore the relationship between the two sets of predictions from DBEKK and Full BEKK at particular quantiles. We regress the predicted conditional variances from the Full BEKK model on the corresponding predictions from DBEKK for each of the simulated securities, in the pairs of securities modelled. We treat the predictions of conditional variances from the Full BEKK model as the dependent variable. The results of these quantile regressions are shown in Table 9.
Notes to Table 9: All the slope coefficients across the quantiles, estimated using robust errors, are significant at the 1% level. (1) Problems with convergence were encountered in the Full BEKK model. (2) Estimation of the Full BEKK model in this case failed to converge using BHHH, so we switched to BFGS.
Table 9 reveals a distinct pattern of an increase in the slope coefficients as we move across the quantiles from the lowest 0.05 quantile to the highest 0.95 quantile. The most extreme case is the prediction of the conditional variances for the relationship between IBMV2 and MSV2. In the 0.05 quantile, the conditional variance prediction by Full BEKK is 50 times lower than for DBEKK, and, in the 0.95 quantile, it is 18 times higher (though there were convergence issues encountered in the Full BEKK estimation in this case). In the other cases, the difference across these two extreme quantiles is usually between 10 and 20 percent. This is still very large if we intend to use the models to predict a portfolio VaR.
If we use the predictions of DBEKK as the benchmark, then application of the Full BEKK model to the same data set may underestimate risk in the lowest quantile and overestimate risk in the highest quantile. In the eighteen examples, seven of the estimates suggest risk in the 0.05 quantile will be lower by 10 percent or more when estimated by Full BEKK, as opposed to DBEKK. Similarly, in nine of the cases, the estimate of risk in the 0.95 quantile is higher by 10 percent or more when estimated by Full BEKK, as compared with DBEKK. Thus, there are considerable discrepancies in the predictions of conditional volatility based on these applications of the two models.
These discrepancies in the regression slope coefficients are apparent in Figures 5 and 6, which present graphs of the estimated slope coefficients across the quantiles for the pairs of simulated securities considered. The quantile regression bounds estimated at the 0.95 level are shown around the quantile slope estimates in each figure. The horizontal lines, in the centre of the figures, show the ordinary least squares slopes for the regressions of the Full BEKK conditional variances for each security on the DBEKK conditional variances for the same security, taken from the same pairwise estimates. The ordinary least squares slope coefficients are not very informative, and merely suggest whether the predicted conditional variances from Full BEKK are relatively above or below those from DBEKK. There is considerable variation in the figures, but most of them are slightly below one.
The quantile regression analyses are much more informative. The lines in Figures 5 and 6 link the slope coefficients estimated at the 0.05, 0.25, 0.50, 0.75, and 0.95 quantiles, when the predictions of the conditional variances from Full BEKK are regressed on those from DBEKK. In all cases but one shown in Figures 5 and 6, the estimates at the lowest 0.05 quantile reveal a relationship between the two sets of estimates that is markedly different from that suggested by ordinary least squares, which captures the average relationship. At this quantile, the difference is frequently ten to twenty percent.
Another startling feature is that the lines depicted in Figures 5 and 6 slope strongly upwards: with one exception, the estimated slope coefficients increase from the lowest to the highest quantile. Thus, the conditional variances estimated from the Full BEKK model are much higher at the 0.95 quantile, often by 20 percent or more, than the conditional variances estimated by the DBEKK model.
These results have strong implications if we try to use the two multivariate models to estimate portfolio risk. The analysis reported in the paper, on these simulated financial return series, suggests that the use of the Full BEKK model will underestimate conditional variances in the left-hand tail of the portfolio return distribution, relative to DBEKK, and overestimate them in the right-hand tail of the distribution.
These results are subject to certain caveats. We have estimated the models using the Estima Rats econometric package, and used the default settings when fitting the models. We have not changed any of the tolerances in the algorithms used to fit the models, or changed the settings for the initialization of the algorithms. We have also instructed the program to use the BHHH optimization procedure to fit the models. All the models have been fitted using a Gaussian distribution, and the estimates would be different if we used a t-distribution. (We also did some analysis using the t-distribution, which is not reported in the paper, that revealed a virtually identical pattern of relationships across the quantiles to those that are reported in the paper.) The intention was to use a consistent approach to the fitting of the models and then to explore the consistency of the results. We also estimated DBEKK and Full BEKK using three variables jointly, in this case, GOOGV1, IBMV1, and MSV1, just to check that similar behaviour was displayed when we employed a trivariate estimation procedure. The results are shown in Table 10.
It is evident in Table 10 that many of the additional terms included in the Full BEKK model are not statistically significant, at least in this simulated data set. We also ran a quantile regression analysis of the conditional variances produced by Full BEKK, regressed on the conditional variances produced by DBEKK, using these three securities. The results are shown in Table 11. Plots of the quantile regression slope coefficients are shown in Figure 7.
It can be seen in Table 11 and in Figure 7, respectively, that exactly the same pattern of results emerges when we estimate DBEKK and Full BEKK on three securities jointly, as was previously the case with pairs of securities. The conditional variance estimates for Full BEKK, relative to DBEKK, are comparatively lower in the 0.05 quantile, increase across the quantiles, and are relatively higher in the 0.95 quantile. If the estimates were the same, the slope coefficient would be unity.
We used the DBEKK model as a benchmark, given that the statistical properties of this model have been established. The results, using this benchmark, suggest that there is an observable and relative bias in the predictions of the conditional variances from the Full BEKK model. This has serious practical implications about the use of Full BEKK for risk management and modelling purposes.

Conclusions
This paper explored the relative biases in the estimation of the Full BEKK model, as compared with the Diagonal BEKK model, which is used as an empirical benchmark. Chang and McAleer [14] showed that univariate GARCH is not a special case of multivariate GARCH, specifically the Full BEKK model, and demonstrated that Full BEKK, which, in practice, is estimated almost exclusively in the literature, has no underlying stochastic process, regularity conditions, or asymptotic properties. Diagonal BEKK (DBEKK) does not suffer from these limitations, and hence provides a suitable benchmark.
We used simulated financial returns series to contrast the estimates of the conditional variances from DBEKK and Full BEKK. The results of the non-parametric tests on their values show evidence of considerable bias in the Full BEKK estimates relative to those of DBEKK. The results of quantile regression analysis showed there was a systematic relationship between the two sets of estimates as we moved across the quantiles. Estimates of conditional variances from Full BEKK, relative to those from DBEKK, are relatively lower in the left tail and higher in the right tail. The phenomenon appears to be all-pervasive in estimates reported in the simulated financial return series. This result has serious practical implications for the use of Full BEKK as a risk management tool.
In the introduction, we cited studies that employ the BEKK model to explore volatility transmission, and financial market linkages, to name a few applications. It is important that investors and financial policy makers are fully aware of the limitations of the Full BEKK model, before they seek to apply the results of any analysis incorporating its findings to real world issues.