Testing for a Set of Linear Restrictions in VARMA Models Using Autoregressive Metric: An Application to Granger Causality Test

In this paper we propose a test for a set of linear restrictions in a Vector Autoregressive Moving Average (VARMA) model. This test is based on the autoregressive metric, a notion of distance between two univariate ARMA models, M0 and M1, introduced by Piccolo in 1990. In particular, we show that this set of linear restrictions is equivalent to a null distance d(M0,M1) between two given ARMA models. This result provides the logical basis for using d(M0,M1) = 0 as a null hypothesis in our test. Some Monte Carlo evidence about the finite sample behavior of our testing procedure is provided and two empirical examples are presented.


Introduction
In this paper, we investigate the relationship between a set of linear restrictions on the parameters of a Vector Autoregressive Moving Average (VARMA) model (see [1]) and the autoregressive metric (AR-metric hereafter), a notion of the distance between two univariate ARMA models introduced by Piccolo [2]. In particular, we show that these linear restrictions are satisfied if and only if the distance $d$ between two given ARMA models (say $M_0$ and $M_1$) is zero. This result provides the logical basis for using $d(M_0, M_1) = 0$ as the null hypothesis for testing this set of restrictions. Moreover, we show that the set of linear restrictions considered is sufficient for Granger noncausality ([3]), while in the VAR framework it is also a necessary condition (see [4]). This theoretical result allows the implementation of an inferential procedure and a bootstrap algorithm. The procedure is assessed by means of Monte Carlo experiments, including quite small samples.

The paper is organized as follows. Section 2 introduces the notion of the distance between ARMA models and specifies the relationship between the AR-metric and the set of linear restrictions considered for a VARMA model. Section 3 presents the inferential implications. Section 4 provides some Monte Carlo evidence about the finite sample behavior of our testing procedure. Section 5 contains two empirical illustrations. Section 6 gives some concluding remarks.

Linear Restrictions in a VARMA Model and AR-Metric
Let $z_t$ be a zero mean invertible ARMA process defined as

$$\phi(L) z_t = \theta(L) \epsilon_t,$$

where $\phi(L)$ and $\theta(L)$ are polynomials in the lag operator $L$, with no common factors, and $\epsilon_t$ is a white noise process with constant variance $\sigma^2$. It is well-known that this process admits the following representation:

$$\pi(L) z_t = \epsilon_t,$$

where the AR($\infty$) operator is defined by

$$\pi(L) = \phi(L)\theta(L)^{-1} = 1 - \sum_{j=1}^{\infty} \pi_j L^j.$$

Consider the class of invertible ARMA models. If $X$ and $Y$ belong to this class, then, following Piccolo [2], the AR-metric is defined as the Euclidean distance between the corresponding $\pi$-weights sequences $\{\pi_{xj}\}$ and $\{\pi_{yj}\}$:

$$d(X, Y) = \left[ \sum_{j=1}^{\infty} (\pi_{xj} - \pi_{yj})^2 \right]^{1/2}. \tag{1}$$

The AR-metric $d$ has been widely used in time series analysis (see, e.g., [5][6][7][8][9][10]). We observe that Equation (1) is a well-defined measure because of the absolute convergence of the $\pi$-weights sequences.

Now, we consider the following VARMA model of order $(p, q)$ for an $n \times 1$ vector time series $\{w_t; t \in \mathbb{Z}\}$:

$$A(L) w_t = B(L) \epsilon_t, \tag{2}$$

where $A(L)$ and $B(L)$ are two $n \times n$ matrices of polynomials in the lag operator $L$, of degree $p$ and $q$ respectively, and $\epsilon_t$ is an $n \times 1$ vector white noise process with positive definite covariance matrix $\Sigma$. We assume that $\det(A(z)) \neq 0$ for $|z| < 1$. This condition allows non-stationarity for the series, in the sense that the characteristic polynomial of the VARMA model described by the equation $\det(A(z)) = 0$ may have roots on the unit circle. The condition $\det(A(z)) \neq 0$ for $|z| < 1$, however, explicitly excludes explosive processes from our consideration. We further assume that the model in Equation (2) satisfies the usual identifiability conditions. If $B(L) = I$, we obtain a pure vector autoregressive (VAR) model of order $p$. If $A(L) = I$, we obtain a pure vector moving average (VMA) model of order $q$.

Consider the partition $w_t = (x_t, y_t')'$, where $x_t$ is a scalar time series and $y_t$ is an $(n-1) \times 1$ vector of time series. Accordingly, the model in Equation (2) can be rewritten as:

$$\begin{pmatrix} A_{11}(L) & A_{12}(L) \\ A_{21}(L) & A_{22}(L) \end{pmatrix} \begin{pmatrix} x_t \\ y_t \end{pmatrix} = \begin{pmatrix} B_{11}(L) & B_{12}(L) \\ B_{21}(L) & B_{22}(L) \end{pmatrix} \begin{pmatrix} \epsilon_{xt} \\ \epsilon_{yt} \end{pmatrix}, \tag{3}$$

where $A_{ij}(L)$ and $B_{ij}(L)$, $i, j = 1, 2$, are matrix polynomials in the lag operator $L$, with $\det(A_{22}(L)) \neq 0$. In this framework it is well-known (see, for example, [11]) that $y_t$ does not Granger-cause $x_t$ if and only if Equation (4) holds, and that a sufficient condition for Equation (4) to hold is

$$A_{12}(L) = B_{12}(L) = 0. \tag{5}$$

We note that if the condition in Equation (5) holds, then $x_t$ follows a univariate ARMA model given by:

$$A_{11}(L) x_t = B_{11}(L) \epsilon_{xt}. \tag{6}$$

The main aim of this paper is to establish the implications of the set of linear restrictions in Equation (5), using the notion of the distance between ARMA models measured by Equation (1). In particular, we will consider the distance between the ARMA($p$, $q$) model in Equation (6) (denoted $M_0$) and the ARMA model for the subprocess $\{x_t; t \in \mathbb{Z}\}$ implied by the VARMA($p$, $q$) model in Equation (2) (denoted $M_1$).
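As a rough illustration of the AR-metric of Equation (1), the following Python sketch computes truncated $\pi$-weights and the resulting distance. The function names `pi_weights` and `ar_metric`, the truncation at `n_terms` lags, and the convention of passing each polynomial as a coefficient list in ascending powers of $L$ (so `[1, -0.5]` stands for $1 - 0.5L$) are assumptions made for this example only.

```python
import numpy as np

def pi_weights(phi_poly, theta_poly, n_terms=50):
    """Truncated pi-weights of pi(L) = phi(L) / theta(L) = 1 - sum_j pi_j L^j.

    Polynomials are coefficient lists in ascending powers of L
    (illustrative convention, not taken from the paper).
    """
    phi = np.zeros(n_terms + 1)
    theta = np.zeros(n_terms + 1)
    phi[:len(phi_poly)] = phi_poly
    theta[:len(theta_poly)] = theta_poly
    c = np.zeros(n_terms + 1)  # coefficients of phi(L) / theta(L)
    for k in range(n_terms + 1):
        acc = phi[k]
        for i in range(1, k + 1):
            acc -= theta[i] * c[k - i]
        c[k] = acc / theta[0]
    return -c[1:]  # pi_j is minus the coefficient on L^j

def ar_metric(model_x, model_y, n_terms=50):
    """Euclidean distance between the truncated pi-weight sequences."""
    pi_x = pi_weights(*model_x, n_terms=n_terms)
    pi_y = pi_weights(*model_y, n_terms=n_terms)
    return float(np.sqrt(np.sum((pi_x - pi_y) ** 2)))

# Two AR(1) models with coefficients 0.5 and 0.2
d = ar_metric(([1, -0.5], [1]), ([1, -0.2], [1]))
```

For two pure AR(1) models the only non-zero $\pi$-weight is the AR coefficient itself, so the distance reduces to the absolute difference of the two coefficients.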
Following Lütkepohl [1], the implied ARMA model $M_1$ can be obtained as follows. Premultiplying both sides of Equation (2) by the adjoint of $A(L)$, denoted as $\mathrm{Adj}(A(L))$, we obtain

$$\det(A(L)) w_t = \mathrm{Adj}(A(L)) B(L) \epsilon_t. \tag{7}$$

We note that each component of $\mathrm{Adj}(A(L)) B(L) \epsilon_t$ is a sum of finite order MA processes, thus it is a finite order MA process (see Proposition 11.1 in [1]). Hence, the subprocess $\{x_t; t \in \mathbb{Z}\}$ follows an ARMA model given by:

$$\det(A(L)) x_t = \delta(L) u_t, \tag{8}$$

where $u_t$ is univariate white noise and $\delta(L)$ is an invertible polynomial in the lag operator $L$. More precisely, $\delta(L)$ and $u_t$ are such that

$$\delta(L) u_t = C_1(L) \epsilon_t,$$

where $C_1(L)$ denotes the first row of the matrix $C(L) = \mathrm{Adj}(A(L)) B(L)$. Finally, we observe that $x_t$ has also the following autoregressive representation of infinite order:

$$\delta(L)^{-1} \det(A(L)) x_t = u_t.$$
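The construction of the implied model can be sketched numerically for the bivariate case, where $x_t$ and $y_t$ are both scalar. The Python fragment below computes $\det(A(L))$ and the first row $C_1(L)$ of $\mathrm{Adj}(A(L))B(L)$ from the scalar polynomial blocks. The helper names `det_A` and `first_row_C`, the ascending-power coefficient convention, and the numerical VAR(1) example are illustrative assumptions. The sketch does not carry out the further factorization needed to obtain $\delta(L)$.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def det_A(a11, a12, a21, a22):
    """det(A(L)) = A11(L) A22(L) - A12(L) A21(L) for scalar polynomial blocks."""
    return P.polysub(P.polymul(a11, a22), P.polymul(a12, a21))

def first_row_C(a12, a22, b11, b12, b21, b22):
    """First row of Adj(A(L)) B(L) in the bivariate case.

    The first row of Adj(A(L)) is (A22(L), -A12(L)).
    """
    c11 = P.polysub(P.polymul(a22, b11), P.polymul(a12, b21))
    c12 = P.polysub(P.polymul(a22, b12), P.polymul(a12, b22))
    return c11, c12

# Hypothetical VAR(1): A(L) = I - A1 L with A1 = [[0.5, 0.2], [0.1, 0.4]], B(L) = I
a11, a12 = [1.0, -0.5], [0.0, -0.2]
a21, a22 = [0.0, -0.1], [1.0, -0.4]
b11, b12, b21, b22 = [1.0], [0.0], [0.0], [1.0]

ar_part = det_A(a11, a12, a21, a22)                 # 1 - 0.9 L + 0.18 L^2
c11, c12 = first_row_C(a12, a22, b11, b12, b21, b22)
```

When $A_{12}(L) = B_{12}(L) = 0$, the second entry of the first row vanishes and $\det(A(L))$ shares the factor $A_{22}(L)$ with the first entry, which is the cancellation exploited by the distance-based argument.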

Theoretical Results
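We consider the distance, according to Equation (1), between the models $M_0$ and $M_1$ given by Equations (6) and (8):

$$d(M_0, M_1) = \left[ \sum_{j=1}^{\infty} (\pi_{0j} - \pi_{1j})^2 \right]^{1/2},$$

where

$$\pi_0(L) = B_{11}(L)^{-1} A_{11}(L) = 1 - \sum_{j=1}^{\infty} \pi_{0j} L^j \quad \text{and} \quad \pi_1(L) = \delta(L)^{-1} \det(A(L)) = 1 - \sum_{j=1}^{\infty} \pi_{1j} L^j.$$

The following proposition provides a necessary and sufficient condition for the set of linear restrictions in Equation (5) in terms of the distance $d(M_0, M_1)$.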
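Proposition 1. $A_{12}(L) = B_{12}(L) = 0$ if and only if $d(M_0, M_1) = 0$.

Proof. (⇒) If $A_{12}(L) = B_{12}(L) = 0$, then $\det(A(L)) = A_{11}(L) \det(A_{22}(L))$ and the first row of the matrix $C(L)$ is $C_1(L) = (\det(A_{22}(L)) B_{11}(L), \; 0)$, so that $\delta(L) u_t = \det(A_{22}(L)) B_{11}(L) \epsilon_{xt}$. Thus we have that $u_t = \epsilon_{xt}$ (where this equality between random variables means equality with probability 1) and $\delta(L)^{-1} \det(A(L)) = B_{11}(L)^{-1} A_{11}(L)$, and hence $d(M_0, M_1) = 0$.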
(⇐) We have to show that if $d(M_0, M_1) = 0$, then $A_{12}(L) = B_{12}(L) = 0$. We may have two cases: [...] On the other hand, we have [...] and hence [...] Using Schur's formula, we get [...] Thus $\delta(L)$ assumes the following expression [...] Since the degree of the polynomial $\delta(L)$ is finite, it follows from Equation (10) that it must be [...] Since by hypothesis $A_{21}(L) \neq 0$, it follows that $A_{12}(L) = 0$ and this in turn implies that [...] On the other hand, $\delta(L)$ is such that [...] and hence [...] (11), where this equality is with probability 1. Since $u_t$ is a white noise, Equation (11) implies that $B_{12}(L) = 0$.
Second case: [...] and the first row of the matrix $C(L)$ is given by [...] where [...] and hence [...] The following equality then occurs with probability 1: [...] Since $u_t$ is a white noise, this implies that $A_{12}(L) = 0$ and $B_{12}(L) = 0$.
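We have also the following corollaries.

Corollary 1. Let $w_t = (x_t, y_t')'$ be a pure VAR($p$) process. $y$ does not Granger-cause $x$ if and only if $d(M_0, M_1) = 0$.

Corollary 2. Let $w_t = (x_t, y_t')'$ be a pure VMA($q$) process. $y$ does not Granger-cause $x$ if and only if $d(M_0, M_1) = 0$.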
Proof of Corollary 2. It is similar to the proof of Corollary 1.

Inferential Implications
Proposition 1 allows us to test the set of linear restrictions in Equation (5) by considering the null hypothesis $H_0: d(M_0, M_1) = 0$. Further, we observe that if the process $\{w_t; t \in \mathbb{Z}\}$ follows a VAR model, Corollary 1 establishes that Granger noncausality from $y_t$ to $x_t$ is equivalent to the condition $d(M_0, M_1) = 0$. Thus, in a VAR framework, we can test for Granger noncausality from $y_t$ to $x_t$ using the null hypothesis $d(M_0, M_1) = 0$ without considering the nature of the variables involved. In fact, it is well-known that the use of non-stationary data in causality tests can yield spurious causality results (see, e.g., [12]). Thus, before testing for Granger causality, it is important to establish the properties of the time series involved, because different modeling strategies must be adopted when the series are I(0), when the series are partly I(0) and partly I(1), when the series are determined to be I(1) but not cointegrated, or when the series are cointegrated. Of course, the weakness of this strategy is that incorrect conclusions drawn from the preliminary analysis might be carried over into the causality tests. In the VAR framework an alternative method is the so-called lag-augmented Wald test (see [13,14]), which is a modified Wald test that requires knowledge of the maximum order of integration of the variables involved. In this way, the proposed test based on the AR-metric can be a valid alternative for a Granger noncausality test (see [4]), since it requires neither exact knowledge of the series properties nor knowledge of the maximum order of integration.
To conduct inference on the basis of Proposition 1, we need an asymptotic distribution for $d(M_0, M_1)$. In the class of ARMA processes, the asymptotic distribution of the maximum likelihood estimator $\hat{d}^2$ has been studied, among others, in [5,15]. In this case, for two independent ARMA($p$, $q$) processes $X$ and $Y$, under the null hypothesis $d(X, Y) = 0$, the maximum likelihood estimator $\hat{d}^2$ has the following asymptotic distribution:

$$\hat{d}^2 \sim \sum_{j=1}^{K} \lambda_j \chi^2_{g_j},$$

where $\chi^2_{g_j}$ are independent $\chi^2$-distributions with $g_j$ degrees of freedom, $\lambda_j$ are the eigenvalues of the covariance matrix of $(\hat{\phi}_{xi} - \hat{\phi}_{yi})$ and $K < p + q$. The evaluation of this distribution can be cumbersome; hence approximations, as well as evaluation algorithms, have been proposed (see [15]). In our framework, however, the ARMA models implied by Equation (6) and by the VARMA model, Equation (8), are equal under the null hypothesis $A_{12}(L) = B_{12}(L) = 0$, so they cannot be considered independent. Then, to conduct the inferential procedures, we suggest the bootstrap algorithm proposed by Di Iorio and Triacca [4], which is described in the next section.
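To give a feel for why a weighted sum of independent $\chi^2$ variables is awkward to evaluate, the following Python sketch approximates one of its quantiles by plain simulation. This is only an illustration of the distribution quoted above for independent processes, not the evaluation algorithm of [15] and not the procedure adopted in this paper. The function name `weighted_chi2_quantile`, the number of draws and the example weights are assumptions.

```python
import numpy as np

def weighted_chi2_quantile(lambdas, dofs, level=0.95, n_draws=200_000, seed=0):
    """Monte Carlo quantile of sum_j lambda_j * chi2(g_j) with independent terms."""
    rng = np.random.default_rng(seed)
    lambdas = np.asarray(lambdas, dtype=float)
    dofs = np.asarray(dofs, dtype=float)
    # one column of chi-square draws per term of the sum
    draws = rng.chisquare(df=dofs, size=(n_draws, len(lambdas)))
    return float(np.quantile(draws @ lambdas, level))

# Hypothetical weights and degrees of freedom
q95 = weighted_chi2_quantile([0.6, 0.3, 0.1], [1, 1, 2])
```

With a single unit weight and one degree of freedom the simulated quantile should be close to the usual $\chi^2_1$ critical value, which gives a quick sanity check of the sketch.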

The Bootstrap Test Procedure
For ease of illustration of our bootstrap procedure, let us consider a bivariate VARMA($p$, $q$) model simply denoted as $A(L) w_t = B(L) \epsilon_t$, where $w_t = (x_t, y_t)'$ and $\epsilon_t = (\epsilon_{xt}, \epsilon_{yt})'$ has covariance matrix $\Sigma$. Based on Proposition 1, we want to test the null hypothesis $H_0: A_{12}(L) = B_{12}(L) = 0$ using $H_0: d(M_0, M_1) = 0$.

1. Estimate on the observed data the VARMA($p$, $q$) model and obtain $\hat{A}(L)$, $\hat{B}(L)$, $\hat{\Sigma}$ and the residuals $\hat{\epsilon}_t$;
2. using the estimated parameters from Step 1, obtain the univariate ARMA model implied by the estimated VARMA for the subprocess $x_t$;
3. evaluate the AR($\infty$) representation, truncated at some suitable lag $p_1$, of the ARMA model in Step 2 (model $M_1$);
4. estimate for $x_t$, using the observed data, an ARMA($p$, $q$) model under the null hypothesis $H_0: A_{12}(L) = B_{12}(L) = 0$ and evaluate its AR($\infty$) representation truncated at some suitable lag $p_0$ (model $M_0$);
5. evaluate the distance $d(M_0, M_1)$ between the AR($p_0$) and the AR($p_1$) obtained in Steps 3 and 4;
6. estimate the VARMA($p$, $q$) model under the null hypothesis $H_0: A_{12}(L) = B_{12}(L) = 0$ to obtain the estimates $\tilde{A}(L)$, $\tilde{B}(L)$ and $\tilde{\Sigma}$;
7. apply bootstrap to the re-centered residuals $\hat{\epsilon}_t$ and obtain the pseudo-residuals $\epsilon^*_t$;

When this procedure is applied, two remarks concerning the pseudo-data generation and the modeling of the dependency across the subprocesses are in order. Firstly, in a well-specified model framework (as well as during a simulation exercise), the estimated residuals $\hat{\epsilon}_t$ do not show any autocorrelation structure, so we do not need any particular resampling scheme for dependent data to obtain the pseudo-error terms $\epsilon^*_t$, and we can apply a simple resampling procedure. For empirical studies, by contrast, the pseudo-data can be obtained using other resampling strategies, such as a block bootstrap algorithm (see [16]). Secondly, in order to reproduce in the pseudo-data the dependency across the subprocesses expressed by $\Sigma$, we simply have to apply the resampling algorithm to the entire $T \times n$ matrix of the estimated residuals $\hat{\epsilon}_t$.
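The resampling and p-value steps of the procedure can be sketched in Python as follows. The sketch resamples whole rows of the re-centered $T \times n$ residual matrix, so that the contemporaneous dependence across the series is preserved, and computes the bootstrap p-value as the proportion of bootstrap distances exceeding the observed one. The function names and the two callables `simulate_under_null` and `distance_statistic` are hypothetical placeholders: they stand for the generation of pseudo-data under the null and for Steps 1-5, neither of which is implemented here.

```python
import numpy as np

def resample_residuals(resid, rng):
    """Draw T rows with replacement from the re-centered T x n residual matrix."""
    resid = np.asarray(resid, dtype=float)
    centered = resid - resid.mean(axis=0)
    rows = rng.integers(0, centered.shape[0], size=centered.shape[0])
    return centered[rows]  # whole rows keep the cross-series dependence

def bootstrap_distances(resid, simulate_under_null, distance_statistic, b, seed=0):
    """Repeat resampling, pseudo-data generation and distance evaluation b times."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(b):
        eps_star = resample_residuals(resid, rng)
        pseudo_data = simulate_under_null(eps_star)   # user-supplied (hypothetical)
        out.append(distance_statistic(pseudo_data))   # user-supplied (hypothetical)
    return np.array(out)

def bootstrap_p_value(d_obs, d_boot):
    """Proportion of bootstrap distances that exceed the observed distance."""
    return float(np.mean(np.asarray(d_boot) > d_obs))
```

A call such as `bootstrap_p_value(d_hat, bootstrap_distances(resid, sim, stat, b=400))` would then mirror the last step of the algorithm, with `sim` and `stat` supplied by the user.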

Monte Carlo Experiments
The performance of the proposed inferential strategy can be investigated by means of a set of Monte Carlo experiments. In particular, we consider the test for the set of linear restrictions associated with a Granger noncausality test for two different DGPs: a stable bivariate VARMA(1, 1) model and a cointegrated bivariate VAR(2) model. Our test will be compared with a Wald test for the VARMA(1, 1) model and with the lag-augmented Wald test suggested by Dolado et al. and Toda et al. [13,14] for the cointegrated VAR model.
In our study, the tests of the null hypothesis were carried out using nominal significance levels of 1%, 5% and 10%. To analyze the power of the test, we consider the two cases below to verify how the test reacts when the parameter values move away from zero: [...]

It is well-known that maximum likelihood estimation of a VARMA model can be a challenging task (see, e.g., [1,17]). For this reason we consider sample sizes $T = 100$ and $T = 200$, which are quite large compared with what is usually found in empirical applications. Taking into account the dimension of our exercise, we perform the maximum likelihood estimation using the Kalman filter procedure implemented in Gretl (ver. 1.9.14) (see [18]). Because of the computational time required by the maximum likelihood estimation of the VARMA model, the experiments are based on 400 Monte Carlo replications and 400 bootstrap redrawings. We compare our results with the usual Wald test using, for a proper comparison, the bootstrap p-values obtained by the same bootstrap algorithm described above. Finally, we verified by some preliminary experiments that a suitable value for $p_0$ and $p_1$ in Steps 3 and 4 of the bootstrap algorithm is 15.

The results are reported in Table 1. As we can see from Table 1, the size of the AR-metric test is quite satisfactory, and the power increases with growing sample size and as the true parameter values move away from zero. In any case, as expected, the difficulties of the maximum likelihood estimation for the VARMA model affect the distance more than the Wald test, which shows better power. In fact, as the bootstrap algorithm underlines, the distance-based test is built on the autocovariances obtained from the estimated values of the parameters. Hence, its performance depends heavily on the quality of these estimates.
As before, the tests of the null hypothesis were carried out using nominal significance levels of 1%, 5% and 10%. To analyze the power of the test, we consider again the two cases below: [...] In this case the parameter estimation is easier. To make our Monte Carlo experiment more relevant for actual empirical applications, we consider sample sizes $T = 50$, a medium size in terms of annual data but a small size for a quarterly frequency, and $T = 100$, which is a large time span in terms of annual data but quite common for quarterly data. We now compare the performance of our test with that of the lag-augmented Wald test proposed by Dolado et al. and Toda et al. [13,14] in this framework. The lag-augmented Wald test has an asymptotic $\chi^2$-distribution with $p$ degrees of freedom when a VAR($p + d_{max}$) is estimated, where $d_{max}$ is the maximal order of integration for the series in the system. However, it is well-known that the lag-augmented Wald test based on asymptotic critical values may suffer from size distortion and low power, especially in small samples [19,20]. Thus, to overcome this problem, we apply the same bootstrap algorithm described above using the Wald test from an augmented VAR($2 + a_{max}$), with augmentation order $a_{max} = 1$, and we evaluate the bootstrap p-values.
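The idea of the lag-augmented Wald test can be sketched for the $x_t$ equation of a bivariate system as follows: a VAR($p + d_{max}$) equation is fitted by ordinary least squares, and the Wald statistic restricts only the first $p$ lags of $y_t$. This Python fragment is a simplified illustration under our own assumptions (an intercept is included, the error variance uses a degrees-of-freedom correction, and the function name `lag_augmented_wald` is hypothetical); it is not the code used for the experiments. The returned statistic can be compared with a $\chi^2_p$ critical value or fed into a bootstrap loop.

```python
import numpy as np

def lag_augmented_wald(x, y, p, d_max):
    """Wald statistic for 'y does not Granger-cause x' in a VAR(p + d_max) equation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = p + d_max
    rows = []
    for t in range(k, len(x)):
        lags_x = [x[t - h] for h in range(1, k + 1)]
        lags_y = [y[t - h] for h in range(1, k + 1)]
        rows.append([1.0] + lags_x + lags_y)  # intercept, x lags, y lags
    X = np.array(rows)
    target = x[k:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    sel = np.arange(1 + k, 1 + k + p)  # y lags 1..p only; extra lags stay unrestricted
    b = beta[sel]
    return float(b @ np.linalg.solve(cov[np.ix_(sel, sel)], b))
```

Leaving the last $d_{max}$ lags unrestricted is what distinguishes this statistic from an ordinary Wald test on all lags of $y_t$.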
For this DGP the experiment is based on 1000 Monte Carlo replications and 1000 bootstrap redrawings, and, as before, in Step 3 we set $p_0 = 15$. The results are collected in Table 2. We note that, for a nominal significance level of 5%, our results are rather similar to those of the second part of Table 3 reported in Shukur and Mantalos [21]. The comparison of the power estimates for our test and the lag-augmented Wald test of Toda et al. [14] shows that our test has relatively high power in all situations, while the size is very close to the nominal values for both tests.

Empirical Applications
In this section we present two empirical examples to illustrate the application of the test suggested in the paper. First, we consider a VAR model and, in particular, we examine the causal relationship between the log of real per capita income and inflation. Then, we consider a VARMA example based on the SCC dataset discussed in [22].
To take into account any possible dependence structure in the residuals of the estimated models, we use the Stationary Bootstrap ([23]) as the resampling algorithm. The Stationary Bootstrap is a block bootstrap scheme in which the resampled pseudo-series are stationary; this scheme chains blocks of observations of the original series starting at random locations, and the length of each block is randomly chosen from a geometric distribution. Following Palm et al. [24], the mean block length can be computed as a function of the length of the time sample; by some exploratory simulations we verified the robustness of the tests to different block sizes, so we report results for blocks of mean length $1.75 \sqrt[3]{T}$.

To discuss the possible causal relationship between the log of real per capita income ($y$) and inflation ($\Delta p$), we re-examined the dataset used by Ericsson et al. [25]. [...] The computed $d(M_0, M_1)$-statistic is equal to 0.35 with a bootstrap p-value of 0. This result indicates the presence of Granger causality from output to inflation. This finding is in accordance with the results of Ericsson et al. [25]. The same result is obtained using the lag-augmented Wald test.
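The Stationary Bootstrap scheme described above can be sketched in Python by generating the resampled time indices directly. Starting a new block with probability $1 / m$ at each step, where $m$ is the mean block length, is a standard way to obtain geometrically distributed block lengths. The function name `stationary_bootstrap_indices`, the wrap-around at the end of the sample, and the block-length rule $1.75\,T^{1/3}$ as we read it from the text are assumptions of this example.

```python
import numpy as np

def stationary_bootstrap_indices(T, mean_block, rng):
    """Time indices for one stationary-bootstrap resample of a length-T series."""
    p_new = 1.0 / mean_block      # probability of starting a new block
    idx = np.empty(T, dtype=int)
    idx[0] = rng.integers(T)      # first block starts at a random location
    for t in range(1, T):
        if rng.random() < p_new:
            idx[t] = rng.integers(T)          # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % T     # continue the block, wrapping around
    return idx

T = 40
rng = np.random.default_rng(0)
idx = stationary_bootstrap_indices(T, 1.75 * T ** (1 / 3), rng)
# residuals_star = residuals[idx] would resample whole rows of a T x n residual matrix
```

Indexing the rows of the residual matrix with `idx` keeps both the serial dependence within each block and the contemporaneous dependence across series.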
The SCC dataset discussed by Tiao and Box [22] considers the quarterly time series of the U.K. Financial Times Ordinary Share Index, the U.K. Car Production and the U.K. Financial Times Commodity Price from the third quarter of 1952 to the fourth quarter of 1967. The goal is to verify the possibility of predicting the first variable from the lagged values of the last two. According to Tiao and Box [22], a VARMA(1, 1) is the best model for these data, so a null hypothesis following Equation (5) is the inferential basis for testing only a sufficient condition on the predictability hypothesis. The VARMA(1, 1) maximum likelihood parameter estimates using the Kalman filter procedure implemented in Gretl (ver. 1.9.9) are the following (standard errors in brackets): [...] The estimates are quite similar to the values reported as "full model" in Table 10 in [22], taking into account the difference in the estimation algorithm and software. The computed $d(M_0, M_1)$-statistic is equal to 4.58 with a bootstrap p-value of 0.225, evaluated on 500 bootstrap replications, and this finding is in accordance with the results of the "final model" in Table 10 in Tiao and Box [22]. We also perform a Wald test of the same null hypothesis; its value is 36.684, which asymptotically rejects the null, but its bootstrap p-value of 0.146 supports the results of our test.

Conclusions
In this paper we have characterized a set of linear restrictions in a Vector Autoregressive Moving Average (VARMA) model in terms of the notion of distance between ARMA models, and we have derived a new inferential procedure. In particular, this procedure can be used as a new Granger noncausality test in a VAR framework. The advantage of this test is that it can be carried out irrespective of whether the variables involved are stationary or not and regardless of the existence of a cointegrating relationship among them. Our inferential procedure has been validated by a set of Monte Carlo experiments. In a VARMA framework this procedure shows encouraging results, even if a deeper investigation, made complex by the computational time, is needed. In a cointegrated VAR framework our method for detecting causality has provided better results, as the simulation study conducted has shown that our test exhibits good performance in terms of size and power properties, even in small samples. Finally, we have shown that this test can be usefully applied in practical situations to test causality between economic time series.
9. [...] repeat Steps 1-5 to obtain the bootstrap estimate of the distance $d^*(M_0, M_1)$;
10. repeat Steps 7-9 $b$ times;
11. evaluate the bootstrap p-value as the proportion of the $b$ estimated bootstrap distances $d^*$ that exceed the same statistic evaluated on the observed data, $\hat{d}$, that is, $pval_b = \frac{1}{b} \sum_{i=1}^{b} I(d^*_i > \hat{d})$, where $I(\cdot)$ is the indicator function.
The dataset refers to the United States over the period 1953-1992 and can be downloaded from the Journal of Applied Econometrics Data Archive. The VAR order selection is based on the Bayesian Information Criterion and the following model is estimated.