Unit Root Tests: the Role of the Univariate Models Implied by Multivariate Time Series

In cointegration analysis, it is customary to test the hypothesis of unit roots separately for each single time series. In this note, we point out that this procedure may imply large size distortion of the unit root tests if the DGP is a VAR. It is well known that the univariate models implied by a VAR data generating process necessarily have a finite-order MA component. This feature may explain why an MA component has often been found in univariate ARIMA models for economic time series. It also has important implications for unit root tests in univariate settings, given the well-known size distortion of popular unit root tests in the presence of a large negative coefficient in the MA component. In a small simulation experiment, considering several popular unit root tests and the ADF sieve bootstrap unit root tests, we find that, besides the well-known size distortion effect, there can be substantial differences in size distortion according to which univariate time series is tested for the presence of a unit root.


Introduction
It is well known that unit root tests may have large size distortion when the autoregressive parameter is close to unity and/or when there is a large MA component (see, for instance, [1]). The simulation evidence on the size distortion of some standard univariate unit root tests, such as the ADF test and the Phillips-Perron $Z_\alpha$ and $Z_t$ tests, is overwhelming (see, among others, [2]). Most simulation studies consider univariate unit root processes, but similar findings on the size distortion of unit root tests have been obtained by Reed [3,4], who considers a cointegrated VAR. He finds that the size distortion can be very large and is not necessarily of similar magnitude across tests and across the univariate time series derived from the VAR. For instance, in the bivariate case, a unit root test applied to one component may have an effective size as large as 90%, while the same test applied to the other component may have an effective size close to the nominal one. These finite sample results hold for DGPs characterized by quite different parameter values and for a wide range of roots of the AR component, even in the case where the roots are 1 and 0.
In this note, we (a) provide a theoretical motivation for the finite sample size distortion observed in the presence of a large negative MA root; (b) give additional simulation evidence on its extent comparing standard and bootstrap unit root tests and (c) provide some suggestions for empirical researchers working with univariate time series implied by a VAR data generating process.

The Model
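We start the discussion from the bivariate VAR(1)

$$y_t = a_{11} y_{t-1} + a_{12} x_{t-1} + u_{1t}, \qquad x_t = a_{21} y_{t-1} + a_{22} x_{t-1} + u_{2t}, \tag{1}$$

or, in compact notation, $z_t = A z_{t-1} + u_t$, where $z_t = (y_t, x_t)'$, $A = [a_{ij}]$ and $u_t = (u_{1t}, u_{2t})' \sim \text{i.i.d.}(0, V)$. The representation for the univariate components, the so-called "final equations" in [5], can be obtained following [6]. Considering the lag polynomial matrix $A(L) = I - AL$, its determinant and adjoint matrix are

$$|A(L)| = 1 - (a_{11} + a_{22})L + (a_{11}a_{22} - a_{12}a_{21})L^2, \qquad A^*(L) = \begin{pmatrix} 1 - a_{22}L & a_{12}L \\ a_{21}L & 1 - a_{11}L \end{pmatrix},$$

and the "final equations" $|A(L)|\, z_t = A^*(L)\, u_t$ for the VAR(1) model are given by

$$|A(L)|\, y_t = (1 - a_{22}L)u_{1t} + a_{12}u_{2,t-1}, \qquad |A(L)|\, x_t = a_{21}u_{1,t-1} + (1 - a_{11}L)u_{2t}. \tag{2}$$

It follows that the univariate processes evolve as ARMA(2,1) models with a common AR component and two distinct MA components.¹ The magnitude and sign of the roots of the characteristic equation $|A - \lambda I| = 0$ determine both the stationarity or nonstationarity of the univariate time series $y_t$ and $x_t$ and the existence of a cointegrating relationship between them. A necessary condition for cointegration is that the roots of the characteristic equation satisfy $\lambda_1 = 1$ and $|\lambda_2| < 1$. From this unit root constraint, we obtain the restriction

$$a_{12}a_{21} = (1 - a_{11})(1 - a_{22}), \tag{3}$$

which can be used to obtain the VECM representation

$$\Delta y_t = (a_{11} - 1)(y_{t-1} - \beta x_{t-1}) + u_{1t}, \qquad \Delta x_t = a_{21}(y_{t-1} - \beta x_{t-1}) + u_{2t}, \qquad \beta = \frac{a_{12}}{1 - a_{11}} = \frac{1 - a_{22}}{a_{21}}, \tag{4}$$

where $|a_{11} + a_{22} - 1| < 1$ guarantees the stationarity of the error correction mechanism. In fact, it is easy to show that $y_t - \beta x_t = \lambda_2 (y_{t-1} - \beta x_{t-1}) + u_{1t} - \beta u_{2t}$, with $\lambda_2 = a_{11} - \beta a_{21} = a_{11} + a_{22} - 1$.

¹ In general, considering a k-dimensional VAR(p) process, the univariate models will be at most ARMA(kp, (k − 1)p), all univariate processes share the same AR component, and an MA component is present in each univariate model. See [5] for a general treatment of this issue.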
Imposing the constraints $\lambda_1 = 1$ and $|\lambda_2| < 1$, so that $|A(L)| = (1 - L)(1 - \lambda_2 L)$, the "final equations", i.e., the univariate models, become

$$(1 - \lambda_2 L)\Delta y_t = (1 - a_{22}L)u_{1t} + a_{12}u_{2,t-1}, \qquad (1 - \lambda_2 L)\Delta x_t = a_{21}u_{1,t-1} + (1 - a_{11}L)u_{2t}. \tag{5}$$

Thus, if the DGP is a bivariate VAR(1) with one unit root and one cointegrating relationship, the marginal processes for the levels follow ARIMA(1,1,1) models and the marginal processes for the first differences are stationary ARMA(1,1) processes.² It follows that the autocorrelation structure of the implied marginal processes, induced by the interaction of the AR and MA roots, is bound to affect the finite sample size and power properties of unit root tests in any simulation study where the DGP is multivariate.
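The restriction (3), the cointegrating coefficient β and the stable root λ₂ can be checked numerically for a given A. The following is a minimal Python/NumPy sketch; the function name `var1_cointegration_summary` is hypothetical (not taken from the paper), it assumes a bivariate VAR(1) with a₁₁ ≠ 1, and it uses the matrix A of DGP4 in Table 1 as input.

```python
import numpy as np

def var1_cointegration_summary(A, tol=1e-10):
    """Cointegration diagnostics for a bivariate VAR(1) z_t = A z_{t-1} + u_t.

    Hypothetical helper for illustration; assumes a11 != 1.
    """
    A = np.asarray(A, dtype=float)
    (a11, a12), (a21, a22) = A
    # Eigenvalues of A, i.e. roots of |A - lambda I| = 0, largest first
    eigenvalues = np.sort(np.linalg.eigvals(A).real)[::-1]
    # Unit root restriction (3): a12 a21 = (1 - a11)(1 - a22)
    restriction_holds = bool(abs(a12 * a21 - (1 - a11) * (1 - a22)) < tol)
    # Cointegrating coefficient in y_t - beta x_t, from (4)
    beta = a12 / (1 - a11)
    # Stable root lambda_2 = a11 + a22 - 1 (valid once lambda_1 = 1)
    lambda2 = a11 + a22 - 1
    # Coefficients of |A(L)| = 1 - (a11 + a22) L + (a11 a22 - a12 a21) L^2
    det_poly = np.array([1.0, -(a11 + a22), a11 * a22 - a12 * a21])
    return {"eigenvalues": eigenvalues, "restriction_holds": restriction_holds,
            "beta": beta, "lambda2": lambda2, "det_poly": det_poly}

summary = var1_cointegration_summary([[0.5, 0.05], [1.0, 0.9]])  # DGP4
print(summary["eigenvalues"], summary["beta"], summary["lambda2"])
```

Since the trace of A equals λ₁ + λ₂, imposing λ₁ = 1 is what makes the stable root equal to a₁₁ + a₂₂ − 1; for the DGP4 matrix the sketch returns the eigenvalues 1 and 0.4 and β = 0.1.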
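Considering the right-hand side of (5), we see that the aggregate error term for each component of $z_t$ is the sum of an MA(1) process and a lagged white noise process:

$$\xi_{t1} = (1 - a_{22}L)u_{1t} + a_{12}u_{2,t-1}, \qquad \xi_{t2} = a_{21}u_{1,t-1} + (1 - a_{11}L)u_{2t}.$$

It is easy to show that both aggregate error terms have the autocorrelation function of an MA(1) process (their autocovariances vanish beyond lag one), so that we can write

$$\xi_{ti} = \theta_i(L)v_{ti} = (1 + \theta_i L)v_{ti}, \qquad i = 1, 2, \tag{6}$$

where $v_{t1}$ and $v_{t2}$ are white noise processes. By setting the first-order autocorrelation coefficient of each marginal process on the left-hand side of (6) equal to the first-order autocorrelation coefficient of $\theta_i(L)v_{ti}$, i.e.,

$$\frac{\operatorname{Cov}(\xi_{ti}, \xi_{t-1,i})}{\operatorname{Var}(\xi_{ti})} = \frac{\theta_i}{1 + \theta_i^2},$$

where, denoting by $\omega_{ij}$ the elements of $V$,

$$\operatorname{Var}(\xi_{t1}) = \omega_{11}(1 + a_{22}^2) + a_{12}^2\omega_{22} - 2a_{12}a_{22}\omega_{12}, \qquad \operatorname{Cov}(\xi_{t1}, \xi_{t-1,1}) = a_{12}\omega_{12} - a_{22}\omega_{11},$$
$$\operatorname{Var}(\xi_{t2}) = \omega_{22}(1 + a_{11}^2) + a_{21}^2\omega_{11} - 2a_{11}a_{21}\omega_{12}, \qquad \operatorname{Cov}(\xi_{t2}, \xi_{t-1,2}) = a_{21}\omega_{12} - a_{11}\omega_{22},$$

we can find the moving average coefficients of the polynomials $\theta_i(L)$ by choosing the invertible solution of the resulting second-degree equation in $\theta_i$.³ For example, let us consider DGP4 in Table 1, i.e.,

$$A = \begin{pmatrix} 0.5 & 0.05 \\ 1 & 0.9 \end{pmatrix}, \qquad V = \begin{pmatrix} 0.045 & 0.017 \\ 0.017 & 0.045 \end{pmatrix}.$$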
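The roots of the characteristic equation of the reduced-form VAR are 1 and 0.4, and the marginal processes are given by

$$(1 - 0.4L)\Delta y_t = (1 - 0.9L)u_{1t} + 0.05\,u_{2,t-1}, \qquad (1 - 0.4L)\Delta x_t = u_{1,t-1} + (1 - 0.5L)u_{2t},$$

which, after some algebra, can be written approximately as

$$(1 - 0.4L)\Delta y_t = (1 - 0.87L)v_{t1}, \qquad (1 - 0.4L)\Delta x_t = (1 - 0.07L)v_{t2}.$$

² See Cubadda et al. [7] for a general result on the implied univariate models from cointegrated VAR and Cubadda and Triacca [8] for the I(2) case.

³ A general solution to this problem for an MA process of order q has been provided in Maravall and Mathis [9].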
Thus, the MA component of $\Delta y_t$ has a large negative coefficient, which is exactly the circumstance that produces the enormous size distortion reported in many simulation studies since Schwert [1].
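The moment-matching step leading from (5) to (6) can be written out directly. The Python/NumPy sketch below (the helper name `implied_ma1` is hypothetical) computes the variance and first-order autocovariance of each aggregate error ξ_ti from A and V, solves r θ² − θ + r = 0 for the invertible root, and returns θ_i together with the variance of v_ti. It assumes that each first-order autocorrelation r_i is non-zero. Applied to DGP4 it gives θ₁ ≈ −0.87 and θ₂ ≈ −0.07.

```python
import numpy as np

def implied_ma1(A, V):
    """Invertible MA(1) representation xi_ti = (1 + theta_i L) v_ti of the
    aggregate errors in (5) for a bivariate VAR(1) with Cov(u_t) = V.

    Hypothetical helper; assumes each first-order autocorrelation is non-zero.
    """
    (a11, a12), (a21, a22) = np.asarray(A, dtype=float)
    (w11, w12), (_, w22) = np.asarray(V, dtype=float)
    # xi_t1 = u_1t - a22 u_1,t-1 + a12 u_2,t-1
    # xi_t2 = u_2t - a11 u_2,t-1 + a21 u_1,t-1
    gamma0 = np.array([w11 * (1 + a22**2) + a12**2 * w22 - 2 * a12 * a22 * w12,
                       w22 * (1 + a11**2) + a21**2 * w11 - 2 * a11 * a21 * w12])
    gamma1 = np.array([a12 * w12 - a22 * w11,
                       a21 * w12 - a11 * w22])
    r = gamma1 / gamma0                      # first-order autocorrelations
    # r theta^2 - theta + r = 0; this root satisfies |theta| <= 1 (invertible)
    theta = (1 - np.sqrt(1 - 4 * r**2)) / (2 * r)
    sigma2_v = gamma0 / (1 + theta**2)       # Var(v_ti)
    return theta, sigma2_v, r

A = np.array([[0.5, 0.05], [1.0, 0.9]])
V = np.array([[0.045, 0.017], [0.017, 0.045]])
theta, sigma2_v, r = implied_ma1(A, V)
print(theta, sigma2_v)
```

Comparing θ₁ with the AR coefficient 0.4 shows why Δy_t is the problematic series here: its MA coefficient is much larger in absolute value than its AR coefficient, so the two do not offset each other, whereas θ₂ for Δx_t is close to zero.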

A Simulation Experiment
In a small simulation study, we assess the size distortion of a number of unit root tests when the data come from a cointegrated DGP. We consider the classical ADF test by Said and Dickey [10], the $Z_\alpha$ and $Z_t$ tests proposed by Phillips and Perron [11], the modified $MZ_\alpha$ and $MZ_t$ tests by Stock [12] and Perron and Ng [2], the modified Sargan-Bhargava MSB test proposed by Stock [12], the point optimal test $P_T$ of Elliott et al. [13] and its modification $MP_T$ proposed by Ng and Perron [14], and, finally, the DF-GLS test by Elliott et al. [13]. We always estimate the spectral density at frequency zero of the error term using the autoregressive spectral density estimator as in Perron and Ng [2], and for the $Z_\alpha$, $Z_t$, $MZ_\alpha$, $MZ_t$ and MSB tests we consider both OLS detrending and GLS detrending, as in Ng and Perron [14].
For the selection of the lag length, we do not use a rule based solely on the sample size; instead, we use the Modified Akaike Information Criterion (MAIC) developed by Ng and Perron [14], with the maximum lag length set to the integer part of $12[(T + 1)/100]^{1/4}$. Given the better performance of this criterion compared with the BIC, we do not consider the latter in our simulations. Further, we follow the suggestion of Perron and Qu [15] and present results for $MZ^{PQ}_\alpha$, obtained using OLS detrending instead of GLS detrending in the MAIC. They show that this simple modification produces tests with effective size closer to the nominal size.
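The lag-length rule can be sketched as follows. This is a simplified Python/NumPy illustration, not the authors' code: it demeans the series by OLS (the no-trend, Perron-Qu variant mentioned above), estimates the ADF regression without deterministic terms on the demeaned data for k = 0, ..., kmax over a common sample, and picks the k minimizing MAIC(k) = ln σ̂²_k + 2(τ_T(k) + k)/N, with τ_T(k) = β̂₀² Σ ỹ²_{t−1} / σ̂²_k, following Ng and Perron [14]. Using the effective common-sample size N in the penalty, and the names `kmax_rule` and `maic_lag`, are assumptions of this sketch.

```python
import numpy as np

def kmax_rule(T):
    # Upper bound used in the text: integer part of 12 * ((T + 1) / 100) ** (1/4)
    return int(12 * ((T + 1) / 100) ** 0.25)

def maic_lag(y, kmax=None):
    """Select the ADF lag length by MAIC on OLS-demeaned data (no-trend case)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    kmax = kmax_rule(T) if kmax is None else kmax
    yd = y - y.mean()                      # OLS demeaning (Perron-Qu variant)
    dy = np.diff(yd)                       # dy[t-1] = yd[t] - yd[t-1]
    idx = np.arange(kmax + 1, T)           # common sample for every k
    N = len(idx)
    target = dy[idx - 1]                   # Delta y_t
    best_k, best_maic = 0, np.inf
    for k in range(kmax + 1):
        # Regressors: y_{t-1} and Delta y_{t-1}, ..., Delta y_{t-k}
        cols = [yd[idx - 1]] + [dy[idx - 1 - j] for j in range(1, k + 1)]
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        s2 = resid @ resid / N
        tau = coef[0] ** 2 * np.sum(yd[idx - 1] ** 2) / s2
        maic = np.log(s2) + 2 * (tau + k) / N
        if maic < best_maic:
            best_k, best_maic = k, maic
    return best_k
```

With T = 100, as in the experiments below, `kmax_rule(100)` returns 12, so every candidate regression is estimated on the same 87 observations.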
We also consider two bootstrap unit root tests. Palm et al. [16] carried out extensive simulation experiments on the size and power of bootstrap unit root tests, considering ADF sieve and block bootstrap tests based on first-differenced series or on residuals. Their findings suggest that, both in terms of size and power, the ADF sieve test as in Chang and Park [17] and its residual-based version perform best. In the following, we consider these two versions of the ADF sieve unit root test, implemented following the procedure set forth in Palm et al. [16].
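A minimal sieve bootstrap for the ADF statistic, in the spirit of Chang and Park [17], can be sketched in Python/NumPy as follows. It is an illustration under stated assumptions, not the Palm et al. [16] implementation used in the paper: the lag order p is taken as given (for example, from the MAIC), the ADF regression includes an intercept only, the AR(p) sieve for Δy_t is fitted without an intercept, bootstrap samples are built by resampling centred residuals i.i.d. and cumulating the generated differences (which imposes the unit root), and the start-up values are the first p observed differences. The residual-based version is not shown, and the function names are hypothetical.

```python
import numpy as np

def adf_tstat(y, p):
    """ADF t-statistic (intercept, p lagged differences) for H0: unit root."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[p:]                                   # Delta y_t
    cols = [np.ones_like(target), y[p:-1]]            # intercept, y_{t-1}
    cols += [dy[p - j:-j] for j in range(1, p + 1)]   # Delta y_{t-j}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (len(target) - X.shape[1])
    return coef[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def sieve_bootstrap_adf(y, p, B=499, seed=None):
    """Left-tailed sieve bootstrap p-value for the ADF test (sketch)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    stat = adf_tstat(y, p)
    dy = np.diff(y)
    n = len(dy)
    # Sieve step: fit AR(p) to Delta y_t, the model under the unit root null
    target = dy[p:]
    if p > 0:
        X = np.column_stack([dy[p - j:-j] for j in range(1, p + 1)])
        phi, *_ = np.linalg.lstsq(X, target, rcond=None)
        eps = target - X @ phi
    else:
        phi, eps = np.zeros(0), target.copy()
    eps = eps - eps.mean()                            # centred residuals
    boot = np.empty(B)
    for b in range(B):
        e_star = rng.choice(eps, size=n, replace=True)
        dy_star = np.empty(n)
        dy_star[:p] = dy[:p]                          # start-up values from the data
        for t in range(p, n):
            lags = dy_star[t - p:t][::-1]             # Delta y*_{t-1}, ..., Delta y*_{t-p}
            dy_star[t] = phi @ lags + e_star[t]
        # Impose the unit root by cumulating the bootstrap differences
        y_star = y[0] + np.concatenate(([0.0], np.cumsum(dy_star)))
        boot[b] = adf_tstat(y_star, p)
    return stat, float(np.mean(boot <= stat))
```

The p-value is the share of bootstrap statistics at or below the observed one, so the null is rejected at 5% when it falls below 0.05. Note that a finite AR(p) sieve approximates an MA component with a coefficient close to −1 only poorly, which is consistent with the finding that the bootstrap tests are not immune from size distortion.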
Finally, to allow a comparison with a widely used test for unit roots and cointegration in multivariate time series, we also consider Johansen's trace statistic, $J_{tr}$; under the null, we should reject the hypothesis of no cointegration but not reject the presence of one cointegrating vector.
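For a VAR(1), the trace statistic reduces to a reduced-rank regression of Δz_t on z_{t−1}. The Python/NumPy sketch below is a textbook implementation under simplifying assumptions (no lagged differences, an unrestricted constant handled by demeaning, and a hypothetical function name). It returns the squared canonical correlations μ̂_i and J_tr(r) = −T Σ_{i>r} ln(1 − μ̂_i) for r = 0, ..., k − 1. Critical values are not computed here and must be taken from the standard tables for the chosen deterministic specification.

```python
import numpy as np

def johansen_trace_var1(z):
    """Johansen trace statistics for a VAR(1) with an unrestricted constant.

    z: array of shape (T, k). Returns the squared canonical correlations mu
    (largest first) and J_tr(r) for r = 0, ..., k - 1. Hypothetical helper.
    """
    z = np.asarray(z, dtype=float)
    dz = np.diff(z, axis=0)                 # Delta z_t
    zlag = z[:-1]                           # z_{t-1}
    # No lagged differences in a VAR(1): only the constant is concentrated out
    R0 = dz - dz.mean(axis=0)
    R1 = zlag - zlag.mean(axis=0)
    n = len(dz)
    S00, S01, S11 = R0.T @ R0 / n, R0.T @ R1 / n, R1.T @ R1 / n
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01 (all in [0, 1))
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    mu = np.sort(np.linalg.eigvals(M).real)[::-1]
    k = z.shape[1]
    trace = np.array([-n * np.sum(np.log(1 - mu[r:])) for r in range(k)])
    return mu, trace
```

For data generated under (3)-(4), J_tr(0) is typically large (no cointegration is rejected), while the size of the test is governed by how often J_tr(1) exceeds its critical value.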
In the simulation experiment, we consider two different ways of formulating the DGP. First, as in Reed [3], we consider the VAR model in (1) subject to the constraints (3)-(4), which guarantee cointegration between $y_t$ and $x_t$. Second, we also consider a DGP widely used in cointegration analysis (see, among others, [18]), given in (7). From this parameterization, we can obtain the implied VAR(1). From the above VAR and VECM representations, we can obtain the DGPs considered in Reed [3] in terms of the parameters of (7) and vice versa.
In the simulation study, we consider DGPs parameterized as in (7), following Gonzalo [18], and as in (1), following Reed [3]. The two sets of parameters are defined as follows:
MC(a): the values taken by A and V in Reed [3], reported in Table 1 together with the implied values of ρ and β, the roots of the MA components in the univariate representations and the unconditional contemporaneous correlation;
MC(b): as in Gonzalo [18], we set $c_2 = -1$, $\beta = 1$ and $c_1 = 1$ and consider the following values for the remaining parameters: ρ = (0.9, 0.5), σ = (0.25, 1, 2) and η = (−0.5, 0, 0.5), for a total of 18 experiments.
The root of the common autoregressive component (the other root is always equal to 1) and the coefficients of the distinct MA components of the univariate models implied by the multivariate DGP are reported in Table 2.
All results are based on a sample size T = 100 and 1000 replications of the DGP and, for the bootstrap tests, on 500 bootstrap replications, as in Psaradakis [19].
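The mechanism behind the size differences can be reproduced in a few lines. The Python/NumPy sketch below simulates the VAR(1) of DGP4 with T = 100 and applies the plain Dickey-Fuller t-test (intercept, no lag augmentation) separately to y_t and x_t, recording how often the true unit root null is rejected at the 5% level. Gaussian errors, a zero initial condition, the approximate critical value −2.89 (Fuller's tabulated 5% value for T = 100 with an intercept) and the omission of lag augmentation are all assumptions chosen to isolate the effect of the MA component; the sketch does not reproduce any of the tests reported in Tables 3-6, and the function names are hypothetical.

```python
import numpy as np

def df_tstat(y):
    """Dickey-Fuller t-statistic, regression with intercept and no lagged differences."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - 2)
    return coef[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def empirical_size(A, V, T=100, nrep=1000, crit=-2.89, seed=0):
    """Rejection frequency of the DF test applied separately to y_t and x_t
    when the data come from z_t = A z_{t-1} + u_t with u_t ~ N(0, V)."""
    rng = np.random.default_rng(seed)
    A, chol = np.asarray(A, float), np.linalg.cholesky(V)
    u = rng.standard_normal((nrep, T, 2)) @ chol.T   # rows have covariance V
    z = np.zeros((nrep, T, 2))
    z[:, 0] = u[:, 0]
    for t in range(1, T):
        z[:, t] = z[:, t - 1] @ A.T + u[:, t]
    rej = np.array([[df_tstat(z[i, :, j]) < crit for j in range(2)]
                    for i in range(nrep)])
    return rej.mean(axis=0)   # (size for y_t, size for x_t)

A = [[0.5, 0.05], [1.0, 0.9]]
V = [[0.045, 0.017], [0.017, 0.045]]
print(empirical_size(A, V))
```

Because the MA coefficient of Δy_t is close to −1 while that of Δx_t is close to zero, the rejection frequency for y_t is expected to lie far above 5%, whereas that for x_t is not, even though both series contain the same unit root.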
We notice that, for many values of the parameters, the univariate models are characterized by a large negative MA coefficient, which is exactly the circumstance in which unit root tests have low power and large size distortion even in moderately large samples. For the set of parameters in MC(b), this always occurs when ρ = 0.9, and when ρ = 0.5 and η = −0.5, while for the set of parameters in MC(a), it occurs in only half of the cases. Some additional remarks are in order. First, MC(a) is able to generate, at least for the parameter values considered here, greater heterogeneity in the MA roots of the univariate first-difference processes than MC(b). In fact, in MC(a), we observe either a large negative MA root for $\Delta y_t$ associated with a small or medium-sized negative MA root for $\Delta x_t$, or a positive MA root for $\Delta y_t$ associated with a negative one for $\Delta x_t$. By contrast, in MC(b), looking, for instance, at the upper panel of Table 2 (the case ρ = 0.9), we see that the coefficients of the MA components are almost identical for $\Delta y_t$ and $\Delta x_t$ and that, to a lesser extent, the same applies to the lower panel (ρ = 0.5). When this occurs, the univariate representations of the unit root processes resemble each other, and this may explain the similar behavior of unit root tests applied to $y_t$ and $x_t$ separately. Furthermore, when ρ = 0.9, the coefficients of the MA components in $\Delta y_t$ and $\Delta x_t$ are not only very similar to each other but also quite close to the root of the autoregressive component. This implies a near common factor in the univariate models for the first-differenced series, so that the AR and MA roots almost cancel out. To our knowledge, this feature of the parameterization MC(b) used by Gonzalo [18] (and many others) has not been noticed so far in simulation studies on unit root or cointegration tests. As a consequence of this near common root, the lagged unconditional correlations of $\Delta y_t$ and $\Delta x_t$ tend to be small. For instance, for ρ = 0.9, σ = 2 and η = −0.5, the first-order unconditional autocorrelations of $\Delta y_t$ and $\Delta x_t$ are equal to 0.015 and −0.021, respectively, and the first-order unconditional cross-correlation is −0.032. Similar results are obtained when ρ = 0.9 for other values of σ and η. The first-order unconditional autocorrelations increase only slowly in absolute value as ρ decreases; for instance, when ρ = 0.5, they are 0.05 and −0.11 for $\Delta y_t$ and $\Delta x_t$, respectively, and the first-order unconditional cross-correlation is −0.16.
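The near-cancellation argument can be checked with the standard autocorrelations of a stationary ARMA(1,1), ρ₁ = (1 + φθ)(φ + θ)/(1 + 2φθ + θ²) and ρ_k = φ^(k−1) ρ₁ for k ≥ 1. In the Python sketch below, the sign convention (1 − φL)w_t = (1 + θL)e_t and the function name are assumptions, and the inputs are illustrative: (0.9, −0.85) is a hypothetical near-common-factor pair (the entries of Table 2 are not reproduced here), while (0.4, −0.873) corresponds to Δy_t under DGP4, with the MA coefficient obtained by the moment-matching step of the model section.

```python
def arma11_acf(phi, theta, max_lag=3):
    """Autocorrelations rho_1..rho_max_lag of (1 - phi L) w_t = (1 + theta L) e_t, |phi| < 1."""
    rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta**2)
    # Beyond lag one the ACF decays geometrically at rate phi
    return [rho1 * phi ** (k - 1) for k in range(1, max_lag + 1)]

print(arma11_acf(0.9, -0.85))   # near common factor: small rho_1
print(arma11_acf(0.4, -0.873))  # Delta y_t in DGP4: sizeable negative rho_1
```

When θ = −φ the factor (φ + θ) vanishes and the process is white noise, which is why near-identical AR and MA roots produce the very small first-order autocorrelations reported above.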
The empirical size at the 5% nominal level of the unit root tests is reported in Tables 3-6 for the regression estimated without a trend.⁴ For each DGP, we test for a unit root both in $y_t$ and in $x_t$. For each set of parameters in MC(b), each table reports the effective size for a fixed value of the "signal-to-noise" ratio σ and different values of the remaining parameters; for the experiments in MC(a), we report the effective size for the four different parameterizations.
The first general and striking result, common to both parameterizations MC(b) and MC(a), is the presence of important differences in the effective size according to whether $y_t$ or $x_t$ is tested for a unit root. Considering MC(b), the empirical size increases with η when testing for a unit root in $y_t$, whereas it decreases when testing is carried out on $x_t$. This finding is remarkable and unexpected, since the univariate ARMA representations of $y_t$ and $x_t$ share the same AR component and have very similar MA components for most parameterizations. The differences in size can be fully appreciated when σ = 1 or σ = 2. For instance, in the case σ = 1, ρ = 0.9, η = −0.5, for most tests based on GLS detrending (except $MZ^{PQ}_t$), the effective size is close to the nominal one when the unit root test is applied to $y_t$, but it doubles or almost triples when unit root tests are applied to $x_t$. The same applies to the bootstrap unit root tests. The pattern is reversed when σ = 1, ρ = 0.9 and η = 0.5 and, to a lesser extent, for smaller values of the AR root such as ρ = 0.5. Notice also that only when η = 0 do these differences tend to be negligible.

⁴ For brevity, our discussion refers to the model without trend; similar remarks apply when a trend component is included in the regression. Simulation results for this case are available upon request from the authors.
For the parameterization MC(a), in Table 6, we continue to observe substantial differences in the effective size according to whether $y_t$ or $x_t$ is tested for a unit root. These differences are even more pronounced than those in Tables 3-5, perhaps because of the greater range and heterogeneity of the MA components obtained from the parameter values under MC(a). Furthermore, there are noticeable differences among tests and across DGPs: for instance, for DGP1, both the DF-GLS and the bootstrap tests have reasonable effective sizes for $x_t$, while for $y_t$ the effective size more than doubles for the bootstrap tests but does not change substantially for the DF-GLS test. Similarly, for DGP4, the size of the DF-GLS test more than doubles when $y_t$ is tested, while the size of the bootstrap tests is more than four times larger for $y_t$ than for $x_t$.
Considering the parameterization MC(b), Tables 3-5 show that when σ = 0.25 the empirical size is, in general, close to the nominal one for most tests, and in particular for the ADF sieve bootstrap unit root tests. The empirical size of both versions of the ADF bootstrap test seems to be more stable and closer to the nominal size than that of the $P_T$, $MP_T$ and DF-GLS tests. However, for σ = 1, and to a larger extent for σ = 2, the empirical size of these tests departs more and more from the nominal 5% level. In fact, the effective size increases with σ, ranging in the interval (0.03, 0.09) when σ = 0.25 and in the interval (0.11, 0.25) when σ = 2. Thus, as the variance of the random walk component in $y_t$ and $x_t$ increases, the size of the unit root tests increases, and the resulting size distortion is quite sizeable for σ = 2, irrespective of ρ and η. In general, GLS detrending tends to increase the empirical size; this has a beneficial effect when σ is small but is detrimental when σ is large.

Considering the four parameterizations under MC(a), the behavior of unit root tests is more heterogeneous, as is the pattern of AR and MA roots. Under OLS detrending, the $Z_\alpha$ and $Z_t$ tests have large size distortion for all parameterizations. The modifications suggested by Perron and Ng [2] are somewhat effective in reducing the distortion, but the behavior of the M tests is not stable across DGPs, and the same remark applies to the ADF test. Under GLS detrending, the DF-GLS test by Elliott et al. [13] has the best performance, with an effective size very close to the nominal one in all cases except for $y_t$ in DGP2 and DGP4, where, in fact, the MA component has a coefficient close to −1. The bootstrap unit root tests have greater size distortion than the DF-GLS test, except for $y_t$ in DGP2 and DGP4, which are exactly the cases in which the DF-GLS test does not perform well.
Finally, we consider Johansen's trace test under the null of one cointegrating vector. For the parameters in MC(b), the test is severely size-distorted when ρ = 0.9, irrespective of the values taken by σ and η, while its size is close to the nominal one for ρ = 0.5; in the latter case, its performance is superior to that of the standard and bootstrap unit root tests. For the parameterizations in MC(a), Johansen's trace test also behaves very well: in all cases, the hypothesis of one cointegrating vector is rejected in favor of stationarity of the VAR in (1) about 5% of the time. However, we should bear in mind that the AR root is rather small in these DGPs and that, according to the results based on MC(b), Johansen's test is adversely affected by large values of ρ.
We do not consider the power properties of the unit root tests considered here. Ng and Perron [14] provide simulation evidence that the DF-GLS test has better power than the M tests, even though the latter have better size properties. For the bootstrap unit root tests, we rely on the extensive simulation study by Palm et al. [16], who found that the ADF sieve bootstrap test performs better under a variety of DGPs with and without an MA component. An extensive simulation study of the power properties of these tests for univariate time series generated by a cointegrated VAR is left for future investigation.

Conclusions
A standard practice in cointegration analysis is to run unit root tests separately on each single time series in the multivariate system. However, univariate time series are most often observed as part of a more general multivariate model. It turns out that the univariate models implied by a VAR data generating process always have a finite-order MA component (e.g., see [6]). This feature may explain why an MA component has often been found in univariate ARMA models for economic time series and, given the well-known size distortion of popular unit root tests in the presence of a large negative coefficient in the MA component, it has important implications for unit root tests in univariate settings. In a small simulation experiment under cointegration, we find that (a) there can be substantial differences in size distortion according to whether the unit root test is applied to $y_t$ or $x_t$, and this occurs for the ADF sieve bootstrap unit root tests too; (b) most tests perform well when the "signal-to-noise" ratio is small, but the size distortion can be large when the "signal-to-noise" ratio increases; (c) the ADF sieve bootstrap unit root tests are not immune from size distortion; and (d) Johansen's trace test based on the VAR model exhibits great size distortion when the root of the AR component is large.

Table 6. Empirical size of unit root tests (no trend), MC(a).