Abstract
In this paper the theory on the estimation of vector autoregressive (VAR) models for I(2) processes is extended to the case of long VAR approximation of more general processes. Hereby the order of the autoregression is allowed to tend to infinity at a certain rate depending on the sample size. We deal with unrestricted OLS estimators (in the model formulated in levels as well as in vector error correction form) as well as with two stage estimation (2SI2) in the vector error correction model (VECM) formulation. Our main results are analogous to the I(1) case: We show that the long VAR approximation leads to consistent estimates of the long and short run dynamics. Furthermore, tests on the autoregressive coefficients follow standard asymptotics. The pseudo likelihood ratio tests on the cointegrating ranks (using the Gaussian likelihood) used in the 2SI2 algorithm show under the null hypothesis the same distributions as in the case of data generating processes following finite order VARs. The same holds true for the asymptotic distribution of the long run dynamics both in the unrestricted VECM estimation and the reduced rank regression in the 2SI2 algorithm. Building on these results we show that if the data is generated by an invertible VARMA process, the VAR approximation can be used in order to derive a consistent initial estimator for subsequent pseudo likelihood optimization in the VARMA model.
1. Introduction
Many macroeconomic variables have been found to exhibit trend-like behaviour that can be modelled using vector autoregressions (VARs). Katarina Juselius (2006) states that empirical modelling led to the development of I(1) and I(2) models, since certain features of the datasets considered required taking first and second differences in order to obtain stationary time series. Additionally, cointegrating relations were found in the corresponding analyses. Similar findings have recurred numerous times in the literature, for example related to money demand (Johansen (1992b); Juselius (1994)), inflation (Banerjee et al. (2001); Georgoutsos and Kouretas (2004)), and interest rates and real exchange rates (Johansen et al. (2007); Juselius and Assenmacher (2017); Juselius and Stillwagon (2018); Stillwagon (2018)), to mention only a few sources.
The predominant methodological approach to modelling integration and cointegration in the I(1) and the I(2) case in the vector autoregressive (VAR) framework has been established mainly by Søren Johansen and Katarina Juselius together with a number of coauthors (see the lists of references in Johansen (1995); Juselius (2006) for details), building on vector error correction models (see Engle and Granger (1987) for early comments on the history of using error correction models for co-integrated processes). Extending the main ideas of cointegration modeling from the I(1) setting, Johansen (1997) (see also, e.g., Johansen (1992a)) suggested a representation for the I(2) case. Johansen (1997) established asymptotic distributions for the suggested two step I(2) estimator (2SI2) as an approximation to pseudo maximum likelihood estimation involving numerical optimization. Asymptotics for the corresponding likelihood ratio tests have been developed in Paruolo (1994, 1996); the asymptotic equivalence of 2SI2 to pseudo likelihood optimization using the Gaussian distribution (and hence, in a certain sense, its statistical efficiency) is shown in Paruolo (2000). However, Nielsen and Rahbek (2007) show that in finite samples the likelihood ratio test has size advantages. The testing of restrictions on the parameters has been investigated by Boswijk and Doornik (2004); Boswijk and Paruolo (2017); Johansen and Lütkepohl (2005). Due to the implicit vector error correction (VECM) modeling, deterministic terms in the VECM produce complex deterministic terms in the solution processes. In the I(2) context Nielsen and Rahbek (2007); Paruolo (1994, 2006); Rahbek et al. (1999); Kurita et al. (2011) discuss the impacts of deterministic terms.
As the VECM representation expresses reduced rank matrices as a product of two matrices, identification conditions are of particular importance, see Juselius (2006); Mosconi and Paruolo (2013, 2017). In this context weak exogeneity has also been studied, see Kurita (2012); Paruolo and Rahbek (1999).
The main idea underlying the VECM approach for estimating VAR models in the I(2) context is to reparameterize the problem such that integration and cointegration properties relate to the rank of two matrices. Assuming the data generating process to be a VAR of known finite order, the rank of matrices can be tested using (pseudo) likelihood ratio tests.
Sometimes the assumption of known order is not justified. For example, it is known that a subset of variables generated by a finite order VAR can in general not be described by a finite order VAR, but instead requires a vector autoregressive moving average (VARMA) model. However, the class of VARs provides flexibility in the sense that a VAR of infinite order can represent a large set of linear dynamical systems including all invertible VARMA systems. For stationary processes Berk (1974) and Lewis and Reinsel (1985) show that by letting the order of the VAR tend to infinity as a suitable function of the sample size, consistent estimation of the underlying transfer function can be achieved for data generating processes that can be described by a VAR(∞) subject to mild assumptions on the summability of the VAR coefficients. Additionally, Lewis and Reinsel (1985) establish asymptotic normality (in a very specific sense) of linear combinations of the estimated autoregressive coefficients. Hannan and Deistler (1988) make the concepts operational by showing that, if the dataset is generated by a VARMA process, the required rate at which the order tends to infinity can be achieved by selecting the order using BIC.
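The idea of a long VAR approximation with data-driven order can be illustrated by a minimal sketch for the stationary case. The function names `fit_var_ols` and `select_order_bic`, the particular BIC formula and the simulated example are assumptions made for illustration only; they are not taken from the references cited above.

```python
import numpy as np

def fit_var_ols(y, h, start=None):
    """OLS fit of y_t = A_1 y_{t-1} + ... + A_h y_{t-h} + e_t (no deterministics).

    y     : (T, k) array of observations
    h     : lag order (h >= 1)
    start : index of the first observation used as left-hand side
            (defaults to h); a common start makes fits comparable
    Returns (A, sigma): A has shape (k, k*h), sigma is the (k, k)
    residual covariance.
    """
    T, k = y.shape
    start = h if start is None else start
    Y = y[start:]
    # Stack the lagged observations y_{t-1}, ..., y_{t-h} side by side.
    X = np.hstack([y[start - j:T - j] for j in range(1, h + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma = resid.T @ resid / Y.shape[0]
    return coef.T, sigma

def select_order_bic(y, h_max):
    """Pick h in 1..h_max minimising log det(sigma_h) + h k^2 log(N) / N."""
    T, k = y.shape
    n_eff = T - h_max  # common effective sample size for all candidate orders
    best_h, best_bic = None, np.inf
    for h in range(1, h_max + 1):
        _, sigma = fit_var_ols(y, h, start=h_max)
        bic = np.linalg.slogdet(sigma)[1] + h * k * k * np.log(n_eff) / n_eff
        if bic < best_bic:
            best_h, best_bic = h, bic
    return best_h

# Illustration on a simulated stationary VAR(1) (assumed example data).
rng = np.random.default_rng(0)
y = np.zeros((400, 2))
for t in range(1, 400):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal(2)
h_hat = select_order_bic(y, h_max=8)
A_hat, sigma_hat = fit_var_ols(y, h_hat)
```

In the sketch the maximal order `h_max` is fixed by the user; in the theory discussed here it would have to grow with the sample size within the bounds imposed on the lag order.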
In the case of I(1) processes the estimation theory for long VAR approximations to VARMA processes has been extended, based on the techniques of Lewis and Reinsel for the stationary case, in a series of papers by Saikkonen and coauthors: Saikkonen (1991, 1992); Lütkepohl and Saikkonen (1997); Saikkonen and Lütkepohl (1996); Saikkonen and Luukkonen (1997). Additionally, the Johansen framework of rank restricted estimation in the VECM has been extended to long VAR approximations by Saikkonen and Luukkonen (1997). Bauer and Wagner (2004) provide extensions to the multi frequency I(1) case where unit roots may occur at the seasonal frequencies.
For the I(2) case no such extensions are currently known. This is the research gap this paper tries to fill: First, we establish consistency and asymptotic normality of estimated autoregressive coefficients (in the sense of Lewis and Reinsel) for unrestricted ordinary least squares (OLS) estimation in the VECM representation. This can be used in order to derive Wald type tests of linear restrictions on the autoregressive parameters. Second, we extend the rank restricted regression techniques in the I(2) case to the long VAR approximations, showing that the asymptotics (for estimated cointegrating relations, likelihood ratio tests and the two step estimation procedures) are identical in the case of long VAR approximations and VARs of finite known order. Third, we show that if the data generating process is an invertible VARMA process the long VAR system estimator can be used in order to obtain consistent initial estimators for subsequent pseudo likelihood maximization in the VARMA model class. In all results we limit ourselves to the case of no deterministic terms being included in the VECM representation. The inclusion of deterministic terms requires changing the test distribution, compare the theory contained for example in Rahbek et al. (1999).
The paper is organized as follows: In the next section the data generating process and the main assumptions are described. Section 3 then provides the results for the unrestricted estimation. Section 4 deals with rank restricted regression in the 2SI2 procedure, while Section 5 investigates the initial guess in the VARMA setting for subsequent pseudo likelihood maximization. Finally Section 6 concludes the paper. Proofs are relegated to an appendix.
Throughout the paper we will use the notation introduced by Johansen (1997): For a matrix of full column rank we use the notation . Furthermore, denotes a full column rank matrix of dimension such that . Whenever this notation is used the particular choice of is not of importance. For a matrix we let denote the Frobenius norm .
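The notation can be made concrete numerically. The following sketch assumes the standard Johansen conventions, namely that the bar operation maps a full column rank matrix a to a(a'a)^{-1} and that the orthogonal complement is any full column rank matrix whose columns span the orthogonal complement of the column space of a. The function names and the orthonormal choice of the complement are assumptions for illustration.

```python
import numpy as np

def bar(a):
    """Return a (a'a)^{-1}, so that bar(a)' a is the identity."""
    return a @ np.linalg.inv(a.T @ a)

def perp(a, tol=1e-10):
    """Return an orthonormal basis of the orthogonal complement of span(a)."""
    u, s, _ = np.linalg.svd(a, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    # The trailing left singular vectors are orthogonal to the columns of a.
    return u[:, rank:]

def frobenius(c):
    """Frobenius norm, the square root of trace(c'c)."""
    return np.sqrt(np.trace(c.T @ c))
```

Any other basis of the same complement would serve equally well, in line with the remark that the particular choice is not of importance.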
2. Data Generating Process and Assumptions
In this paper we use the following assumptions on the data generating process:
Assumption 1 (DGP).
The process is generated from the difference equation for :
where are full column rank matrices, with L denoting the backward shift operator such that . The matrix function fulfills the special marginal stability condition that
Furthermore, there exists a real such that the power series defining converges absolutely for . Define where are of full column rank. Then it is assumed that the matrix
is nonsingular.
Furthermore, the process denotes independent identically distributed (iid) white noise with mean zero and variance .
It is well known that the conditions (2) and (3) are necessary and sufficient for the existence of solutions to the difference equation that are I(2) processes, see for example Johansen (1992a). Moreover, note that the assumption of absolute convergence of for implies that for every . In particular follows as will be used frequently below.
Every vector autoregressive function corresponding to the autoregression , that fulfills Assumption 1, allows a representation as . This can be seen as follows:
where is without restriction of generality assumed to be an orthonormal matrix, and where we use that
Here
In this representation
is nonsingular due to assumption (3). Furthermore, is a transfer function with since and thus the same holds for the power series coefficients . Since it follows that . Therefore
is a VAR process. Note, however, that in general. This constitutes a triangular representation of the process denoting such that
where has a VAR(∞) representation. Furthermore, defining
we obtain such that
is another representation of the process with . It follows that the triangular representation can be seen as a special case where one has partial information on the matrices . For estimation the VECM representation is approximated using a finite order h:
where . As in the VECM representation the dimensions of are linked to the rank of the matrices and . Restricting these matrices to be of particular rank is simpler than imposing the equivalent restrictions in the VAR(h) representation directly.
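The link between a VAR(h) in levels and its error correction form in second differences can be sketched as follows. The sign convention, the names `Pi`, `Gamma`, `Psi` and the function name are assumptions chosen for this illustration and need not coincide with the parameterization used in the displayed equations of this paper.

```python
import numpy as np

def var_to_i2_vecm(A):
    """Map VAR(h) coefficients to a second-difference VECM form.

    A : (h, k, k) array with A[j-1] = A_j in y_t = sum_j A_j y_{t-j} + e_t, h >= 2.
    Returns (Pi, Gamma, Psi) for the ASSUMED sign convention
        D2 y_t = Pi y_{t-1} + Gamma D y_{t-1} + sum_{j=1}^{h-2} Psi_j D2 y_{t-j} + e_t,
    where D is the first difference and D2 the second difference.
    """
    h, k, _ = A.shape
    if h < 2:
        raise ValueError("need h >= 2")
    I = np.eye(k)
    Pi = A.sum(axis=0) - I                 # equals -A(1)
    Gamma = -I - sum((j - 1) * A[j - 1] for j in range(1, h + 1))
    # Remainder polynomial R(z) = A(z) - [(1-z)^2 I - Pi z - Gamma z (1-z)].
    R = np.zeros((h + 1, k, k))
    R[1:] = -A
    R[1] += 2 * I + Pi + Gamma
    R[2] += -I - Gamma
    # R(z) = (1-z)^2 Q(z); dividing by (1-z)^2 is a double cumulative sum.
    Q = np.cumsum(np.cumsum(R, axis=0), axis=0)
    Psi = -Q[1:h - 1]                      # Psi_1, ..., Psi_{h-2}
    return Pi, Gamma, Psi
```

In this form the rank restrictions concern `Pi` and, conditional on it, a transformation of `Gamma`, whereas in the levels VAR they would translate into nonlinear restrictions across all lag matrices.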
In the following we will first investigate the unrestricted ordinary least squares estimator in the VECM representation without taking rank restrictions into account. In the second step the 2SI2 procedure as presented in Paruolo (2000) for imposing the two rank restrictions in two steps is investigated.
For both procedures the selection of the order h is of importance. In this respect the following assumption will be used:
Assumption 2 (Lag order h).
The order h is chosen subject to the following restrictions:
- .
- as .
This condition defines an upper bound for the order which is usually directly assured during order selection using for example information criteria. The upper bound is smaller than the usual rate for technical reasons. The stronger bound is not needed for all results. However, the implications for practical applications are minor as for example in the range we have . The second condition of Assumption 2 implies a lower bound for the increase of h as a function of the sample size. Clearly for . The bound implies that for this convergence needs to be fast enough such that still converges to zero. The lower bound depends on the underlying true parameters. For invertible VARMA processes – which can be seen as the leading case – for some . Hannan and Deistler (1988) show that for an invertible stationary VARMA process the lower bound (in this case proportional to ) can be achieved asymptotically by using BIC as the order selection procedure. Thus in this case also the stronger condition () is satisfied. Bauer and Wagner (2004) extend this result to the multi frequency I(1) setting. For the I(2) case no analogous result is known, although the developments of Bauer and Wagner (2004) suggest that a similar result holds also there. This is left for future research.
Therefore the difference between the 'usual' rates and the ones assumed above is deemed to be of minor practical consequence. Thus we are not explicit in the main text as to which results hold true under the less restrictive set of assumptions and which do not. In the appendix, however, we comment on this point.
3. Unrestricted Estimation
In this section the results of Lewis and Reinsel (1985) and Saikkonen and Lütkepohl (1996) are extended to the I(2) case. To simplify notation define for sequences .1 Then the unrestricted least squares estimator in the finite VECM model uses the regressor vector . The corresponding ordinary least squares estimator is given as
The noise covariance is estimated from the residuals as usual as
where denotes the effective sample size.
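A minimal numerical sketch of the unrestricted least squares step is given below. The layout of the regressor vector (lagged level, lagged first difference, then lagged second differences), the sign convention and the function name are assumptions for illustration, since the exact definitions are given in the displayed equations.

```python
import numpy as np

def ols_i2_vecm(y, h):
    """Unrestricted OLS in the ASSUMED form
        D2 y_t = Pi y_{t-1} + Gamma D y_{t-1} + sum_{j=1}^{h-2} Psi_j D2 y_{t-j} + e_t,
    with h >= 2. Returns (Pi, Gamma, Psi, sigma), Psi of shape (k, k*(h-2)).
    """
    T, k = y.shape
    dy = np.diff(y, axis=0)          # dy[t-1]  = y_t - y_{t-1}
    d2y = np.diff(y, n=2, axis=0)    # d2y[t-2] = D2 y_t
    rows = range(h, T)               # time indices t used as left-hand side
    Y = np.array([d2y[t - 2] for t in rows])
    X = np.array([
        np.concatenate([y[t - 1], dy[t - 2]] +
                       [d2y[t - j - 2] for j in range(1, h - 1)])
        for t in rows
    ])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma = resid.T @ resid / len(Y)   # divide by the effective sample size
    B = coef.T
    return B[:, :k], B[:, k:2 * k], B[:, 2 * k:], sigma
```

For a pure I(2) process whose second difference is white noise, all coefficient estimates returned by this sketch should be close to zero and `sigma` close to the noise covariance.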
3.1. Estimation in the Triangular VECM Representation
As typical for the cointegration framework, analysis is easier in the triangular representation which separates stationary components from I(1) and I(2) processes: Let where is such that
where has a VAR(∞) representation where
Note, however, that using the triangular representation implies that the matrix is known up to the value of the matrix A. In applications this is seldom the case.
Thus letting we obtain
with leads to the corresponding VECM representation:
Here , where is for : Similarly, , where is for . The sums exist since by assumption. Similarly, we partition , and into , and , respectively. The analogous partitioning is used for estimates.
Then . Therefore . Note that in this notation the I(2) components on the right hand side are , the I(1) components are , where is stationary. Thus in order to separate regressors of different integration orders in the proof (as is usually done in the literature) we use a transformation using the unknown matrix A such that the regressor is replaced by . Consequently the estimate of is replaced by the estimate of .
Based on the estimates and then A can be estimated as
Here the insertion of appears somewhat arbitrary. A motivation for this choice in the I(1) case can be found in Saikkonen (1992) equation (12). However, any other positive definite matrix could be used as well. Currently there is no knowledge on the optimality of the choice suggested above.
In the asymptotic distribution of the estimation error Brownian motions occur relating to the process : Under Assumption 1 we have
where denotes a Brownian motion with corresponding variance
where is a -dimensional Brownian motion, which is independent of , with covariance
An estimator of is given by2
With these definitions we can state our first result of the paper (which is proved in Appendix B):
Theorem 1.
Under Assumptions 1 and 2 for the triangular VECM representation we have:
(A) Consistency:
(B) Asymptotic distribution of coefficients to nonstationary regressors: Under Assumptions 1 and 2 we have ():
where and .
(C) Asymptotic distribution of coefficients to stationary regressors: Let be a sequence of matrices such that where with .
Let
Then
(D) Asymptotic distribution on Wald type tests: Finally letting
where , the Wald test for the null hypothesis is given by
Then if is such that , under the null hypothesis.
The theorem provides the asymptotic distributions of the OLS estimates in the triangular system. Note that in this somewhat special case the properties of the regressor components (stationary or not) are known such that for each entry the convergence speed is known. Correspondingly the definition of the regressor vector involves only lags of but omits all nonstationary regressors except the ones cointegrated with .
The assumptions on are more restrictive than needed. Lewis and Reinsel (1985) and Saikkonen and Lütkepohl (1996) only require that has full column rank when deriving the normalized convergence to normal distribution with unit variance as the limit for
Similar arguments could be used here.
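For orientation, the textbook fixed-dimension analogue of a Wald statistic for linear restrictions on regression coefficients can be sketched as follows. This is not the statistic of Theorem 1 itself, whose normalization accounts for the growing lag order; the function name, the column-stacking vec convention and the homoskedastic covariance formula are assumptions of the sketch, and the chi-square comparison is only appropriate for coefficients on stationary regressors.

```python
import numpy as np

def wald_statistic(Y, X, L, l):
    """Wald statistic for H0: L vec(B) = l in the regression Y = X B' + E.

    Y : (N, k) left-hand side, X : (N, m) regressors, B : (k, m).
    vec stacks the columns of B; L is (q, k*m) with full row rank.
    Returns (W, q); under H0 and standard conditions W is compared
    with a chi-square distribution with q degrees of freedom.
    """
    N = Y.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    B = Y.T @ X @ XtX_inv
    resid = Y - X @ B.T
    sigma = resid.T @ resid / N
    # Covariance of vec(B_hat) under homoskedastic errors.
    cov = np.kron(XtX_inv, sigma)
    diff = L @ B.flatten(order="F") - l
    W = float(diff @ np.linalg.solve(L @ cov @ L.T, diff))
    return W, L.shape[0]
```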
3.2. Estimation in the General VECM Representation
The previous section dealt with the special case that a triangular representation is used and hence knowledge on the matrices is given. This section provides a result for the general case, which, however, is limited to the coefficients to the stationary components. Since a general process generated according to Assumption 1 can be rewritten into a triangular representation using the knowledge of , some asymptotic properties of the unrestricted OLS estimators can be derived from Theorem 1 for the general case (which is proved in Appendix C):
Theorem 2.
Let the regressor vector and define
Then under Assumptions 1 and 2 it follows that .
Furthermore, let be such that . Then
Besides consistency, the theorem implies that linear combinations of OLS estimators are asymptotically normal and hence allow standard inference, provided the asymptotic variance is nonsingular. One application of such results consists in the so called ‘surplus lag’ formulation in the context of Granger causality testing, see Bauer and Maynard (2012); Dolado and Lütkepohl (1996).
Finally note that this section does not contain results with regard to the cointegrating rank or the cointegrating space. The theorem above merely allows testing coefficients corresponding to stationary regressors. Therefore its use is limited to somewhat special situations like the surplus-lag causality tests. However, it is also relevant for impulse response analysis, compare Inoue and Kilian (2020).
4. Rank Restricted Regression
The previous sections show that for the estimators discussed in those sections full inference on all coefficients is only possible when information on the matrices and is available. The dimensions of the matrices relate to the ranks of the matrices and, conditional on this, to the rank of . The two rank restrictions make estimation and specification more complex than in the I(1) case.
Johansen (1995) provides the two-step approach 2SI2 that can be used for estimation and specification of the two integer valued parameters and . Paruolo and Rahbek (1999) extend the 2SI2 procedure suggested in section 8 of Johansen (1997). Paruolo (2000) shows that this 2SI2 procedure achieves the same asymptotic distribution as pseudo maximum likelihood estimation which could be performed subsequent to 2SI2 estimation. This makes the procedure attractive from a practical point of view. In this section we show that these approaches extend naturally to the long VAR case. The main focus here lies on the derivation of the asymptotic properties of the rank tests.
Recall the long VAR approximation given as
where has reduced rank and has reduced rank . In this notation the 2SI2 procedure works as follows: In the first step the rank constraint on is neglected estimating and by using reduced-rank regression (RRR). Then in the second step the reduced rank of is imposed using RRR in a transformed equation.
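The reduced-rank regression used in both steps can be sketched generically as follows, for residual matrices obtained after concentrating out the remaining regressors. The function name, the argument layout and the Cholesky-based solution of the generalized eigenvalue problem are assumptions for illustration; the sketch follows the standard Johansen-type construction rather than any specific implementation.

```python
import numpy as np

def reduced_rank_regression(R0, R1, r):
    """Johansen-type RRR of R0 on R1 with rank r.

    R0 : (N, k) residuals of the left-hand side, R1 : (N, m) residuals
    of the regressors whose coefficient alpha beta' has reduced rank.
    Returns eigenvalues (descending), beta (m, r), alpha (k, r) and the
    trace statistics; trace[j] = -N * sum_{i>j} log(1 - lambda_i) tests
    rank at most j, with lambda_1 >= lambda_2 >= ... the eigenvalues.
    """
    N = R0.shape[0]
    S00 = R0.T @ R0 / N
    S01 = R0.T @ R1 / N
    S11 = R1.T @ R1 / N
    # Solve |lambda S11 - S10 S00^{-1} S01| = 0 via a Cholesky transform.
    C = np.linalg.cholesky(S11)             # S11 = C C'
    M = np.linalg.solve(C, S01.T)           # C^{-1} S10
    sym = M @ np.linalg.solve(S00, M.T)     # C^{-1} S10 S00^{-1} S01 C^{-T}
    lam, vec = np.linalg.eigh(sym)
    order = np.argsort(lam)[::-1]
    lam, vec = lam[order], vec[:, order]
    V = np.linalg.solve(C.T, vec)           # eigenvectors with V' S11 V = I
    beta = V[:, :r]
    alpha = S01 @ beta                      # uses beta' S11 beta = I
    trace = -N * np.cumsum(np.log(1.0 - lam)[::-1])[::-1]
    return lam, beta, alpha, trace
```

In the second step of 2SI2 the same routine would be applied to suitably transformed residuals, with the first step estimates replacing the unknown matrices.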
In more detail using the Johansen notation we denote with , and the residuals of regressing , and on , respectively; then we can rewrite (9) as
Concentrating out and denoting the residuals as and we obtain with the solution to the RRR problem from solving the eigenvalue problem
with solutions ordered with decreasing size and corresponding vectors . Then as usual the trace statistic of testing the model with , , in the model with , is given as
The optimizers for are given by
In the second step, given and known, we can obtain by multiplying (10) by that
Note that is stationary. Thus concentrating out C and denoting the residuals as and , respectively, we can define , for or . Then the likelihood ratio test of the model with , in the model with is given by
where are the solutions of the eigenvalue problem
and the corresponding eigenvectors are . Estimators of and are given by
For the 2SI2 procedure in this second step the first step estimates and are used in place of the unknown true quantities. Then we obtain the following analogue to the results in the finite order VAR framework (the proof is given in Appendix D):
Theorem 3.
Let the data be generated according to Assumption 1 and let the VAR order fulfill Assumption 2. Then the following asymptotic results hold:
(A) The asymptotic distribution of the likelihood ratio statistic under the null hypothesis is given by
where and . This is identical to the distribution achieved in the finite VAR case.
(B) The asymptotic distribution of the likelihood ratio statistic under the null hypothesis is given by
where .
(C) The asymptotic distribution of the test statistic under the null hypothesis is given by
(D) Using suitable normalizations all estimators are consistent: where for example .
(E) The asymptotic distributions of the coefficients to the nonstationary regressors are identical to the ones in the finite order VAR case stated in Paruolo (2000). The asymptotic distributions of the coefficients are identical to the ones in Theorem 1.
The main message of the theorem is that the 2SI2 procedure shows the same asymptotic properties including the rank tests as in the finite order VAR case. As usual also restricting the coefficients for the non-stationary regressors does not influence the asymptotics for the coefficients corresponding to the stationary regressors.
Note that Paruolo (2000) shows that in the finite VAR case 2SI2 estimates have the same asymptotic distribution as pseudo maximum likelihood (pML) estimates maximizing the Gaussian likelihood. The first order conditions for the pML estimates of the coefficients to the non-stationary regressors provided in the first display on p. 548 in Paruolo (2000) depend on the data only via the matrices defined above. These matrices depend on the lag length of the VECM only via the concentration step. The proof of our Theorem 3 shows that these terms have the same asymptotic distributions for the finite order VAR and the long VAR. Theorem 4.3 of Paruolo (2000) shows that the asymptotic distribution of the coefficients due to stationary regressors does not depend on the distribution of the coefficients corresponding to the non-stationary regressors as long as they are estimated super-consistently. Thus our results imply that also in the long VAR case the asymptotic distribution of all estimates for the 2SI2 and the pML approach is identical.
5. Initial Guess for VARMA Estimation
One use of long VAR approximations is as a preliminary estimate for VARMA model estimation. Hannan and Kavalieris (1986) provide properties of such an approach in the stationary case; Lütkepohl and Claessen (1997) extend the procedure to the I(1) case. Here we extend this idea to the I(2) case.
The goal is to provide a consistent initial guess for the estimation of a VARMA model for I(2) processes. In this respect we assume the following data generating process:
Assumption 3 (VARMA dgp).
The process is generated as the solution to the state space equations
where denotes white noise subject to the same assumptions as in Assumption 1.
Here is the unobserved state process. The system is assumed to be minimal and in the canonical form of Bauer and Wagner (2012), that is
where (the matrix is stable), . Furthermore, the system is strictly minimum-phase, that is . Finally the matrix is nonsingular.
At time the state is such that is deterministic and denotes the stationary solution to the stable part of the system.
In this situation it follows that is an I(2) process in the definition of Bauer and Wagner (2012), that is its second difference is a stationary VARMA process. The integers c and d are connected to the integers via such that . It can furthermore be shown that a process generated using Assumption 3 possesses a VAR(h) approximation:
where () converges to zero exponentially fast for due to the strict minimum-phase condition. Letting then implies the existence of a VAR(∞) representation. It follows that for such systems converges absolutely for where .
From the autoregressive representation the VECM representation can be obtained:
where such that
A comparison of power series coefficients provides the identities:
It follows that the coefficients form the impulse response of a rational transfer function of order smaller or equal to n. If is nonsingular then the order equals n and the system is minimal. Furthermore, it follows that for arbitrary and the transfer function
is a rational transfer function with the additional property that
Consequently and determine the integration properties of processes generated using .
Conversely whenever the constraints
hold the corresponding triple corresponds to an I(2) process (if the eigenvalues of A are in the closed unit disc). Defining we obtain
The third equation does not have a solution for fixed , if the row space of does not contain the space spanned by the rows of . In this case row-wise projection of onto the space spanned by the rows of allows for (not necessarily unique) solutions in . In the limit no projection is needed. Consequently for large enough T the projected matrix will have full row rank. The second equation then determines which in turn determines up to the choice of the basis such that for some full row rank matrix . The first equation then can be rewritten as
The second equation shows that the row space of contains the row space of . Thus the matrix has full row rank. It follows that this equation has solutions.
Having obtained a solution for then C is obtained from
A unique solution then can be obtained from adding the restrictions which for the estimates are to be solved in a least squares sense among all solutions to equations (22).
It then follows that for the true matrices the only solution for given consists in the corresponding true C. These facts therefore can be used in order to develop an initial guess for subsequent pseudo likelihood maximization using the parameterization of I(2) processes in state space representation: Given the integer valued parameters and d:
- Obtain a long VAR approximation , including and using the 2SI2 approach.
- Choose the integer . Use the algorithm described in Appendix F to obtain estimates realizing the impulse response from the Hankel matrix with f block columns and f block rows.
- Project rows of onto the space spanned by the rows of to obtain .
- Obtain a unique solution solving (22) such that the matrices have minimal Euclidean distance to .
- Transform the corresponding system to the canonical form of Bauer and Wagner (2012) to obtain the estimate .
The algorithm obtains a minimal state space system of order n in the canonical form for I(2) processes given in Bauer and Wagner (2012) and hence can be used as an initial guess for subsequent pseudo-likelihood optimization in the set of all order n rational transfer functions corresponding to I(2) processes with state space unit root structure .
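The realization step from a block Hankel matrix can be illustrated by a generic sketch in the spirit of the Ho-Kalman construction. It is not the algorithm of Appendix F, which is not reproduced here; the function name, the balanced factorization via the singular value decomposition and the least squares solution of the shift equation are assumptions for illustration.

```python
import numpy as np

def realize_from_impulse_response(K, n, f):
    """Order-n state space realization (A, B, C) with C A^(j-1) B ~ K[j-1].

    K : (J, k, m) array of impulse response coefficients K_1, ..., K_J
        with J >= 2 f - 1; f is the number of block rows and block columns.
    """
    _, k, m = K.shape
    # Block Hankel matrix with (i, j) block K_{i+j+1}, i, j = 0..f-1.
    H = np.block([[K[i + j] for j in range(f)] for i in range(f)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    sq = np.sqrt(s[:n])
    obs = U[:, :n] * sq                 # observability factor
    ctr = sq[:, None] * Vt[:n]          # controllability factor
    C = obs[:k]
    B = ctr[:, :m]
    # Shift invariance: obs without last block row times A = obs without first.
    A = np.linalg.lstsq(obs[:-k], obs[k:], rcond=None)[0]
    return A, B, C
```

The resulting triple is only determined up to a change of the state basis, which is why a subsequent transformation to a canonical form is needed before it can serve as a starting value.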
Theorem 4 (Consistent initial guess).
Let denote a process generated using the system according to Assumption 3 and let the system be estimated based on the long VAR approximation with lag order chosen according to Assumption 2. Then is a weakly consistent estimator of the data generating system in the sense that and hence the corresponding transfer functions converge in pointwise topology.
The proof of this theorem can be found in Appendix E.
6. Conclusions
In this paper the theory on long VAR approximation of general linear dynamical processes is extended to the case of I(2) processes. We find that we need slightly narrower upper and lower bounds in the approximations. The tighter bounds are not needed for all results and do not appear very restrictive for applications.
The main results are completely analogous to the I(1) case: The asymptotics are in many respects identical to the finite order VAR case. Asymptotic distributions for the coefficients to non-stationary variables are the same as in the finite order VAR case. This holds true both for unrestricted OLS estimates and for the 2SI2 approach in the Johansen framework. Tests on cointegrating ranks show identical asymptotic distributions under the null as in the finite order VAR case and hence do not require other tables. In this respect the main conclusion is that the usual procedure of estimating the lag order in the first step and then applying the Johansen procedure for estimated lag order is justified also for processes generated from a VAR(∞) that is approximated with a choice of the lag order lying within the prescribed bounds.
Additionally in the VARMA case the long VAR approximation can be used in order to derive consistent initial guesses that can be used in subsequent pseudo likelihood estimation.
Thus the paper provides both a full extension of results that have been achieved in the I(1) case as well as a useful starting point for subsequent VARMA modeling which might be preferable in situations which require a high VAR order or show a large number of variables to be modeled, a situation where VARMA models can be more parsimonious than VAR models.
Author Contributions
The two authors of the paper have contributed equally, via joint efforts, regarding both ideas, research, and writing. Conceptualization, Y.L. and D.B.; methodology, Y.L. and D.B.; software, not applicable; validation, not applicable; formal analysis, Y.L. and D.B.; investigation, Y.L. and D.B.; resources, not applicable; writing–original draft preparation, Y.L. and D.B.; writing–review and editing, Y.L. and D.B.; visualization, not applicable; supervision, not applicable; project administration, D.B.; funding acquisition, D.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation —Projektnummer 276051388) which is gratefully acknowledged. We acknowledge support for the publication costs by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.
Acknowledgments
The reviewers and in particular the two guest editors provided significant comments that helped in improving the paper, which is gratefully acknowledged.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Preliminaries
The theory in this paper follows closely the arguments in Lewis and Reinsel (1985) and its extension to the I(1) case in Saikkonen and Lütkepohl (1996). To this end consider the finite order VECM approximation:
The properties of the various estimators heavily use the following rewriting of the approximation using the triangular representation of :
where and , , and
Furthermore, we can see that , , and . Finally .
Note that in the reparametrization (A2), the I(1) components, , as well as the I(2) components, , are isolated from the stationary ones, , and have coefficients equal to zero, which facilitates the derivation of the asymptotic properties.
In the reparameterized setting define 3 ,
,
we have
and correspondingly,
where
is the OLS estimator of . Here .
Note that and the regressors in (A1) are in one-to-one correspondence. In the original Equation (A1) besides the nonstationary regressors and the regressor vector
occurs which cointegrates with such that
is stationary. Here the nonsingular matrix is defined as:
Let , so that we have
It can be verified that is invertible. The asymptotic properties of are clarified in the next lemma:
Lemma A1.
Under the assumptions of Theorem 1 using as the effective sample size
where .
Proof.
The proof essentially shows that the coefficients corresponding to the stationary regressors and the ones corresponding to the integrated regressors asymptotically can be dealt with separately. Let . Note that , , and are the 1st, 2nd and 3rd column blocks of , respectively. Moreover, we have
Let and define where , and
Note that each block of the matrix is of order , and moreover, both and its limit are almost surely invertible, as there is no cointegration between and (see Lemma 3.1.1 in Chan and Wei (1988), and Sims et al. (1990)). Note that
Here has the limits stated in the lemma since:
The lemma therefore holds, if can be shown (where the blocks in correspond to the partitioning of into stationary, I(1) and I(2) components). For this it is sufficient to show:
- (I)
- (II)
- where and
- (III)
- .
Here denotes the spectral norm of a matrix while denotes the Frobenius norm.
(I) To see , according to Lewis and Reinsel (1985), it is sufficient to show Note that
then we have .
Now let , then there exists a transformation of full row rank, such that , where is a matrix:
Then, we have , where ; moreover, for , where . Since , and have the same rate of convergence as and , respectively. From Saikkonen (1991) Lemma A.2. we know and by direct calculation.
For note that
Then calculations analogous to those for show that . Concluding, we obtain such that .
To show note that where (see Lewis and Reinsel (1985), p. 397) and , since is a.s. invertible and converges in distribution to an almost surely nonsingular random matrix.
(II) With respect to note that
From Saikkonen (1991) Lemma A.5 we have , and . Then and imply
(III) To show note that according to (A.7) of Saikkonen (1992). Moreover implies . □
Note that for the lemma to hold we only need and .
Appendix B. Proof of Theorem 1
Appendix B.1. (A) Consistency
(i) Lemma A1 implies . Furthermore, the reparameterization implies and thus leading to
where the last inequality holds due to in combination with Lemma A1.
(ii) Note that
Now
where such that and . Consequently
Next, from the definition of , we can show that
where the last equality follows from the law of large numbers and the first equality is implied by the fact that and .
(iii) From (i) and (ii), directly follows.
(iv) With respect to recall that
Then Lemma A1 shows that each entry of is of order . Then
which converges to zero for . Similarly .
For note that . Thus such that from (i) and Lemma A1.
(v) is contained in Lemma A1.
(vi) From (6), and the definition , we have
Then (i-iii, v) show the result.
Appendix B.2. (B) Asymptotic Distribution of Coefficients to Nonstationary Regressors
(i) The distribution of the coefficients due to the nonstationary components is contained in Lemma A1.
(ii) With respect to the cointegrating relation note that from the proof of Theorem 1 we have
Note that , where . Then by Lemma A1, we have
Note that , and by definition , we have
Therefore, we have
Appendix B.3. (C) Asymptotic Distribution of Coefficients to Stationary Regressors
Since the regressor vector is stationary, the asymptotic distribution of follows from Lewis and Reinsel (1985) in combination with the uniform boundedness of the maximal and the minimal eigenvalue of , see above. Analogously, the result for the coefficients corresponding to the regressor vector is shown, as for a nonsingular matrix .
Appendix B.4. (D) Asymptotic Distribution of Wald Type Tests
For the Wald test in addition to (C) note that the variance is replaced by an estimate . For
note that due to (A) (ii). The regressor vectors and differ only in the first block where replaces . Regressing out eliminates this difference. Then according to (Saikkonen and Lütkepohl 1996, p. 835, l. 3). Invertibility of is also shown there. Using Lemma A.2 of Saikkonen and Lütkepohl (1996) this implies .
The rest then follows as the proof of Theorem 4 in Saikkonen and Lütkepohl (1996).
Appendix C. Proof of Theorem 2
Consistency follows directly from Theorem 1 as the general representation can be transformed into a triangular representation using the matrix , see (4).
With respect to the asymptotic distribution following the proof of Theorem 1 there exists a nonsingular transformation matrix such that . From it follows that
Therefore it follows that the blocks corresponding to the nonstationary regressors do not contribute to the asymptotic distribution. Then standard arguments for the stationary part of the regressor vector can be used.
Appendix D. Proofs for Theorem 3
The proof combines the ideas of Saikkonen and Luukkonen (1997) (in the following S&L) with the asymptotics of 2SI2 of Paruolo (2000) (in the following P). In the proof we will work without restriction of generality with the triangular representation.
The key to the asymptotic properties of the estimators obtained from the 2SI2 algorithm lies in the results of P Lemma A.4 and Lemma A.5 in the appendix. These lemmas deal with the limits of various moment matrices of the form corrected for the stationary components . The correction involves a regressor vector growing in dimension with sample size. This is dealt with in S&L.
In this respect let which according to (A4) is a linear function of such that . The definition of implies . On p. 543 in P the matrices are defined as limits of second moment matrices. Here refers to in the triangular representation, refers to and refers to . These are all stationary processes and linear functions of . In addition to , also is corrected for in the second stage.
The arguments on p. 114 and 115 of S&L deal with terms of the form
Analogous arguments to S&L(A.12) show that this equals (up to terms of order )
S&L state that this is bounded from above and bounded away from zero. The second claim is actually incorrect. If is univariate white noise with unit variance then is achieved by predicting by
including integration of the regressors in the form of the summation. This does not change the remaining arguments in S&L, it only implies that the separation of the eigenvalues corresponding to the stationary regressors and the ones corresponding to the non-stationary ones is weaker.
In the current case one can show that for
where contains and for the corresponding limit the lower bound holds for some . The order of the lower bound is achieved by including a double integration of the regressors. For
we have . Here the arguments from above can be applied to the process . For a differenced process the smallest eigenvalue of the matrix
is of order , compare Theorem 2 of Palma and Bondon (2003).
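The mechanism behind a vanishing smallest eigenvalue can be checked numerically in the simplest case. The following sketch is an illustration under stated assumptions, not a restatement of the order claimed above: it takes the first difference of univariate white noise with unit variance, for which the covariance matrix of h consecutive observations is tridiagonal Toeplitz with closed-form eigenvalues. The function names are hypothetical.

```python
import numpy as np

def differenced_noise_covariance(h):
    """Covariance of (u_t, ..., u_{t-h+1}) for u_t = e_t - e_{t-1}, Var(e_t) = 1.

    Assumption for illustration only: e_t is univariate white noise, so the
    autocovariances are gamma(0) = 2, gamma(1) = -1 and gamma(k) = 0 for k >= 2.
    """
    return 2.0 * np.eye(h) - np.eye(h, k=1) - np.eye(h, k=-1)

def smallest_eigenvalue(h):
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(differenced_noise_covariance(h))[0]

for h in (10, 20, 40, 80):
    lam_min = smallest_eigenvalue(h)
    # Known closed form for this tridiagonal Toeplitz matrix.
    closed_form = 2.0 - 2.0 * np.cos(np.pi / (h + 1))
    # lam_min * h**2 stabilises as h grows, i.e. lam_min decays like h**(-2).
    print(h, lam_min, closed_form, lam_min * h**2)
```

In this toy case the smallest eigenvalue behaves like π²/h² for large h, so it is not bounded away from zero as the dimension of the regressor vector grows.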
Since and it follows that
as well as
Therefore the limits of the moment matrices are not affected by the correction using stationary terms even if except for the terms involving the orders . For all stationary terms we find convergence to the corresponding limits denoted in P.
The first step in the 2SI2 procedure then uses RRR in the equation
Then denotes corrected for , denotes corrected for and denotes corrected for . Lemma A.4 of P derives the limits of different directions of defined as
where . Here equals corrected for and . Further P uses the notation and . Here and below we assume without restriction of generality that is an orthonormal matrix. Consequently . Then the results above imply all results of Lemma A.4. of P except that now .
In particular we obtain the following limits:
Here denotes the Brownian motion corresponding to denotes the Brownian motion corresponding to (equaling corrected for ) corrected for ( whose only nonstationary component equals with corresponding Brownian motion ). Thus we obtain the following definitions (where L is as in Theorem 1):
The above arguments show that in the current setting and are contained in the space spanned by for . Therefore for . The subscript ’b’ refers to correcting for used in the second stage of 2SI2.
Let denote the limit of and analogously define and . For the latter two note that denotes the limit of
corrected for and . Since is stationary the last term is of order . Therefore it follows that . Then the results of Lemma A.5 of P hold where in (A.11) and (A.14) can be replaced by .
The asymptotic analysis below will heavily use the Johansen approach of investigating the solutions to eigenvalue problems in order to maximize the pseudo-likelihood corresponding to the reduced rank regression problem. In order to use the corresponding local analysis one has to first clarify consistency for the various estimators as well as rates of convergence.
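To make the eigenvalue formulation concrete, the following sketch solves the generic reduced rank regression eigenvalue problem |λ S11 − S10 S00⁻¹ S01| = 0 for two already concentrated data matrices. It is a minimal illustration, not the 2SI2 procedure itself: the function names, the simulated bivariate I(1) example and the absence of deterministic terms are all assumptions made here.

```python
import numpy as np

def rrr_eigen(R0, R1):
    """Solve |lambda * S11 - S10 S00^{-1} S01| = 0.

    R0: (T, n) left-hand-side variables, R1: (T, m) regressors, both assumed
    to have the remaining regressors already regressed out.
    Returns eigenvalues in descending order and the matching eigenvectors.
    """
    T = R0.shape[0]
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T
    S11 = R1.T @ R1 / T
    M = S01.T @ np.linalg.solve(S00, S01)        # S10 S00^{-1} S01
    L = np.linalg.cholesky(S11)                  # S11 = L L'
    # W = L^{-1} M L^{-T} turns the generalized problem into a symmetric one.
    W = np.linalg.solve(L, np.linalg.solve(L, M).T).T
    lam, V = np.linalg.eigh((W + W.T) / 2.0)
    order = np.argsort(lam)[::-1]
    # Map back to the original coordinates: v = L^{-T} u.
    return lam[order], np.linalg.solve(L.T, V[:, order])

def trace_statistic(lam, r, T):
    """Pseudo likelihood ratio statistic -T * sum_{i > r} log(1 - lambda_i)."""
    return -T * np.sum(np.log(1.0 - lam[r:]))

# Hypothetical example: two series sharing one random walk trend.
rng = np.random.default_rng(0)
T = 500
e = rng.standard_normal((T + 1, 2))
w = np.cumsum(e[:, 0])
X = np.column_stack([w + e[:, 1], w])            # X1 - X2 is stationary
R0 = np.diff(X, axis=0)                          # differences
R1 = X[:-1]                                      # lagged levels
lam, vecs = rrr_eigen(R0, R1)
print(lam, trace_statistic(lam, 1, T))
```

The Cholesky transformation is used only to reduce the generalized problem to a symmetric one with plain NumPy; any generalized symmetric eigenvalue solver would serve the same purpose.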
The main tool in this respect is Theorem A.1 of Johansen (1997) which establishes in the I(2) setting for the regression ( being composed of stationary, I(1) and I(2) components) where and that where denotes the pseudo likelihood estimator over some closed parameter set .
It is straightforward to see that analogous results hold in the present setting when first concentrating out the stationary components: Consider . Then is obtained from the concentration step and the pseudo likelihood involves where again the processes and denote the processes and with the corresponding stationary regressors regressed out. These concentrated quantities now can be used in the proof of Theorem A.1 of Johansen (1997) essentially without changes to show consistency for . Consistency of then follows from the unrestricted estimation as contained in Theorem 2. As shown above the rates of convergence as well as the limits are unchanged for the coefficients corresponding to the non-stationary components of the regressors for the long VAR case compared to the finite VAR case.
Note that these results hold for general closed parameter space , thus including the unrestricted as well as the rank-reduced problem. This shows that we can always reduce the asymptotic analysis of the eigenvalue problems to a neighborhood of the true value as is done in P.
The first step in the proof of Theorem 4.1. of P consists in the investigation of the solutions to the equation (, letting )
Now Lemma A.4 implies that the matrix on the left hand side converges to . . Multiplying the equation by we obtain the limiting eigenvalue problem
Therefore asymptotically the first eigenvalues of are positive, the remaining ones tending to zero. Likewise the eigenvectors converge at the same speed as the matrices. Thus from which
and thus using (A.11)
follows. Then as in P we have (see footnote 4)
where and . Then the remaining arguments on p. 546 of P show that the asymptotic distribution of is identical for the long VAR case as in the finite VAR case.
From these arguments the distribution of the likelihood ratio test of versus can be shown: Define , and . Note that since is of full rank, (11) is equivalent to ; that is,
Let , so that for every we have that as . By the above arguments we have that
which has no zero root. Moreover, we have
which yields that
where . Thus, the smallest solutions of (11) converge in distribution to the solutions of , which implies that the test statistic has the following limiting distribution,
For the second stage the arguments are very similar. The eigenvalue problem solved here is the following:
This formula uses , the ortho-complement of
From the above results noting that and according to Lemma A.4 we have . Considering the order of convergence we obtain . As in P this implies . Using from stage 1 one observes that in the eigenvalue problem estimates can be replaced by true quantities introducing an error of order :
Then as in P consider , reusing the symbols here for in place of as before. Identical arguments as around (A6) show that and . Then combining the arguments around (A6) with the developments in P, p. 546 and 547 we obtain (A.21) of P:
The rest of the proof of (4.3a) and (4.3b) of P follows as in P.
With respect to the second likelihood ratio test consider
The results above imply that has uniformly in (for every ) distance to of order where
Note that since is of full rank, (16) is equivalent to
Let , so that as . As above it can be seen that
This shows that the s larger roots of tend to zero slower than . Moreover, we have
which yields that (using )
using the results of Lemma A.5 of P and (A.18) of Paruolo (1996) as an expression for
where .
Thus, the smallest solutions of (16) converge in distribution to the solutions of
which shows that the test statistic has the following limiting distribution,
It follows also that the sum converges in distribution showing (C).
The rest of the proof of relations (4.3a, b) of P follows exactly as in P. In P (4.4) the order of convergence is replaced by , in (4.5) the error term can be shown to be and in (4.6) instead of the term we achieve .
These terms show consistency for . Using the results of Lemma A.4 of P then consistency for follows.
Following the proof of Theorem 4.2. on pp. 548–549 of P we can show consistency for of P. The only changes refer to the orders of convergence where our setting introduces orders of h into the arguments. Jointly this proves consistency of and . Consistency for the coefficients to the stationary terms follows as usual from the consistency of the estimates for the coefficients to non-stationary regressors. This completes the proof of (D).
With respect to (E) note that the results above show that the asymptotics for the two eigenvalue problems to be solved converge to the same quantities as in the finite VAR case. This shows that the results of P in this respect hold also in the case of long VARs.
Finally for the matrices note that Theorem 4.3. of P shows that the asymptotic distribution for all quantities corresponding to stationary regressors is identical for every super-consistent estimator for the coefficients to the non-stationary components.
Appendix E. Proof of Theorem 4
From Theorem 3 it follows that . Therefore the Hankel matrix of impulse response coefficients converges to the Hankel matrix corresponding to the s. As is controllable, is minimal and is nonsingular according to the assumptions, this Hankel matrix has rank n. This implies that the stochastic realisation algorithm of Appendix F provides consistent estimates . This implies
For details see Appendix F.
does not necessarily correspond to a rational transfer function of order n. It does so, however, if the additional restrictions (22) hold. Steps 3 and 4 of the proposed algorithm achieve this. Here step 3 ascertains that solutions to the third equation exist. The second equation explicitly provides a solution for given . This solution is not necessarily of full row rank. As this is the case in the limit, it also holds for large enough T. The first equation always admits solutions. Thus for large enough T the set of all solutions is defined by polynomial restrictions. Adding the least squares distance to the estimated impulse response sequence then leads to a quadratic problem under non-linear differentiable constraints, which in the limit has a unique solution. Thus the solution is unique for large enough T.
Consistency of the estimates in combination with continuity of the solution of step 4 implies consistency for the system . This implies consistency for the inverse system in the sense of converging impulse response coefficients and hence consistency for the transfer function estimator in the pointwise topology. The fulfillment of restrictions (22) ensures the structure of the corresponding matrix according to state space unit root structure .
Appendix F. Stochastic Realization Using Overlapping Echelon Forms
This section describes the approximate realization of the first f coefficients of an impulse response sequence using a rational transfer function of order n where . More details can be found in Section 2.6 of Hannan and Deistler (1988).
Define the Hankel matrix
Here denotes the j-th row in the i-th block row. Let define a nice selection of rows (see footnote 5) of such that , the submatrix of containing the rows , is of full row rank. If the impulse response corresponds to a transfer function of order at least n there exists such a nice selection . Finally let denote the matrix shifted down one block row (that is in each row where contains contains ).
Then it is derived in Hannan and Deistler (1988), Theorem 2.6.2. that if corresponds to a transfer function of order exactly n such that the corresponding is formed using a nice selection, then a system can be defined using the following formulas
such that .
If the order of the transfer function is larger than n, then the equations for A and C can be solved using least squares. If a sequence of impulse responses and the limit corresponds to a transfer function where the rank of equals n, it is obvious that the resulting systems since in this case the least squares solution depends continuously on the matrix .
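The realization step can be sketched numerically as follows. This is a minimal illustration under stated assumptions rather than the exact formulas of Hannan and Deistler (1988): it uses the shift structure of the block Hankel matrix, takes the first n rows as the nice selection by default, and solves for A and C by least squares via the pseudo-inverse. The function names, the argument p (number of block columns) and the example system are hypothetical.

```python
import numpy as np

def block_hankel(K, rows, cols):
    """Block Hankel matrix whose (i, j) block is K[i + j] (zero-based lists)."""
    return np.block([[K[i + j] for j in range(cols)] for i in range(rows)])

def realize(K, n, f, p, alpha=None):
    """Approximate realization (A, B, C) with C A^j B close to K[j].

    K: list of s x s impulse response coefficients, len(K) >= f + p.
    alpha: indices of the selected rows within the first f block rows.
    Assumption: by default the first n rows are used as the nice selection.
    """
    s = K[0].shape[0]
    H = block_hankel(K, f + 1, p)        # one extra block row for the shift
    alpha = np.arange(n) if alpha is None else np.asarray(alpha)
    H_a = H[alpha, :]                    # selected rows, assumed full row rank
    H_a_shift = H[alpha + s, :]          # same rows, one block row further down
    H_a_pinv = np.linalg.pinv(H_a)
    A = H_a_shift @ H_a_pinv             # least squares solution of A H_a = shifted H_a
    C = H[:s, :] @ H_a_pinv              # least squares solution of C H_a = first block row
    B = H_a[:, :s]                       # first block column of the selected rows
    return A, B, C

# Hypothetical minimal system of order n = 3 with s = 2 outputs.
A0 = np.array([[0.5, 0.1, 0.0], [0.0, 0.3, 0.2], [0.1, 0.0, -0.4]])
B0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
C0 = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
K = [C0 @ np.linalg.matrix_power(A0, j) @ B0 for j in range(10)]

A, B, C = realize(K, n=3, f=3, p=4)
K_hat = [C @ np.linalg.matrix_power(A, j) @ B for j in range(10)]
print(max(np.abs(Kh - Kj).max() for Kh, Kj in zip(K_hat, K)))
```

When the impulse responses come from a system of order exactly n, the least squares equations hold exactly and the impulse response is reproduced up to rounding; for estimated coefficients the same code returns the least squares approximation, which depends continuously on the selected rows as long as they keep full row rank.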
References
- Banerjee, Anindya, Lynne Cockerell, and Bill Russell. 2001. An I(2) analysis of inflation and the markup. Journal of Applied Econometrics 16: 221–40.
- Bauer, Dietmar, and Alex Maynard. 2012. Persistence-robust surplus-lag Granger causality testing. Journal of Econometrics 169: 293–300.
- Bauer, Dietmar, and Martin Wagner. 2004. Autoregressive Approximations to MFI(1) Processes. Working Paper No. 174, Reihe Ökonomie/Economics Series, Vienna, Austria: Institut für Höhere Studien (IHS).
- Bauer, Dietmar, and Martin Wagner. 2012. A state space canonical form for unit root processes. Econometric Theory 28: 1313–49.
- Berk, Kenneth N. 1974. Consistent autoregressive spectral estimates. The Annals of Statistics 2: 489–502.
- Boswijk, H. Peter, and Jurgen A. Doornik. 2004. Identifying, estimating and testing restricted cointegrated systems: An overview. Statistica Neerlandica 58: 440–65.
- Boswijk, H. Peter, and Paolo Paruolo. 2017. Likelihood ratio tests of restrictions on common trends loading matrices in I(2) VAR systems. Econometrics 5: 28.
- Chan, Ngai Hang, and Ching Zong Wei. 1988. Limiting distributions of least squares estimates of unstable autoregressive processes. The Annals of Statistics 16: 367–401.
- Dolado, Juan J., and Helmut Lütkepohl. 1996. Making Wald tests work for cointegrated VAR systems. Econometric Reviews 15: 369–86.
- Engle, Robert F., and Clive W.J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing. Econometrica 55: 251–76.
- Georgoutsos, Dimitris A., and Georgios P. Kouretas. 2004. A multivariate I(2) cointegration analysis of German hyperinflation. Applied Financial Economics 14: 29–41.
- Hannan, Edward James, and Manfred Deistler. 1988. The Statistical Theory of Linear Systems. New York: John Wiley.
- Hannan, Edward James, and Laimonis Kavalieris. 1986. Regression, autoregression models. Journal of Time Series Analysis 7: 27–49.
- Inoue, Atsushi, and Lutz Kilian. 2020. The uniform validity of impulse response inference in autoregressions. Journal of Econometrics 215: 450–72.
- Johansen, Søren, and Helmut Lütkepohl. 2005. A note on testing restrictions for the cointegration parameters of a VAR with I(2) variables. Econometric Theory 21: 653–58.
- Johansen, Søren, Katarina Juselius, Roman Frydman, and Michael Goldberg. 2007. Testing hypotheses in an I(2) model with applications to the persistent long swings in the Dmk/$ rate. Journal of Econometrics 158: 1–35.
- Johansen, Søren. 1992a. A representation of vector autoregressive processes integrated of order 2. Econometric Theory 8: 188–202.
- Johansen, Søren. 1992b. Testing weak exogeneity and the order of cointegration in UK money demand data. Journal of Policy Modeling 14: 313–34.
- Johansen, Søren. 1995. Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models. Oxford: Oxford University Press.
- Johansen, Søren. 1997. Likelihood analysis of the I(2) model. Scandinavian Journal of Statistics 24: 433–62.
- Juselius, Katarina, and Katrin Assenmacher. 2017. Real exchange rate persistence and the excess return puzzle: The case of Switzerland versus the US. Journal of Applied Econometrics 32: 1145–55.
- Juselius, Katarina, and Josh R. Stillwagon. 2018. Are outcomes driving expectations or the other way around? An I(2) CVAR analysis of interest rate expectations in the dollar/pound market. Journal of International Money and Finance 83: 93–105.
- Juselius, Katarina. 1994. On the duality between long-run relations and common trends in the I(1) versus I(2) model. An application to aggregate money holdings. Econometric Reviews 13: 151–78.
- Juselius, Katarina. 2006. The Cointegrated VAR Model. Oxford: Oxford University Press.
- Kurita, Takamitsu, Heino Bohn Nielsen, and Anders Rahbek. 2011. An I(2) cointegration model with piecewise linear trends. Econometrics Journal 14: 131–55.
- Kurita, Takamitsu. 2012. Likelihood-based inference for weak exogeneity in I(2) cointegrated VAR models. Econometric Reviews 31: 325–60.
- Lewis, Richard, and Gregory C. Reinsel. 1985. Prediction of multivariate time series by autoregressive model fitting. Journal of Multivariate Analysis 16: 393–411.
- Lütkepohl, Helmut, and Holger Claessen. 1997. Analysis of cointegrated VARMA processes. Journal of Econometrics 80: 223–29.
- Lütkepohl, Helmut, and Pentti Saikkonen. 1997. Impulse response analysis in infinite order cointegrated vector autoregressive processes. Journal of Econometrics 81: 127–57.
- Mosconi, Rocco, and Paolo Paruolo. 2013. Identification of Cointegrating Relations in I(2) Vector Autoregressive Models. Rome: PRIN Workshop Forecasting Economic and Financial Time Series, Research Publications at Politecnico di Milano, pp. 1–35.
- Mosconi, Rocco, and Paolo Paruolo. 2017. Identification conditions in simultaneous systems of cointegrating equations with integrated variables of higher order. Journal of Econometrics 198: 271–76.
- Nielsen, Heino Bohn, and Anders Rahbek. 2007. The likelihood ratio test for cointegration ranks in the I(2) model. Econometric Theory 23: 615–37.
- Palma, Wilfredo, and Pascal Bondon. 2003. On the eigenstructure of generalized fractional processes. Statistics and Probability Letters 65: 93–101.
- Paruolo, Paolo, and Anders Rahbek. 1999. Weak exogeneity in I(2) VAR systems. Journal of Econometrics 93: 281–308.
- Paruolo, Paolo. 1994. The role of the drift in I(2) systems. Journal of the Italian Statistical Society 3: 93–123.
- Paruolo, Paolo. 1996. On the determination of integration indices in I(2) systems. Journal of Econometrics 72: 313–56.
- Paruolo, Paolo. 2000. Asymptotic efficiency of the two stage estimator in I(2) systems. Econometric Theory 16: 524–50.
- Paruolo, Paolo. 2006. Common trends and cycles in I(2) VAR systems. Journal of Econometrics 132: 143–68.
- Rahbek, Anders, Hans Christian Kongsted, and Clara Jørgensen. 1999. Trend stationarity in the I(2) cointegration model. Journal of Econometrics 90: 265–89.
- Saikkonen, Pentti, and Helmut Lütkepohl. 1996. Infinite-order cointegrated vector autoregressive processes: Estimation and inference. Econometric Theory 12: 814–44.
- Saikkonen, Pentti, and Ritva Luukkonen. 1997. Testing cointegration in infinite order vector autoregressive processes. Journal of Econometrics 81: 93–126.
- Saikkonen, Pentti. 1991. Asymptotically efficient estimation of cointegration regressions. Econometric Theory 7: 1–21.
- Saikkonen, Pentti. 1992. Estimation and testing of cointegrated systems by an autoregressive approximation. Econometric Theory 8: 1–27.
- Sims, Christopher A., James H. Stock, and Mark W. Watson. 1990. Inference in linear time series models with some unit roots. Econometrica 58: 113–44.
- Stillwagon, Josh R. 2018. Are risk premia related to real exchange rate swings? Evidence from I(2) CVARs with survey expectations. Macroeconomic Dynamics 22: 255–78.
1. Here somewhat sloppily we use the same symbols for processes and their realizations.
2. Note that , and thus .
3. In this appendix processes whose dimension depends on the choice of h are denoted using upper case letters neglecting the dependence on h in the notation otherwise for simplicity.
4. Contrary to the usual Johansen notation we use as the noise covariance and as the variance of the Brownian motion corresponding to . Thus some of the formulas in this part show 'unusual' form.
5. A nice selection is such that if is contained in the selection, then also are contained for all .
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).