Modeling I(2) Processes Using Vector Autoregressions Where the Lag Length Increases with the Sample Size

Yuanyuan Li; Dietmar Bauer

doi:10.3390/econometrics8030038

Faculty of Business Administration and Economics, Bielefeld University, Universitätsstrasse 25, D-33615 Bielefeld, Germany

* Author to whom correspondence should be addressed.

Econometrics 2020, 8(3), 38; https://doi.org/10.3390/econometrics8030038

This article belongs to the Special Issue Celebrated Econometricians: Katarina Juselius and Søren Johansen

Abstract

In this paper the theory on the estimation of vector autoregressive (VAR) models for I(2) processes is extended to the case of long VAR approximation of more general processes. Hereby the order of the autoregression is allowed to tend to infinity at a certain rate depending on the sample size. We deal with unrestricted OLS estimators (in the model formulated in levels as well as in vector error correction form) as well as with two stage estimation (2SI2) in the vector error correction model (VECM) formulation. Our main results are analogous to the I(1) case: We show that the long VAR approximation leads to consistent estimates of the long and short run dynamics. Furthermore, tests on the autoregressive coefficients follow standard asymptotics. The pseudo likelihood ratio tests on the cointegrating ranks (using the Gaussian likelihood) used in the 2SI2 algorithm show under the null hypothesis the same distributions as in the case of data generating processes following finite order VARs. The same holds true for the asymptotic distribution of the long run dynamics both in the unrestricted VECM estimation and the reduced rank regression in the 2SI2 algorithm. Building on these results we show that if the data is generated by an invertible VARMA process, the VAR approximation can be used in order to derive a consistent initial estimator for subsequent pseudo likelihood optimization in the VARMA model.

Keywords:

vector autoregressions; vector error correction model; integrated processes of order two

1. Introduction

Many macroeconomic variables have been found to exhibit trend-like behaviour that can be modelled using vector autoregressions (VARs). Katarina Juselius (2006) states that empirical modelling led to the development of I(1) and I(2) models, since certain features of the datasets considered required including first and second differences in order to obtain stationary time series. Additionally, cointegrating relations were found in the corresponding analyses. Similar findings have recurred numerous times in the literature, for example related to money demand (Johansen 1992b; Juselius 1994), inflation (Banerjee et al. 2001; Georgoutsos and Kouretas 2004), and interest rates and real exchange rates (Johansen et al. 2007; Juselius and Assenmacher 2017; Juselius and Stillwagon 2018; Stillwagon 2018), to mention only a few sources.

The predominant methodological approach to model integration and cointegration in the I(1) and the I(2) case in the vector autoregressive (VAR) framework has been established mainly by Søren Johansen and Katarina Juselius together with a number of coauthors (see the lists of references in Johansen (1995); Juselius (2006) for details), building on vector error correction models (see Engle and Granger (1987) for early comments on the history of using error correction models for cointegrated processes). Extending the main ideas for cointegration modeling for the I(1) setting (see, e.g., Johansen (1997)), Johansen (1992a) suggested a representation for the I(2) case. Johansen (1997) established asymptotic distributions for the suggested two step I(2) estimator (2SI2) as an approximation to pseudo maximum likelihood estimation involving numerical optimization. Asymptotics for the corresponding likelihood ratio tests have been developed in Paruolo (1994, 1996); the asymptotic equivalence of the 2SI2 estimator to pseudo likelihood (using the Gaussian distribution) optimization (and hence, in a certain sense, its statistical efficiency) is shown in Paruolo (2000). However, Nielsen and Rahbek (2007) show that in finite samples the likelihood ratio test has size advantages. The testing of restrictions on the parameters has been investigated by Boswijk and Doornik (2004); Boswijk and Paruolo (2017); Johansen and Lütkepohl (2005). Due to the implicit vector error correction (VECM) modeling, deterministic terms in the VECM produce complex deterministic terms in the solution processes. In the I(2) context Nielsen and Rahbek (2007); Paruolo (1994, 2006); Rahbek et al. (1999); Kurita et al. (2011) discuss the impacts of deterministic terms.

As the VECM representation includes the representation of reduced rank matrices by a product of two matrices, identification conditions are of particular importance, see Juselius (2006); Mosconi and Paruolo (2013, 2017). In this context weak exogeneity has also been studied, see Kurita (2012); Paruolo and Rahbek (1999).

The main idea underlying the VECM approach for estimating VAR models in the I(2) context is to reparameterize the problem such that integration and cointegration properties relate to the rank of two matrices. Assuming the data generating process to be a VAR of known finite order, the rank of matrices can be tested using (pseudo) likelihood ratio tests.

Sometimes the assumption of known order is not justified. For example, it is known that a subset of variables that are generated using a finite order VAR cannot in general be described by a finite order VAR, but instead requires a vector autoregressive moving average (VARMA) model. However, the class of VARs provides flexibility in the sense that a VAR of infinite order can represent a large set of linear dynamical systems including all invertible VARMA systems. For stationary processes Berk (1974) and Lewis and Reinsel (1985) show that by letting the order of the VAR tend to infinity at a suitable rate as a function of the sample size, consistent estimation of the underlying transfer function can be achieved for data generating processes that can be described by a VAR(∞) subject to mild assumptions on the summability of the VAR coefficients. Additionally, Lewis and Reinsel (1985) establish asymptotic normality (in a very specific sense) of linear combinations of the estimated autoregressive coefficients. Hannan and Deistler (1988) make the concepts operational by showing that, in the case of a VARMA process generating the dataset, the required rate of letting the order tend to infinity can be estimated using BIC model selection.

In the case of I(1) processes the estimation theory for long VAR approximations to VARMA processes has been extended, based on the techniques of Lewis and Reinsel for the stationary case, in a series of papers by Saikkonen and coauthors, see Saikkonen (1991, 1992); Lütkepohl and Saikkonen (1997); Saikkonen and Lütkepohl (1996); Saikkonen and Luukkonen (1997). Additionally, the Johansen framework of rank restricted estimation in the VECM has been extended to long VAR approximations by Saikkonen and Luukkonen (1997). Bauer and Wagner (2004) provide extensions to the multi frequency I(1) case where unit roots may occur at the seasonal frequencies.

For the I(2) case no such extensions are currently known. This is the research gap this paper tries to fill: First we establish consistency and asymptotic normality of estimated autoregressive coefficients (in the sense of Lewis and Reinsel) for unrestricted ordinary least squares (OLS) estimation in the VECM representation. This can be used in order to derive Wald type tests of linear restrictions on the autoregressive parameters. Secondly, we extend the rank restricted regression techniques in the I(2) case to the long VAR approximations showing that the asymptotics (for estimated cointegrating relations, likelihood ratio tests and the two step estimation procedures) are identical in the case of long VAR approximations and VARs of finite known order. Third, we show that if the data generating process is an invertible VARMA process the long VAR system estimator can be used in order to obtain consistent initial estimators for subsequent pseudo likelihood maximization in the VARMA model class. In all results we limit ourselves to the case of no deterministic terms being included in the VECM representation. The inclusion of deterministic terms requires changing the test distribution, compare the theory contained for example in Rahbek et al. (1999).

The paper is organized as follows: In the next section the data generating process and the main assumptions are described. Section 3 then provides the results for the unrestricted estimation. Section 4 deals with rank restricted regression in the 2SI2 procedure, while Section 5 investigates the initial guess in the VARMA setting for subsequent pseudo likelihood maximization. Finally Section 6 concludes the paper. Proofs are relegated to an appendix.

Throughout the paper we will use the notation introduced by Johansen (1997): For a matrix

C \in R^{p \times s}, s < p,

of full column rank we use the notation

\bar{C} = C {(C^{'} C)}^{- 1}

. Furthermore,

C_{⊥}

denotes a full column rank matrix of dimension

p \times (p - s)

such that

C_{⊥}^{'} C = 0

. Whenever this notation is used the particular choice of

C_{⊥}

is not of importance. For a matrix

C = (C_{i, j}) \in R^{p \times s}

we let

∥ C ∥

denote the Frobenius norm

∥ C ∥ = \sqrt{\sum_{i = 1}^{p} \sum_{j = 1}^{s} C_{i, j}^{2}}

.
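
The notation above can be illustrated numerically. The following is a minimal sketch, assuming NumPy; the helper names `bar`, `perp` and `frobenius` are hypothetical and not part of the paper. Since the particular choice of the orthogonal complement is not of importance, the sketch simply takes an orthonormal one obtained from a singular value decomposition.

```python
import numpy as np

def bar(C):
    """Return C (C'C)^{-1} for a full column rank matrix C."""
    return C @ np.linalg.inv(C.T @ C)

def perp(C):
    """Return one basis of the orthogonal complement of the column space of C."""
    p, s = C.shape
    # Full SVD: the last p - s left singular vectors are orthogonal to C.
    U, _, _ = np.linalg.svd(C, full_matrices=True)
    return U[:, s:]

def frobenius(C):
    """Frobenius norm: square root of the sum of squared entries."""
    return np.sqrt(np.sum(C ** 2))

# Small illustrative example with p = 3 and s = 2.
C = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
C_bar = bar(C)
C_perp = perp(C)
print(C_bar.T @ C)      # identity matrix of dimension s
print(C_perp.T @ C)     # zeros up to rounding
print(frobenius(C))     # square root of 7
```

The identity \bar{C}^{'} C = I_{s} printed in the first line is the property that makes the bar notation convenient when projecting onto the column space of C.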

2. Data Generating Process and Assumptions

In this paper we use the following assumptions on the data generating process:

Assumption 1 (DGP).

The process

{(y_{t})}_{t \in Z}, y_{t} \in R^{p},

is generated from the difference equation for

t \in Z

:

Δ^{2} y_{t} = α β^{'} y_{t - 1} + Γ Δ y_{t - 1} + \sum_{j = 1}^{\infty} Π_{j} Δ^{2} y_{t - j} + ε_{t}

(1)

where

α, β \in R^{p \times r}, 0 \leq r < p

are full column rank matrices,

Δ = (1 - L)

with L denoting the backward shift operator such that

L {(y_{t})}_{t \in Z} = {(y_{t - 1})}_{t \in Z}

. The matrix function

A (z) = {(1 - z)}^{2} I_{p} - α β^{'} z - Γ z (1 - z) - \sum_{j = 1}^{\infty} Π_{j} {(1 - z)}^{2} z^{j}

fulfills the special marginal stability condition that

| A (z) | = 0 \quad \text{implies that} \quad | z | > 1 \text{ or } z = 1 .

(2)

Furthermore, there exists a real

δ > 0

such that the power series defining

A (z)

converges absolutely for

| z | < 1 + δ

. Define

β_{2} = β_{⊥} η_{⊥}, α_{2} = α_{⊥} ζ_{⊥}

where

α_{⊥}^{'} Γ β_{⊥} = ζ η^{'}, η, ζ \in R^{(p - r) \times s}

are of full column rank

s < p - r

. Then it is assumed that the matrix

α_{2}^{'} (I_{p} + Γ \bar{β} {\bar{α}}^{'} Γ - \sum_{j = 1}^{\infty} Π_{j}) β_{2}

(3)

is nonsingular.

Furthermore, the process

{(ε_{t})}_{t \in Z}

denotes independent identically distributed (iid) white noise with mean zero and variance

Σ_{ϵ} > 0

.

It is well known that the conditions (2) and (3) are necessary and sufficient for the existence of solutions to the difference equation that are I(2) processes, see for example Johansen (1992a). Moreover, note that the assumption of absolute convergence of

A (z)

for

| z | < 1 + δ

implies that

\sum_{j = 0}^{\infty} j^{k} ∥ Π_{j} ∥ < \infty

for every

k \in N

. In particular

\sum_{j = 0}^{\infty} j^{2} ∥ Π_{j} ∥ < \infty

follows as will be used frequently below.

Every vector autoregressive function

A (z)

corresponding to the autoregression

A (L) y_{t} = ε_{t}

, that fulfills Assumption 1, allows a representation as

A (z) = {(1 - z)}^{2} I_{p} - α β^{'} z - Γ z (1 - z) - \sum_{j = 1}^{\infty} Π_{j} {(1 - z)}^{2} z^{j} = \tilde{g} (z) \tilde{B} (z), \tilde{B} (z) = {(1 - z)}^{2} I_{p} + \tilde{Π} z + \tilde{Γ} z (1 - z), \tilde{g} (z) = I_{p} + \sum_{j = 1}^{\infty} G_{j} z^{j}

. This can be seen as follows:

\begin{array}{l} ε_{t} & = A (L) y_{t} = (A (1) - \dot{A} (1) Δ + A^{*} (L) Δ^{2}) y_{t} = (A (1) - \dot{A} (1) Δ + A^{*} (L) Δ^{2}) B B^{'} y_{t} \\ = ([- α, 0, 0] + [α, 0, 0] Δ - Γ B Δ + A^{*} (L) B Δ^{2}) B^{'} y_{t} \\ = ([- α, - Γ β_{1}, - Γ β_{2}] + [α - Γ β, A_{1}^{*} (L), A_{2}^{*} (L)] Δ + [\begin{matrix} A_{0}^{*} (L), 0, 0 \end{matrix}] Δ^{2}) (\begin{matrix} β^{'} \\ β_{1}^{'} Δ \\ β_{2}^{'} Δ \end{matrix}) y_{t} \\ = ([- α, - Γ β_{1}, - Γ β_{2} + α {\bar{α}}^{'} Γ β_{2}] + [α - Γ β, A_{1}^{*} (L), {\tilde{A}}_{2}^{*} (L)] Δ + [\begin{matrix} A_{0}^{*} (L), 0, - A_{0}^{*} (L) {\bar{α}}^{'} Γ β_{2} \end{matrix}] Δ^{2}) (\begin{matrix} β^{'} + {\bar{α}}^{'} Γ β_{2} β_{2}^{'} Δ \\ β_{1}^{'} Δ \\ β_{2}^{'} Δ \end{matrix}) y_{t} \\ = ([- α, - Γ β_{1}, {\tilde{A}}_{2}^{*} (L)] + [α - Γ β, A_{1}^{*} (L), - A_{0}^{*} (L) {\bar{α}}^{'} Γ β_{2}] Δ + [\begin{matrix} A_{0}^{*} (L), 0, 0 \end{matrix}] Δ^{2}) (\begin{matrix} β^{'} + {\bar{α}}^{'} Γ β_{2} β_{2}^{'} Δ \\ β_{1}^{'} Δ \\ β_{2}^{'} Δ^{2} \end{matrix}) y_{t} \\ = g (L) B (L) y_{t} \end{array}

where

B = [β, β_{1}, β_{2}], β_{1} = β_{⊥} η,

is without restriction of generality assumed to be an orthonormal matrix,

A^{*} (L) B = [A_{0}^{*} (L), A_{1}^{*} (L), A_{2}^{*} (L)], A (1) = - α β^{'}, \dot{A} (1) = - α β^{'} + Γ

and where we use that

Γ β_{2} - α {\bar{α}}^{'} Γ β_{2} = (I_{p} - α {\bar{α}}^{'}) Γ β_{2} = {\bar{α}}_{⊥} α_{⊥}^{'} Γ β_{⊥} η_{⊥} = 0 .

Here

B (L) = (\begin{matrix} β^{'} + {\bar{α}}^{'} Γ β_{2} β_{2}^{'} Δ \\ β_{1}^{'} Δ \\ β_{2}^{'} Δ^{2} \end{matrix}) .

In this representation

g (1) = [\begin{matrix} - α, - Γ β_{1}, {\tilde{A}}_{2}^{*} (1) \end{matrix}]

is nonsingular due to assumption (3). Furthermore,

g (z) = \sum_{j = 0}^{\infty} G_{j} z^{j}

is a transfer function with

\sum_{j = 0}^{\infty} ∥ G_{j} ∥ j^{2} < \infty

since

\sum_{j = 1}^{\infty} ∥ Π_{j} ∥ j^{2} < \infty

and thus the same holds for the power series coefficients

A^{*} (L)

. Since

| B (z) | \neq 0, z \neq 1

it follows that

| g (z) | \neq 0, | z | \leq 1

. Therefore

B (L) y_{t} = u_{t}, g (L) u_{t} = ε_{t}

(4)

is a VAR process. Note, however, that

g (0) = G_{0} \neq I_{p}

in general. This constitutes a triangular representation of the process denoting

y_{1, t} = β^{'} y_{t} \in R^{p_{1}}, y_{2, t} = β_{1}^{'} y_{t} \in R^{p_{2}}, y_{3, t} = β_{2}^{'} y_{t} \in R^{p_{3}}

such that

\begin{array}{rl} y_{1, t} & = - {\bar{α}}^{'} Γ β_{2} Δ y_{3, t} + u_{1, t} = A Δ y_{3, t} + u_{1, t}, \quad A : p_{1} \times p_{3}, \\ Δ y_{2, t} & = u_{2, t}, \\ Δ^{2} y_{3, t} & = u_{3, t} \end{array}

where

u_{t} = {[u_{1, t}^{'}, u_{2, t}^{'}, u_{3, t}^{'}]}^{'}

has a VAR(∞) representation. Furthermore, defining

\begin{array}{l} \tilde{B} (L) & = B (\begin{matrix} I_{p_{1}} & 0 & - {\bar{α}}^{'} Γ β_{2} \\ 0 & I_{p_{2}} & 0 \\ 0 & 0 & I_{p_{3}} \end{matrix}) B (L) = Δ^{2} I_{p} + β β^{'} L + (β β^{'} + β {\bar{α}}^{'} Γ β_{2} β_{2}^{'} + β_{1} β_{1}^{'}) L Δ, \\ \tilde{g} (L) & = g (L) {(B (\begin{matrix} I_{p_{1}} & 0 & - {\bar{α}}^{'} Γ β_{2} \\ 0 & I_{p_{2}} & 0 \\ 0 & 0 & I_{p_{3}} \end{matrix}))}^{- 1} \end{array}

we obtain

A (L) = g (L) B (L) = \tilde{g} (L) \tilde{B} (L)

such that

\tilde{B} (L) y_{t} = Δ^{2} y_{t} + \tilde{Π} y_{t - 1} + \tilde{Γ} Δ y_{t - 1} = v_{t}, \tilde{g} (L) v_{t} = ε_{t}

is another representation of the process

{(y_{t})}_{t \in Z}

with

\tilde{B} (0) = I_{p}

. It follows that the triangular representation can be seen as a special case where one has partial information on the matrices

β, β_{1}, β_{2}

. For estimation the VECM representation is approximated using a finite order h:

Δ^{2} y_{t} = Φ y_{t - 1} + Ψ Δ y_{t - 1} + \sum_{j = 1}^{h - 2} Π_{j} Δ^{2} y_{t - j} + e_{t}

where

e_{t} = ε_{t} + e_{1 t}, e_{1 t} = \sum_{j = h - 1}^{\infty} Π_{j} Δ^{2} y_{t - j}

. As in the VECM representation the dimensions of

β, β_{1}, β_{2}

are linked to the rank of the matrices

Φ

and

α_{⊥}^{'} Ψ β_{⊥}

. Restricting these matrices to be of particular rank is simpler than imposing the equivalent restrictions in the VAR(h) representation directly.

In the following we will first investigate the unrestricted ordinary least squares estimator in the VECM representation without taking rank restrictions into account. In the second step the 2SI2 procedure as presented in Paruolo (2000) for imposing the two rank restrictions in two steps is investigated.

For both procedures the selection of the order h is of importance. In this respect the following assumption will be used:

Assumption 2 (Lag order h).

The order h is chosen subject to the following restrictions:

(i) $h = o (T^{1 / 5})$ .
(ii) $T^{1 / 2} \sum_{j = h + 1}^{\infty} ∥ Π_{j} ∥ \to 0$ as $T, h \to \infty$ .

The first condition defines an upper bound for the order, which is usually ensured directly during order selection, for example when using information criteria. The upper bound is smaller than the usual rate

T^{1 / 3}

for technical reasons. The stronger bound is not needed for all results. However, the implications for practical applications are minor as for example in the range

1 \leq T \leq 950

we have

2.5 T^{1 / 5} > T^{1 / 3}

. The second condition of Assumption 2 implies a lower bound for the increase of h as a function of the sample size. Clearly

\sum_{j = h + 1}^{\infty} ∥ Π_{j} ∥ \to 0

for

h \to \infty

. The bound implies that for

h = h (T)

this convergence needs to be fast enough such that

T^{1 / 2} \sum_{j = h (T) + 1}^{\infty} ∥ Π_{j} ∥

still converges to zero. The lower bound depends on the underlying true parameters. For invertible VARMA processes, which can be seen as the leading case, we have

∥ Π_{j} ∥ \leq C ρ_{0}^{j}

for some

0 \leq ρ_{0} < 1

. Hannan and Deistler (1988) show that for an invertible stationary VARMA process the lower bound (in this case proportional to

log T

) can be achieved asymptotically by using BIC as the order selection procedure. Thus in this case also the stronger condition (

h = o (T^{1 / 5})

) is satisfied. Bauer and Wagner (2004) extend this result to the multi frequency I(1) setting. For the I(2) case no analogous result is known, although the developments of Bauer and Wagner (2004) suggest that a similar result holds also there. This is left for future research.

Therefore the difference between the 'usual' rates and the ones assumed above is deemed to be of minor practical consequence. Thus we are not explicit in the main text as to which results hold true under the less restrictive set of assumptions and which do not. In the appendix, however, we will comment on this point.
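
The two rate conditions can be checked numerically. The sketch below is only illustrative and rests on assumptions that are not in the paper: a geometric bound ∥ Π_{j} ∥ \leq C ρ_{0}^{j} with the hypothetical values C = 1 and ρ_{0} = 0.5, and a hypothetical order rule h(T) = ⌈c log T⌉ with c = 1. It first locates the largest sample size on a grid for which 2.5 T^{1/5} exceeds T^{1/3}, and then evaluates T^{1/2} times the geometric tail sum.

```python
import numpy as np

# Largest T on a grid for which 2.5 * T^(1/5) still exceeds T^(1/3).
T_grid = np.arange(1, 2001)
upper_ok = 2.5 * T_grid ** 0.2 > T_grid ** (1.0 / 3.0)
largest_T = int(T_grid[upper_ok].max())
print("inequality holds up to T =", largest_T)

def scaled_tail(T, h, C=1.0, rho=0.5):
    """T^(1/2) times the geometric tail sum_{j > h} C * rho^j (assumed bound)."""
    return np.sqrt(T) * C * rho ** (h + 1) / (1.0 - rho)

# Hypothetical rule h(T) = ceil(c * log T). Under the geometric bound the
# scaled tail tends to zero if c > -1 / (2 * log(rho)), about 0.72 for rho = 0.5.
c = 1.0
for T in (10**2, 10**4, 10**6):
    h = int(np.ceil(c * np.log(T)))
    print(T, h, round(T ** 0.2, 2), scaled_tail(T, h))
```

Under these assumptions an order proportional to log T satisfies the lower bound, while it grows much more slowly than T^{1/5}, so the upper bound is not restrictive in this case.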

3. Unrestricted Estimation

In this section the results of Lewis and Reinsel (1985) and Saikkonen and Lütkepohl (1996) are extended to the I(2) case. To simplify notation define

⟨ a_{t}, b_{t} ⟩ = \sum_{t = h + 1}^{T} a_{t} b_{t}^{'}

for sequences

a_{t}, b_{t}, t = 1, \dots, T

. Then the unrestricted least squares estimator in the finite VECM model uses the regressor vector

Z_{t, h} = {[y_{t - 1}^{'}, Δ y_{t - 1}^{'}, Δ^{2} y_{t - 1}^{'}, \dots, Δ^{2} y_{t - h + 2}^{'}]}^{'} \in R^{p h}

. The corresponding ordinary least squares estimator is given as

\begin{array}{l} [\hat{Φ}, \hat{Ψ}, {\hat{Π}}_{1}, \dots, {\hat{Π}}_{h - 2}] & = [⟨ Δ^{2} y_{t}, y_{t - 1} ⟩, ⟨ Δ^{2} y_{t}, Δ y_{t - 1} ⟩, ⟨ Δ^{2} y_{t}, Δ^{2} y_{t - 1} ⟩, \dots, ⟨ Δ^{2} y_{t}, Δ^{2} y_{t - h + 2} ⟩] {⟨ Z_{t, h}, Z_{t, h} ⟩}^{- 1} \\ = ⟨ Δ^{2} y_{t}, Z_{t, h} ⟩ {⟨ Z_{t, h}, Z_{t, h} ⟩}^{- 1} . \end{array}

The noise covariance is estimated from the residuals as usual as

{\hat{Σ}}_{ϵ} = N^{- 1} ⟨ {\hat{e}}_{t}, {\hat{e}}_{t} ⟩, {\hat{e}}_{t} = Δ^{2} y_{t} - \hat{Φ} y_{t - 1} - \hat{Ψ} Δ y_{t - 1} - \sum_{j = 1}^{h - 2} {\hat{Π}}_{j} Δ^{2} y_{t - j}

(5)

where

N = T - h

denotes the effective sample size.
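
The estimator can be sketched in a few lines of code. The following is a minimal sketch, assuming NumPy; the function name `vecm_ols` and the simulated data (a double cumulative sum of Gaussian noise) are hypothetical choices for illustration and are not taken from the paper. It builds the regressor vector Z_{t, h}, solves the least squares problem, and forms the residual covariance with the effective sample size N = T - h.

```python
import numpy as np

def vecm_ols(y, h):
    """Unrestricted OLS in the VECM form for a (T, p) array y and order h >= 2."""
    T, p = y.shape
    d1 = np.diff(y, axis=0)        # d1[i] is the first difference dated i + 1
    d2 = np.diff(y, n=2, axis=0)   # d2[i] is the second difference dated i + 2
    Z_rows, Y_rows = [], []
    for t in range(h, T):          # 0-based time index, N = T - h observations
        z = [y[t - 1], d1[t - 2]]                       # level and first difference at t - 1
        z += [d2[t - j - 2] for j in range(1, h - 1)]   # second differences at t - j, j = 1..h-2
        Z_rows.append(np.concatenate(z))
        Y_rows.append(d2[t - 2])                        # second difference at t
    Z, Y = np.array(Z_rows), np.array(Y_rows)
    N = Z.shape[0]
    # Least squares solution; coef equals <D2y, Z> <Z, Z>^{-1}, of size p x (p h).
    coef = np.linalg.lstsq(Z, Y, rcond=None)[0].T
    resid = Y - Z @ coef.T
    return {
        "Phi": coef[:, :p],
        "Psi": coef[:, p:2 * p],
        "Pi": [coef[:, (j + 1) * p:(j + 2) * p] for j in range(1, h - 1)],
        "Sigma": resid.T @ resid / N,
        "Z": Z,
        "resid": resid,
    }

# Illustrative data: each component is an I(2) process without cointegration.
rng = np.random.default_rng(0)
T, p, h = 200, 2, 4
y = np.cumsum(np.cumsum(rng.standard_normal((T, p)), axis=0), axis=0)
fit = vecm_ols(y, h)
print(fit["Phi"].shape, fit["Psi"].shape, len(fit["Pi"]), fit["Sigma"].shape)
```

The sketch uses `np.linalg.lstsq` rather than an explicit inverse of ⟨ Z_{t, h}, Z_{t, h} ⟩, since the levels and the differences are of very different magnitude for integrated data and the explicit inverse can be poorly conditioned.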

3.1. Estimation in the Triangular VECM Representation

As is typical for the cointegration framework, the analysis is easier in the triangular representation, which separates stationary components from I(1) and I(2) processes: Let

y_{t} = {[y_{1, t}^{'}, y_{2, t}^{'}, y_{3, t}^{'}]}^{'} \in R^{p}

where

y_{i, t} \in R^{p_{i}}

is such that

\begin{array}{l} y_{1, t} & = A Δ y_{3, t} + u_{1, t}, \\ Δ y_{2, t} & = u_{2, t}, \\ Δ^{2} y_{3, t} & = u_{3, t} \end{array}

where

u_{t} = {[u_{1, t}^{'}, u_{2, t}^{'}, u_{3, t}^{'}]}^{'}

has a VAR(∞) representation

g (L) u_{t} = ε_{t}

where

g (0) = (\begin{matrix} I & 0 & A \\ 0 & I & 0 \\ 0 & 0 & I \end{matrix}) .

Note, however, that using the triangular representation implies that the matrix

B (L)

is known up to the value of the matrix A. In applications this is seldom the case.

Thus letting

g (z) = g (1) + g^{*} (z) Δ

we obtain

\begin{array}{l} ε_{t} & = g (L) (\begin{matrix} y_{1, t} - A Δ y_{3, t} \\ Δ y_{2, t} \\ Δ^{2} y_{3, t} \end{matrix}) = g (L) (\begin{matrix} Δ^{2} y_{1, t} + Δ y_{1, t - 1} + y_{1, t - 1} - A Δ^{2} y_{3, t} - A Δ y_{3, t - 1} \\ Δ^{2} y_{2, t} + Δ y_{2, t - 1} \\ Δ^{2} y_{3, t} \end{matrix}) \\ = g (L) (\begin{matrix} I & 0 & - A \\ 0 & I & 0 \\ 0 & 0 & I \end{matrix}) Δ^{2} y_{t} + g (L) (\begin{matrix} y_{1, t - 1} \\ 0 \\ 0 \end{matrix}) + g (L) (\begin{matrix} Δ y_{1, t - 1} - A Δ y_{3, t - 1} \\ Δ y_{2, t - 1} \\ 0 \end{matrix}) \\ = \tilde{g} (L) Δ^{2} y_{t} + [g (1) + g^{*} (L) Δ] (\begin{matrix} y_{1, t - 1} \\ 0 \\ 0 \end{matrix}) + g (1) (\begin{matrix} Δ y_{1, t - 1} - A Δ y_{3, t - 1} \\ Δ y_{2, t - 1} \\ 0 \end{matrix}) \\ = π (L) Δ^{2} y_{t} + g (1) (\begin{matrix} y_{1, t - 1} \\ 0 \\ 0 \end{matrix}) + [\begin{matrix} G_{1} + G_{1}^{*} & G_{2} & - G_{1} A \end{matrix}] (\begin{matrix} Δ y_{1, t - 1} \\ Δ y_{2, t - 1} \\ Δ y_{3, t - 1} \end{matrix}) \\ = π (L) Δ^{2} y_{t} + [\begin{matrix} G_{1} & 0 & 0 \end{matrix}] y_{t - 1} + [\begin{matrix} G_{1} + G_{1}^{*} & G_{2} & - G_{1} A \end{matrix}] Δ y_{t - 1} \end{array}

with

π (L) = I_{p} - \sum_{j = 1}^{\infty} Π_{j} L^{j}

leads to the corresponding VECM representation:

Δ^{2} y_{t} = Φ y_{t - 1} + Ψ Δ y_{t - 1} + \sum_{j = 1}^{\infty} Π_{j} Δ^{2} y_{t - j} + ε_{t} .

Here

G : = g (1) = \sum_{j = 0}^{\infty} G_{j} = [G_{1}, G_{2}, G_{3}]

, where

G_{i}

is

p \times p_{i}

for

i = 1, 2, 3

. Similarly,

G^{*} : = g^{*} (1) = - \sum_{j = 0}^{\infty} j G_{j} = [G_{1}^{*}, G_{2}^{*}, G_{3}^{*}]

, where

G_{i}^{*}

is

p \times p_{i}

for

i = 1, 2, 3

. The sums exist since

\sum_{j = 1}^{\infty} ∥ G_{j} ∥ j^{2} < \infty

by assumption. Similarly, we partition

Φ

,

Ψ

and

Π_{j}

into

[Φ_{1}, Φ_{2}, Φ_{3}]

,

[Ψ_{1}, Ψ_{2}, Ψ_{3}]

and

[Π_{j 1}, Π_{j 2}, Π_{j 3}]

, respectively. The analogous partitioning is used for estimates.

Then

Φ = - [G_{1}, 0, 0], Ψ = [- G_{1}^{*} - G_{1}, - G_{2}, G_{1} A]

. Therefore

Ψ_{3} = - Φ_{1} A

. Note that in this notation the I(2) components on the right hand side are

y_{3, t - 1}

, the I(1) components are

y_{1, t - 1}, y_{2, t - 1}, Δ y_{3, t - 1}

, where

y_{1, t - 1} - A Δ y_{3, t - 1}

is stationary. Thus in order to separate regressors of different integration orders in the proof (as is usually done in the literature) we use a transformation using the unknown matrix A such that the regressor

y_{1, t - 1}

is replaced by

y_{1, t - 1} - A Δ y_{3, t - 1}

. Consequently the estimate

{\hat{Ψ}}_{3}

of

Ψ_{3}

is replaced by the estimate

\hat{Θ} = {\hat{Ψ}}_{3} + {\hat{Φ}}_{1} A

of

Θ = Ψ_{3} + Φ_{1} A = 0

.

Based on the estimates

\hat{Ψ}

and

\hat{Φ}

the matrix A can then be estimated as

\hat{A} = - {({\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Φ}}_{1})}^{- 1} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Ψ}}_{3} .

(6)

Here the insertion of

{\hat{Σ}}_{ϵ}^{- 1}

appears somewhat arbitrary. A motivation for this choice in the I(1) case can be found in Saikkonen (1992) equation (12). However, any other positive definite matrix could be used as well. Currently there is no knowledge on the optimality of the choice suggested above.
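
The role of the weighting matrix in (6) can be illustrated with a small sketch, assuming NumPy; the function name `estimate_A` and the numerical values are hypothetical. The sketch evaluates (6) for a generic positive definite weighting matrix. When the restriction Ψ_{3} = - Φ_{1} A holds exactly, every positive definite weighting returns the same A; in finite samples the estimates satisfy the restriction only approximately, and different weightings then give different values.

```python
import numpy as np

def estimate_A(Phi1_hat, Psi3_hat, Sigma_hat):
    """Weighted least squares solution of Psi3 = -Phi1 A as in equation (6)."""
    W = np.linalg.inv(Sigma_hat)            # weighting matrix
    M = Phi1_hat.T @ W @ Phi1_hat           # p1 x p1, assumed nonsingular
    return -np.linalg.solve(M, Phi1_hat.T @ W @ Psi3_hat)

# Hypothetical dimensions p = 4, p1 = 2, p3 = 1 and an exact restriction.
rng = np.random.default_rng(1)
Phi1 = rng.standard_normal((4, 2))
A_true = np.array([[0.5], [-1.0]])
Psi3 = -Phi1 @ A_true

S = rng.standard_normal((4, 4))
Sigma = S @ S.T + np.eye(4)                 # some positive definite matrix

A_weighted = estimate_A(Phi1, Psi3, Sigma)
A_unweighted = estimate_A(Phi1, Psi3, np.eye(4))
print(A_weighted.ravel(), A_unweighted.ravel())
```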

In the asymptotic distribution of the estimation error Brownian motions occur relating to the process

{(u_{t})}_{t \in Z}

: Under Assumption 1 we have

\frac{1}{\sqrt{T}} \sum_{t = 1}^{⌊ r T ⌋} u_{t} \Rightarrow B (r) = {[B_{1} {(r)}^{'}, B_{c} {(r)}^{'}]}^{'} = {[B_{1} {(r)}^{'}, B_{2} {(r)}^{'}, B_{3} {(r)}^{'}]}^{'}

where

B (r), 0 \leq r \leq 1,

denotes a Brownian motion with corresponding variance

Ω = [\begin{array}{cc} Ω_{11} & Ω_{1 c} \\ Ω_{c 1} & Ω_{c c} \end{array}] = [\begin{array}{ccc} Ω_{11} & Ω_{12} & Ω_{13} \\ Ω_{21} & Ω_{22} & Ω_{23} \\ Ω_{31} & Ω_{32} & Ω_{33} \end{array}] = g {(1)}^{- 1} Σ_{ϵ} {(g {(1)}^{'})}^{- 1},

where

B_{1 . c} (r) = B_{1} (r) - Ω_{1 c} Ω_{c c}^{- 1} B_{c} (r)

is a

p_{1}

-dimensional Brownian motion, which is independent of

B_{c} (r)

, with covariance

Ω_{1 . c} = Ω_{11} - Ω_{1 c} Ω_{c c}^{- 1} Ω_{c 1} .

An estimator of

Ω_{1 . c}

is given by

{\hat{Ω}}_{1 . c} = {({\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Φ}}_{1})}^{- 1} .

(7)

With these definitions we can state our first result of the paper (which is proved in Appendix B):

Theorem 1.

Under Assumptions 1 and 2 for the triangular VECM representation we have:

(A) Consistency:

(i) \hat{Φ} \overset{p}{\to} Φ; (ii) {\hat{Σ}}_{ϵ} \overset{p}{\to} Σ_{ϵ}; (iii) {\hat{Ω}}_{1 . c} \overset{p}{\to} Ω_{1 . c}; (iv) \hat{Ψ} \overset{p}{\to} Ψ; (v) \hat{Θ} \overset{p}{\to} 0; (vi) \hat{A} \overset{p}{\to} A .

(B) Asymptotic distribution of coefficients to nonstationary regressors: Under Assumptions 1 and 2 we have (

N = T - h

):

\begin{array}{ll} (i) \; [N {\hat{Φ}}_{2}, N \hat{Θ}, N^{2} {\hat{Φ}}_{3}] \overset{d}{\to} g (1) \int_{0}^{1} d B F^{'} {(\int_{0}^{1} F F^{'})}^{- 1}, & (ii) \; N (\hat{A} - A) \overset{d}{\to} \int_{0}^{1} d B_{1 . c} L^{'} {(\int_{0}^{1} L L^{'})}^{- 1} \end{array}

(8)

where

F (u) = [\begin{matrix} B_{c} (u) \\ \int_{0}^{u} B_{3} (v) d v \end{matrix}], F_{a} (u) = [\begin{matrix} B_{2} (u) \\ \int_{0}^{u} B_{3} (v) d v \end{matrix}]

and

L (u) = B_{3} (u) - \int_{0}^{1} B_{3} F_{a}^{'} {(\int_{0}^{1} F_{a} F_{a}^{'})}^{- 1} F_{a} (u)

.

(C) Asymptotic distribution of coefficients to stationary regressors: Let

L_{h}

be a sequence of

(p^{2} (h - 2) + p (2 p_{1} + p_{2})) \times J

matrices such that

L_{h}^{'} (Γ_{ECM}^{- 1} \otimes Σ_{ϵ}) L_{h} \to M > 0

where

Γ_{ECM} = E (X_{t} X_{t}^{'})

with

X_{t} : = {[u_{1, t - 1}^{'}, Δ y_{1, t - 1}^{'}, Δ y_{2, t - 1}^{'}, Δ^{2} y_{t - 1}^{'}, \dots, Δ^{2} y_{t - h + 2}^{'}]}^{'}

.

Let

\underline{Π} = [\begin{matrix} Φ_{1} & Ψ_{1} & Ψ_{2} & Π_{1} & \dots & Π_{h - 2} \end{matrix}] .

Then

N^{\frac{1}{2}} L_{h}^{'} vec (\hat{Π} - \underline{Π}) \overset{d}{\to} N (0, M) .

(D) Asymptotic distribution of Wald type tests: Finally, letting

{\hat{Γ}}_{ECM} = N^{- 1} (⟨ {\tilde{X}}_{t}, {\tilde{X}}_{t} ⟩ - ⟨ {\tilde{X}}_{t}, Δ y_{3, t - 1} ⟩ {⟨ Δ y_{3, t - 1}, Δ y_{3, t - 1} ⟩}^{- 1} ⟨ Δ y_{3, t - 1}, {\tilde{X}}_{t} ⟩)

where

{\tilde{X}}_{t} = {[y_{1, t - 1}^{'}, Δ y_{1, t - 1}^{'}, Δ y_{2, t - 1}^{'}, Δ^{2} y_{t - 1}^{'}, \dots, Δ^{2} y_{t - h + 2}^{'}]}^{'}

, the Wald test for the null hypothesis

H_{0} : L_{h}^{'} vec (\underline{Π}) = l_{h}

is given by

{\hat{λ}}_{Wald} = N {(L_{h}^{'} vec (\hat{Π}) - l_{h})}^{'} {(L_{h}^{'} ({\hat{Γ}}_{ECM}^{- 1} \otimes {\hat{Σ}}_{ϵ}) L_{h})}^{- 1} (L_{h}^{'} vec (\hat{Π}) - l_{h}) .

Then if

L_{h}

is such that

L_{h}^{'} (Γ_{ECM}^{- 1} \otimes Σ_{ϵ}) L_{h} \to M > 0

, under the null hypothesis

{\hat{λ}}_{Wald} \overset{d}{\to} χ^{2} (J)

.

The theorem provides the asymptotic distributions of the OLS estimates in the triangular system. Note that in this somewhat special case the properties of the regressor components (stationary or not) are known such that for each entry the convergence speed is known. Correspondingly the definition of the regressor vector

{\tilde{X}}_{t}

involves only lags of

y_{t}

but omits all nonstationary regressors except the ones cointegrated with

Δ y_{3, t - 1}

.

The assumptions on

L_{h}

are more restrictive than needed. Lewis and Reinsel (1985) and Saikkonen and Lütkepohl (1996) only require that

L_{h}

has full column rank when deriving convergence to a normal distribution with unit variance for the normalized quantity

N^{\frac{1}{2}} {(L_{h}^{'} (Γ_{ECM}^{- 1} \otimes Σ_{ϵ}) L_{h})}^{- 1 / 2} L_{h}^{'} vec (\hat{Π} - \underline{Π}) .

Similar arguments could be used here.
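
The Wald statistic of Theorem 1 (D) can be computed as in the following minimal sketch, assuming NumPy. The function name `wald_statistic` and the numerical inputs are hypothetical; in an application the coefficient estimate, the estimate of Γ_{ECM} and the noise covariance estimate would come from the regression described above. The vec operator stacks columns, which matches the Kronecker ordering of the inverse of Γ_{ECM} with Σ_{ϵ}.

```python
import numpy as np

def wald_statistic(coef_hat, Gamma_hat, Sigma_hat, L, l, N):
    """Wald statistic for H0: L' vec(coef) = l.

    coef_hat  : p x k matrix of coefficients to stationary regressors
    Gamma_hat : k x k estimate of the regressor second moment matrix
    Sigma_hat : p x p estimate of the noise covariance
    L         : (p k) x J restriction matrix, l : vector of length J
    N         : effective sample size
    """
    v = coef_hat.reshape(-1, order="F")               # vec stacks the columns
    V = np.kron(np.linalg.inv(Gamma_hat), Sigma_hat)  # covariance of vec(coef_hat), up to 1/N
    diff = L.T @ v - l
    return float(N * diff @ np.linalg.solve(L.T @ V @ L, diff))

# Scalar example that can be verified by hand: 100 * (2 - 1)^2 / (1 / 4) = 400.
stat_scalar = wald_statistic(
    coef_hat=np.array([[2.0]]), Gamma_hat=np.array([[4.0]]),
    Sigma_hat=np.array([[1.0]]), L=np.array([[1.0]]), l=np.array([1.0]), N=100,
)

# Hypothetical 2 x 2 example testing that the (1, 2) entry is zero (J = 1).
coef = np.array([[0.4, 0.1], [0.0, 0.3]])
Gamma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma = np.array([[1.0, 0.2], [0.2, 1.5]])
L = np.zeros((4, 1))
L[2, 0] = 1.0                                         # vec position of entry (1, 2)
stat = wald_statistic(coef, Gamma, Sigma, L, np.zeros(1), N=150)
print(stat_scalar, stat)
```

Under the null hypothesis the statistic would be compared with a quantile of the χ^{2} (J) distribution.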

3.2. Estimation in the General VECM Representation

The previous section dealt with the special case that a triangular representation is used and hence knowledge on the matrices

[β, β_{1}, β_{2}]

is given. This section provides a result for the general case, which, however, is limited to the coefficients to the stationary components. Since a general process generated according to Assumption 1 can be rewritten into a triangular representation using the knowledge of

[β, β_{1}, β_{2}]

, some asymptotic properties of the unrestricted OLS estimators can be derived from Theorem 1 for the general case (which is proved in Appendix C):

Theorem 2.

Let the regressor vector

Z_{t, h}

= {[y_{t - 1}^{'}, Δ y_{t - 1}^{'}, Δ^{2} y_{t - 1}^{'}, \dots, Δ^{2} y_{t - h + 2}^{'}]}^{'}

and define

\underline{Λ} = [\begin{matrix} Φ & Ψ & Π_{1} & \dots & Π_{h - 2} \end{matrix}], \quad \tilde{Λ} = ⟨ Δ^{2} y_{t}, Z_{t, h} ⟩ {⟨ Z_{t, h}, Z_{t, h} ⟩}^{- 1}, \quad {\tilde{Γ}}_{ECM} = N^{- 1} ⟨ Z_{t, h}, Z_{t, h} ⟩ .

Then under Assumptions 1 and 2 it follows that

\tilde{Λ} - \underline{Λ} = o_{P} (1)

.

Furthermore, let

L_{h} \in R^{p^{2} h \times J}

be such that

L_{h}^{'} ({\tilde{Γ}}_{ECM}^{- 1} \otimes Σ_{ϵ}) L_{h} \to M > 0

. Then

N^{\frac{1}{2}} L_{h}^{'} vec (\tilde{Λ} - \underline{Λ}) \overset{d}{\to} N (0, M) .

Besides consistency, the theorem implies that linear combinations of the OLS estimators are asymptotically normal and hence allow standard inference, provided the asymptotic variance is nonsingular. One application of such results is the so called 'surplus lag' formulation in the context of Granger causality testing, see Bauer and Maynard (2012); Dolado and Lütkepohl (1996).

Finally, note that this section does not contain results with regard to the cointegrating rank or the cointegrating space. The theorem above merely allows testing of coefficients corresponding to stationary regressors. Therefore its use is limited to somewhat special situations like the surplus-lag causality tests. However, it is also relevant for impulse response analysis, compare Inoue and Kilian (2020).

4. Rank Restricted Regression

The previous sections show that for the estimators discussed in those sections full inference on all coefficients is only possible when information on the matrices

β, β_{1}

and

β_{2}

exists. The dimensions of the matrices relate to the ranks of the matrices

Φ = α β^{'}

and, conditional on this, to the rank of

{\bar{α}}_{⊥}^{'} Ψ {\bar{β}}_{⊥}

. The two rank restrictions make estimation and specification more complex than in the I(1) case.

Johansen (1995) provides the two-step approach 2SI2 that can be used for estimation and specification of the two integer valued parameters

p_{1}

and

p_{2}

. Paruolo and Rahbek (1999) extend the 2SI2 procedure suggested in section 8 of Johansen (1997). Paruolo (2000) shows that this 2SI2 procedure achieves the same asymptotic distribution as pseudo maximum likelihood estimation which could be performed subsequent to 2SI2 estimation. This makes the procedure attractive from a practical point of view. In this section we show that these approaches extend naturally to the long VAR case. The main focus here lies on the derivation of the asymptotic properties of the rank tests.

Recall the long VAR approximation given as

Δ^{2} y_{t} = Φ y_{t - 1} + Ψ Δ y_{t - 1} + \sum_{j = 1}^{h - 2} Π_{j} Δ^{2} y_{t - j} + e_{t}

(9)

where

Φ = α β^{'}

has reduced rank

r < p

and

{\bar{α}}_{⊥}^{'} Ψ {\bar{β}}_{⊥} = ζ η^{'}

has reduced rank

s < p - r

. In this notation the 2SI2 procedure works as follows: In the first step the rank constraint on

{\bar{α}}_{⊥}^{'} Ψ {\bar{β}}_{⊥}

is neglected estimating

α

and

β

by using reduced-rank regression (RRR). Then in the second step the reduced rank of

{\bar{α}}_{⊥}^{'} Ψ {\bar{β}}_{⊥}

is imposed using RRR in a transformed equation.

In more detail using the Johansen notation we denote with

R_{0 t}

,

R_{1 t}

and

R_{2 t}

the residuals of regressing

Δ^{2} y_{t}

,

Δ y_{t - 1}

and

y_{t - 1}

on

Δ^{2} y_{t - 1}, \dots, Δ^{2} y_{t - h + 2}

, respectively; then we can rewrite (9) as

R_{0 t} = α β^{'} R_{2 t} + Ψ R_{1 t} + {\tilde{e}}_{t} .

(10)

Concentrating out

R_{1 t}

and denoting the residuals as

R_{0.1 t}

and

R_{2.1 t}

we obtain with

S_{i j . 1} = ⟨ R_{i t}, R_{j t} ⟩ - ⟨ R_{i t}, R_{1 t} ⟩ {⟨ R_{1 t}, R_{1 t} ⟩}^{- 1} ⟨ R_{1 t}, R_{j t} ⟩

the solution to the RRR problem from solving the eigenvalue problem

| λ S_{22.1} - S_{20.1} S_{00.1}^{- 1} S_{02.1} | = 0,

(11)

with solutions

1 > {\hat{λ}}_{1} \geq \dots \geq {\hat{λ}}_{p} > 0

ordered with decreasing size and corresponding vectors

V = (v_{1}, \dots, v_{p})

. Then as usual the trace statistic of testing the model

H_{r}

with

rank (Φ) \leq r

,

r < p

, in the model

H_{p}

with

rank (Φ) \leq p

, is given as

Q_{r} = - 2 log Q (H_{r} | H_{p}) = - T \sum_{i = r + 1}^{p} log (1 - {\hat{λ}}_{i}) .

(12)

The optimizers for

α, β

are given by

\hat{β} = (v_{1}, \dots, v_{r}), \hat{α} = S_{02.1} \hat{β}, {\hat{Σ}}_{ϵ} = S_{00.1} - \hat{α} {\hat{α}}^{'} .

(13)
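The first step can be illustrated numerically. The following is a minimal sketch, assuming the residual series are stored row by row in NumPy arrays; the function names `residuals` and `first_step_rrr`, the division of the product moments by T, and the random placeholder data are assumptions made for illustration and are not taken from the paper. The eigenvalues in (11) do not depend on a common scaling of the product moment matrices, so the trace statistics in (12) are unaffected by that choice.

```python
import numpy as np

def residuals(Y, Z):
    """Residuals from regressing the columns of Y (T x k) on Z (T x m) by OLS."""
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Y - Z @ coef

def first_step_rrr(R0, R1, R2):
    """Eigenvalues, eigenvectors and trace statistics for (11) and (12).

    R0, R1, R2 are T x p arrays holding R_{0t}, R_{1t}, R_{2t} row by row.
    """
    T = R0.shape[0]
    R0c, R2c = residuals(R0, R1), residuals(R2, R1)   # concentrate out R_{1t}
    # Product moments; dividing by T is an assumed (harmless) common scaling.
    S00 = R0c.T @ R0c / T
    S02 = R0c.T @ R2c / T
    S22 = R2c.T @ R2c / T
    # Rewrite |lambda S22 - S20 S00^{-1} S02| = 0 as a symmetric eigenproblem
    # using the Cholesky factor S22 = L L'.
    L = np.linalg.cholesky(S22)
    M = S02.T @ np.linalg.solve(S00, S02)             # S20 S00^{-1} S02
    sym = np.linalg.solve(L, np.linalg.solve(L, M).T).T
    lam, U = np.linalg.eigh((sym + sym.T) / 2)
    order = np.argsort(lam)[::-1]                     # decreasing eigenvalues
    lam, U = lam[order], U[:, order]
    V = np.linalg.solve(L.T, U)                       # normalized so V' S22 V = I
    # trace[r] = -T * sum_{i > r} log(1 - lambda_i), r = 0, ..., p - 1
    trace = -T * np.cumsum(np.log1p(-lam)[::-1])[::-1]
    return lam, V, trace

# Placeholder data only: independent noise, not an I(2) process.
rng = np.random.default_rng(0)
R0, R1, R2 = (rng.standard_normal((200, 3)) for _ in range(3))
lam, V, trace = first_step_rrr(R0, R1, R2)
```

In this sketch `trace[r]` corresponds to the statistic in (12) for the hypothesis of rank at most r, and the first r columns of `V` play the role of the eigenvectors used in (13).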

In the second step, given

α

and

β

known, we can obtain by multiplying (10) by

{\bar{α}}_{⊥}^{'}

that

{\bar{α}}_{⊥}^{'} R_{0 t} = {\bar{α}}_{⊥}^{'} Ψ ({\bar{β}}_{⊥} β_{⊥}^{'} + \bar{β} β^{'}) R_{1 t} + {\bar{α}}_{⊥}^{'} {\tilde{e}}_{t} = ζ η^{'} (β_{⊥}^{'} R_{1 t}) + C (β^{'} R_{1 t}) + {\bar{α}}_{⊥}^{'} {\tilde{e}}_{t} .

(14)

Note that

β^{'} R_{1 t}

is stationary. Thus concentrating out C and denoting the residuals as

R_{{\bar{α}}_{⊥} . β, t}

and

R_{β_{⊥} . β, t}

, respectively, we can define

S_{a b . β} : = ⟨ R_{a . β, t}, R_{b . β, t} ⟩

, for

a, b = {\bar{α}}_{⊥}

or

β_{⊥}

. Then the likelihood ratio test of the model

H_{r, s}

with

rank (ζ η^{'}) \leq s

,

s < p - r

in the model

H_{r}^{0}

with

rank ({\bar{α}}_{⊥}^{'} Ψ {\bar{β}}_{⊥}) = p - r

is given by

Q_{r, s} = - 2 log Q (H_{r, s} | H_{r}^{0}) = - T \sum_{i = s + 1}^{p - r} log (1 - {\hat{ρ}}_{i}) .

(15)

where

1 > {\hat{ρ}}_{1} \geq \dots \geq {\hat{ρ}}_{p - r} > 0

are the solutions of the eigenvalue problem

| ρ S_{β_{⊥} β_{⊥} . β} - S_{β_{⊥} {\bar{α}}_{⊥} . β} S_{{\bar{α}}_{⊥} {\bar{α}}_{⊥} . β}^{- 1} S_{{\bar{α}}_{⊥} β_{⊥} . β} | = 0,

(16)

and the corresponding eigenvectors are

W = (w_{1}, \dots, w_{p - r})

. Estimators of

ζ

and

η

are given by

\hat{η} = (w_{1}, \dots, w_{s}), \hat{ζ} = S_{{\bar{α}}_{⊥} β_{⊥} . β} \hat{η} .

(17)

For the 2SI2 procedure in this second step the first step estimates

\hat{α}

and

\hat{β}

are used in place of the unknown true quantities. We then obtain the following analogue of the results in the finite order VAR framework (the proof is given in Appendix D):

Theorem 3.

Let the data be generated according to Assumption 1 and let the VAR order fulfill Assumption 2. Then the following asymptotic results hold:

(A) The asymptotic distribution of the likelihood ratio statistic

Q_{r}

under the null hypothesis

H_{r}

is given by

Q_{r} \overset{d}{\to} tr \{\int_{0}^{1} d W_{†} F_{†}^{'} {(\int_{0}^{1} F_{†} F_{†}^{'} d u)}^{- 1} \int_{0}^{1} F_{†} d W_{†}^{'}\} ,

(18)

where

W_{†} = {(α_{⊥}^{'} Σ_{ϵ} α_{⊥})}^{- 1 / 2} α_{⊥}^{'} W, F_{a} (u) = [\begin{matrix} B_{2} (u) \\ \int_{0}^{u} B_{3} (v) d v \end{matrix}]

and

F_{†} (u) = F_{a} (u) - \int_{0}^{1} F_{a} B_{3}^{'} {(\int_{0}^{1} B_{3} B_{3}^{'})}^{- 1} B_{3} (u)

. This is identical to the distribution achieved in the finite VAR case.

(B) The asymptotic distribution of the likelihood ratio statistic

Q_{r, s}

under the null hypothesis

H_{r, s}

is given by

Q_{r, s} \overset{d}{\to} tr \{\int_{0}^{1} d W_{2} B_{3}^{'} {(\int_{0}^{1} B_{3} B_{3}^{'} d u)}^{- 1} \int_{0}^{1} B_{3} d W_{2}^{'}\} ,

(19)

where

W_{2} (u) = {(α_{2}^{'} Σ_{ϵ} α_{2})}^{- 1 / 2} α_{2}^{'} W (u)

.

(C) The asymptotic distribution of the test statistic

S_{r, s} = Q_{r} + Q_{r, s}

under the null hypothesis

H_{r, s}

is given by

S_{r, s} \overset{d}{\to} tr \{\int_{0}^{1} d W_{†} F_{†}^{'} {(\int_{0}^{1} F_{†} F_{†}^{'} d u)}^{- 1} \int_{0}^{1} F_{†} d W_{†}^{'}\} + tr \{\int_{0}^{1} d W_{2} B_{3}^{'} {(\int_{0}^{1} B_{3} B_{3}^{'} d u)}^{- 1} \int_{0}^{1} B_{3} d W_{2}^{'}\} .

(20)

(D) Using suitable normalizations all estimators are consistent:

\hat{α} {(c_{α}^{'} \hat{α})}^{- 1} \overset{p}{\to} α, \hat{β} {(c_{β}^{'} \hat{β})}^{- 1} \overset{p}{\to} β, \hat{ζ} {(c_{ζ}^{'} \hat{ζ})}^{- 1} \overset{p}{\to} ζ, \hat{η} {(c_{η}^{'} \hat{η})}^{- 1} \overset{p}{\to} η, \hat{Ψ} \overset{p}{\to} Ψ, {\hat{Π}}_{j} \overset{p}{\to} Π_{j}

where for example

c_{α}^{'} α = I_{r}

.

(E) The asymptotic distributions of the coefficients to the nonstationary regressors are identical to the ones in the finite order VAR case stated in Paruolo (2000). The asymptotic distribution of the coefficients

{\hat{Π}}_{j}

is identical to the one in Theorem 1.

The main message of the theorem is that the 2SI2 procedure shows the same asymptotic properties, including those of the rank tests, as in the finite order VAR case. As usual, restricting the coefficients for the non-stationary regressors does not influence the asymptotics for the coefficients corresponding to the stationary regressors.

Note that Paruolo (2000) shows that in the finite VAR case 2SI2 estimates have the same asymptotic distribution as pseudo maximum likelihood (pML) estimates maximizing the Gaussian likelihood. The first order conditions for the pML estimates of the coefficients to the non-stationary regressors provided in the first display on p. 548 in Paruolo (2000) depend on the data only via the matrices

S_{i j}

defined above. These matrices depend on the lag length of the VECM only via the concentration step. The proof of our Theorem 3 shows that these terms have the same asymptotic distributions for the finite order VAR and the long VAR. Theorem 4.3 of Paruolo (2000) shows that the asymptotic distribution of the coefficients due to stationary regressors does not depend on the distribution of the coefficients corresponding to the non-stationary regressors as long as they are estimated super-consistently. Thus our results imply that also in the long VAR case the asymptotic distribution of all estimates for the 2SI2 and the pML approach is identical.

5. Initial Guess for VARMA Estimation

One use of long VAR approximations is as a preliminary estimate for VARMA model estimation. Hannan and Kavalieris (1986) provide properties of such an approach in the stationary case; Lütkepohl and Claessen (1997) extend the procedure to the I(1) case. Here we extend this idea to the I(2) case.

The goal is to provide a consistent initial guess for the estimation of a VARMA model for I(2) processes. In this respect we assume the following data generating process:

Assumption 3 (VARMA dgp).

The process

{(y_{t})}_{t \in Z}

is generated as the solution to the state space equations

y_{t} = C x_{t} + ε_{t}, x_{t + 1} = A x_{t} + B ε_{t}

(21)

where

{(ε_{t})}_{t \in Z}

denotes white noise subject to the same assumptions as in Assumption 1.

Here

x_{t} \in R^{n}

is the unobserved state process. The system

(A, B, C)

is assumed to be minimal and in the canonical form of Bauer and Wagner (2012), that is

A = [\begin{matrix} I_{c} & I_{c} & 0 & 0 \\ 0 & I_{c} & 0 & 0 \\ 0 & 0 & I_{d} & 0 \\ 0 & 0 & 0 & A_{•} \end{matrix}], B = [\begin{matrix} B_{1} \\ B_{2} \\ B_{3} \\ B_{•} \end{matrix}], C = [\begin{matrix} C_{1} & C_{2} & C_{3} & C_{•} \end{matrix}],

where

| λ_{m a x} (A_{•}) | < 1

(the matrix

A_{•}

is stable),

C_{1}^{'} C_{1} = I_{c}, C_{3}^{'} C_{3} = I_{d}, C_{1}^{'} C_{3} = 0, C_{1}^{'} C_{2} = 0, C_{2}^{'} C_{3} = 0

. Furthermore, the system is strictly minimum-phase, that is

ρ_{0} = | λ_{m a x} (A - B C) | < 1

. Finally the matrix

\bar{A} = A - B C

is nonsingular.

At time

t = 0

the state

x_{0} = {[x_{0, u}^{'}, x_{•}^{'}]}^{'}, x_{0, u} \in R^{2 c + d},

is such that

x_{0, u}

is deterministic and

x_{0, •} = \sum_{j = 1}^{\infty} A_{•}^{j - 1} B_{•} ε_{- j}

denotes the stationary solution to the stable part of the system.

In this situation it follows that

{(y_{t})}_{t \in Z}

is an I(2) process in the sense of Bauer and Wagner (2012), that is, its second difference is a stationary VARMA process. The integers c and d are connected to the integers

p_{1}, p_{2}, p_{3}

via

c = p_{3}, d = p_{2}

such that

p_{1} = p - c - d

. It can furthermore be shown that a process generated using Assumption 3 possesses a VAR(h) approximation:

y_{t} + \sum_{j = 1}^{h} A_{j} y_{t - j} = ε_{t} + C {(A - B C)}^{h} x_{t - h}

where

A_{j} = - C {(A - B C)}^{j - 1} B, ∥ A_{j} ∥ \leq μ ρ^{j}

(

0 \leq ρ_{0} < ρ < 1

) converges to zero exponentially fast for

j \to \infty

due to the strict minimum-phase condition. Letting

h \to \infty

then implies the existence of a VAR(∞) representation. It follows that for such systems

A (z)

converges absolutely for

| z | < ρ^{- 1}

where

1 < ρ^{- 1}

.
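The VAR(h) identity above is purely algebraic and can be checked on simulated data. The sketch below is an illustration under stated assumptions: the system is a hypothetical one, built by first choosing a stable matrix `Abar` (standing for A - BC) and then setting A = Abar + BC, so it satisfies the strict minimum-phase condition but does not have the I(2) canonical structure of Assumption 3. The dimensions, the seed and the scaling of B and C are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, h, T = 2, 3, 5, 30

# Hypothetical system: Abar = A - BC is 0.5 times an orthogonal matrix,
# so all its eigenvalues have modulus 0.5 and ||Abar^j||_2 = 0.5**j.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Abar = 0.5 * Q
B = 0.3 * rng.standard_normal((n, p))
C = 0.3 * rng.standard_normal((p, n))
A = Abar + B @ C

# Simulate y_t = C x_t + eps_t, x_{t+1} = A x_t + B eps_t with x_0 = 0.
eps = rng.standard_normal((T, p))
x = np.zeros((T + 1, n))
y = np.zeros((T, p))
for t in range(T):
    y[t] = C @ x[t] + eps[t]
    x[t + 1] = A @ x[t] + B @ eps[t]

# VAR coefficients A_j = -C Abar^{j-1} B, j = 1, ..., h.
A_coef = [-C @ np.linalg.matrix_power(Abar, j - 1) @ B for j in range(1, h + 1)]

def var_h_gap(t):
    """Absolute difference between both sides of the VAR(h) identity at time t."""
    lhs = y[t] + sum(A_coef[j - 1] @ y[t - j] for j in range(1, h + 1))
    rhs = eps[t] + C @ np.linalg.matrix_power(Abar, h) @ x[t - h]
    return np.max(np.abs(lhs - rhs))

scale = 1.0 + np.max(np.abs(y))
max_gap = max(var_h_gap(t) for t in range(h, T)) / scale

# Geometric decay of the coefficient norms: ||A_j|| <= ||C|| ||B|| 0.5**(j-1).
norms = [np.linalg.norm(Aj, 2) for Aj in A_coef]
bounds = [np.linalg.norm(C, 2) * np.linalg.norm(B, 2) * 0.5 ** (j - 1)
          for j in range(1, h + 1)]
```

Here `max_gap` is zero up to rounding error, and each entry of `norms` is dominated by the corresponding entry of `bounds`, which mirrors the exponential bound on the coefficients with this particular choice of `Abar`.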

From the autoregressive representation the VECM representation can be obtained:

a (z) = I_{p} + \sum_{j = 1}^{\infty} A_{j} z^{j} = I_{p} - \sum_{j = 1}^{\infty} C {\bar{A}}^{j - 1} B z^{j} = {(1 - z)}^{2} I_{p} - Φ z - Ψ z (1 - z) - {(1 - z)}^{2} \sum_{j = 1}^{\infty} Π_{j} z^{j}

where

\bar{A} = A - B C

such that

Δ^{2} y_{t} = Φ y_{t - 1} + Ψ Δ y_{t - 1} + \sum_{j = 1}^{\infty} Π_{j} Δ^{2} y_{t - j} + ε_{t} .

A comparison of power series coefficients provides the identities:

\begin{array}{rl} Φ & = - I_{p} + C {(I - \bar{A})}^{- 1} B, \\ Ψ & = - I_{p} - C {(I - \bar{A})}^{- 2} \bar{A} B, \\ Π_{j} & = [C {\bar{A}}^{2} {(I - \bar{A})}^{- 2}] {\bar{A}}^{j - 1} B = D {\bar{A}}^{j - 1} B, \quad j = 1, 2, \dots \end{array}

It follows that the coefficients

Π_{j}, j = 1, 2, \dots

form the impulse response of a rational transfer function of order less than or equal to n. If

\bar{A}

is nonsingular then the order equals n and the system

(\bar{A}, B, D)

is minimal. Furthermore, it follows that for arbitrary

Φ

and

Ψ

the transfer function

a (z) = {(1 - z)}^{2} I_{p} - Φ z - Ψ z (1 - z) - {(1 - z)}^{2} z D {(I - z \bar{A})}^{- 1} B

is a rational transfer function with the additional property that

a (1) = - Φ = - α β^{'}, {\bar{α}}_{⊥}^{'} \dot{a} (1) {\bar{β}}_{⊥} = {\bar{α}}_{⊥}^{'} (- Φ + Ψ) {\bar{β}}_{⊥} = - {\bar{α}}_{⊥}^{'} C {(I - \bar{A})}^{- 2} B {\bar{β}}_{⊥} = ζ η^{'} .

Consequently

Φ

and

Ψ

determine the integration properties of processes generated using

a (z)

.
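The identities for Φ, Ψ and Π_j can be verified by comparing power series coefficients numerically. The following is a minimal sketch with a hypothetical randomly drawn triple (the matrix `Abar` stands for A - BC and is chosen stable, so that I - Abar is invertible); the helper names and the truncation point `J` are assumptions made for illustration. It compares the coefficient of z^k in I_p - Σ_j C Abar^{j-1} B z^j with the coefficient of z^k in the VECM form of a(z).

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, J = 2, 3, 12

# Hypothetical stable Abar (0.5 times an orthogonal matrix), random B and C.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Abar = 0.5 * Q
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))

I_n, I_p = np.eye(n), np.eye(p)
G = np.linalg.inv(I_n - Abar)                  # (I - Abar)^{-1}
Phi = -I_p + C @ G @ B
Psi = -I_p - C @ G @ G @ Abar @ B
D = C @ Abar @ Abar @ G @ G

def Pi(j):
    """Pi_j = D Abar^{j-1} B for j >= 1, and zero for j <= 0."""
    if j < 1:
        return np.zeros((p, p))
    return D @ np.linalg.matrix_power(Abar, j - 1) @ B

def a_coef_statespace(k):
    """Coefficient of z^k in I_p - sum_j C Abar^{j-1} B z^j."""
    return I_p if k == 0 else -C @ np.linalg.matrix_power(Abar, k - 1) @ B

def a_coef_vecm(k):
    """Coefficient of z^k in (1-z)^2 I - Phi z - Psi z (1-z) - (1-z)^2 sum_j Pi_j z^j."""
    low_order = {0: I_p, 1: -2 * I_p - Phi - Psi, 2: I_p + Psi}
    return low_order.get(k, np.zeros((p, p))) - (Pi(k) - 2 * Pi(k - 1) + Pi(k - 2))

max_diff = max(np.max(np.abs(a_coef_statespace(k) - a_coef_vecm(k)))
               for k in range(J + 1))
```

Up to rounding error `max_diff` is zero, which is the coefficient comparison behind the three identities. The check does not impose the rank restrictions on Φ and Ψ, since the identities hold for any triple with I - Abar invertible.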

Conversely whenever the constraints

- I_{p} + C {(I - \bar{A})}^{- 1} B = α β^{'}, - {\bar{α}}_{⊥}^{'} C {(I - \bar{A})}^{- 2} B {\bar{β}}_{⊥} = ζ η^{'}

hold the corresponding triple

(A, B, C)

corresponds to an I(2) process (if the eigenvalues of A are in the closed unit disc). Defining

C_{*} = {\bar{α}}^{'} C, C_{†} = {\bar{α}}_{⊥}^{'} C

we obtain

- {\bar{α}}^{'} + C_{*} {(I - \bar{A})}^{- 1} B = β^{'}, - {\bar{α}}_{⊥}^{'} + C_{†} {(I - \bar{A})}^{- 1} B = 0, - C_{†} {(I - \bar{A})}^{- 2} B {\bar{β}}_{⊥} = ζ η^{'} .

(22)

The third equation does not have a solution for fixed

B {\bar{β}}_{⊥}, ζ, η

, if the row space of

B {\bar{β}}_{⊥}

does not contain the space spanned by the rows of

η^{'}

. In this case row-wise projection of

η^{'}

onto the space spanned by the rows of

B {\bar{β}}_{⊥}

allows for (not necessarily unique) solutions in

C_{†}

. In the limit no projection is needed. Consequently for large enough T the projected matrix will have full row rank. The second equation then determines

{\bar{α}}_{⊥}

which in turn determines

\bar{α}

up to the choice of the basis such that

{\bar{α}}^{'} = T_{C} {\bar{α_{o}}}^{'}

for some full row rank matrix

{\bar{α_{o}}}^{'} \in R^{r \times p}, {\bar{α_{o}}}^{'} {\bar{α}}_{⊥} = 0

. The first equation then can be rewritten as

[T_{C}, C_{*}] \underbrace{[\begin{matrix} - {\bar{α_{o}}}^{'} \\ {(I - \bar{A})}^{- 1} B \end{matrix}]}_{R_{1}} = β^{'} .

The second equation shows that the row space of

{(I - \bar{A})}^{- 1} B

contains the row space of

{\bar{α}}_{⊥}^{'}

. Thus the matrix

R_{1}

has full row rank. It follows that this equation has solutions.

Having obtained a solution for

C_{*}, C_{†}, \bar{α}, {\bar{α}}_{⊥}

then C is obtained from

C = [\begin{matrix} α & α_{⊥} \end{matrix}] [\begin{matrix} C_{*} \\ C_{†} \end{matrix}] .

A unique solution then can be obtained from adding the restrictions

Π_{j} = C {(I - \bar{A})}^{- 2} {\bar{A}}^{j + 1} B, j = 1, 2, \dots, 2 n

which for the estimates are to be solved in a least squares sense among all solutions to equations (22).

It then follows that for the true matrices

Φ, Ψ, Π_{j}

the only solution for given

\bar{A}, B

is the corresponding true C. These facts can therefore be used to develop an initial guess for subsequent pseudo likelihood maximization using the parameterization of I(2) processes in state space representation: Given the integer valued parameters

n, c

and d:

Obtain a long VAR approximation $\hat{Φ}, \hat{Ψ}, {\hat{Π}}_{j}, j = 1, 2, \dots$ , including $\hat{Φ} = \hat{α} {\hat{β}}^{'}$ and $\hat{ζ} {\hat{η}}^{'} = {\hat{{\bar{α}}_{⊥}}}^{'} \hat{Ψ} \hat{{\bar{β}}_{⊥}}$ using the 2SI2 approach.
Choose the integer $f \geq n$ . Use the algorithm described in Appendix F to obtain estimates $(\hat{\bar{A}}, \hat{B}, \hat{D})$ realizing the impulse response ${\hat{Π}}_{j}, j = 1, \dots, 2 f$ from the Hankel matrix with f block columns and f block rows.
Project rows of ${\hat{η}}^{'}$ onto the space spanned by the rows of $\hat{B} \hat{{\bar{β}}_{⊥}}$ to obtain ${\tilde{η}}^{'}$ .
Obtain a unique solution $\hat{C}$ solving (22) such that the matrices ${\tilde{Π}}_{j} = \hat{C} {(I - \hat{\bar{A}})}^{- 2} {\hat{\bar{A}}}^{j + 1} \hat{B}, j = 1, 2, \dots, 2 n$ have minimal Euclidean distance to ${\hat{Π}}_{j}, j = 1, 2, \dots, 2 n$ .
Transform the corresponding system $(\hat{\bar{A}} + \hat{B} \hat{C}, \hat{B}, \hat{C})$ to the canonical form of Bauer and Wagner (2012) to obtain the estimate $(\tilde{A}, \tilde{B}, \tilde{C})$ .
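The realization step in this list can be illustrated with a standard Ho-Kalman type construction. The sketch below is not the algorithm of Appendix F, which is not reproduced here; it is a generic SVD-based realization, and the function name `hankel_realization`, the dimensions and the randomly drawn test system are assumptions made for illustration. It takes impulse response coefficients Π_j = D Abar^{j-1} B and returns a triple reproducing them.

```python
import numpy as np

def hankel_realization(Pi, n, f):
    """Realize Pi[j-1] = D Abar^{j-1} B from a block Hankel matrix with f block rows/columns.

    Pi is a list of p x p arrays with at least 2f - 1 entries; n is the system order.
    """
    p = Pi[0].shape[0]
    # Block (i, j) of the Hankel matrix holds Pi_{i+j+1} (0-based i, j).
    H = np.block([[Pi[i + j] for j in range(f)] for i in range(f)])
    U, s, Vt = np.linalg.svd(H)
    sq = np.sqrt(s[:n])
    O = U[:, :n] * sq                      # observability factor [D; D Abar; ...]
    K = (Vt[:n, :].T * sq).T               # controllability factor [B, Abar B, ...]
    D_hat = O[:p, :]
    B_hat = K[:, :p]
    # Shift invariance of the observability factor: O[:-p] @ Abar = O[p:].
    Abar_hat = np.linalg.lstsq(O[:-p, :], O[p:, :], rcond=None)[0]
    return Abar_hat, B_hat, D_hat

# Hypothetical test system of order n = 3 with p = 2 outputs.
rng = np.random.default_rng(3)
p, n, f = 2, 3, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Abar0 = 0.5 * Q
B0 = rng.standard_normal((n, p))
D0 = rng.standard_normal((p, n))
Pi_true = [D0 @ np.linalg.matrix_power(Abar0, j - 1) @ B0 for j in range(1, 2 * f + 1)]

Abar_hat, B_hat, D_hat = hankel_realization(Pi_true, n, f)
Pi_fit = [D_hat @ np.linalg.matrix_power(Abar_hat, j - 1) @ B_hat
          for j in range(1, 2 * f + 1)]
err = max(np.max(np.abs(a - b)) for a, b in zip(Pi_true, Pi_fit))
```

With exact impulse response coefficients the reconstruction error `err` is at rounding level; the realized triple agrees with the generating one only up to a change of state basis. With estimated coefficients the Hankel matrix has full rank in general, and truncating the SVD at order n then gives an approximation rather than an exact realization.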

The algorithm obtains a minimal state space system of order n in the canonical form for I(2) processes given in Bauer and Wagner (2012) and hence can be used as an initial guess for subsequent pseudo-likelihood optimization in the set

M_{n} (r, s)

of all order n rational transfer functions corresponding to I(2) processes with state space unit root structure

((0, (c, c + d)))

.

Theorem 4 (Consistent initial guess).

Let

{(y_{t})}_{t \in Z}

denote a process generated using the system

(A_{0}, B_{0}, C_{0})

according to Assumption 3 and let the system

(\tilde{A}, \tilde{B}, \tilde{C})

be estimated based on the long VAR approximation with lag order chosen according to Assumption 2. Then

(\tilde{A}, \tilde{B}, \tilde{C})

is a weakly consistent estimator of the data generating system

(A_{0}, B_{0}, C_{0})

in the sense that

\tilde{C} {\tilde{A}}^{j} \tilde{B} \overset{p}{\to} C_{0} A_{0}^{j} B_{0}, j = 0, 1, \dots

and hence the corresponding transfer functions converge in pointwise topology.

The proof of this theorem can be found in Appendix E.

6. Conclusions

In this paper the theory on long VAR approximation of general linear dynamical processes is extended to the case of I(2) processes. We find that we need slightly narrower upper and lower bounds in the approximations. The tighter bounds are not needed for all results and appear not very restrictive for applications.

The main results are completely analogous to the I(1) case: the asymptotics are in many respects identical to the finite order VAR case. Asymptotic distributions for the coefficients to non-stationary variables are the same as in the finite order VAR case. This holds true both for unrestricted OLS estimates and for the 2SI2 approach in the Johansen framework. Tests on cointegrating ranks show identical asymptotic distributions under the null as in the finite order VAR case and hence do not require other tables. In this respect the main conclusion is that the usual procedure of estimating the lag order in the first step and then applying the Johansen procedure for the estimated lag order is justified also for processes generated from a VAR(∞) that is approximated with a choice of the lag order lying within the prescribed bounds.

Additionally in the VARMA case the long VAR approximation can be used in order to derive consistent initial guesses that can be used in subsequent pseudo likelihood estimation.

Thus the paper provides both a full extension of results that have been achieved in the I(1) case and a useful starting point for subsequent VARMA modeling. The latter might be preferable in situations that require a high VAR order or involve a large number of variables, where VARMA models can be more parsimonious than VAR models.

Author Contributions

The two authors of the paper have contributed equally, via joint efforts, regarding both ideas, research, and writing. Conceptualization, Y.L. and D.B.; methodology, Y.L. and D.B.; software, not applicable; validation, not applicable; formal analysis, Y.L. and D.B.; investigation, Y.L. and D.B.; resources, not applicable; writing–original draft preparation, Y.L. and D.B.; writing–review and editing, Y.L. and D.B.; visualization, not applicable; supervision, not applicable; project administration, D.B.; funding acquisition, D.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, Projektnummer 276051388) which is gratefully acknowledged. We acknowledge support for the publication costs by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

Acknowledgments

The reviewers and in particular the two guest editors provided significant comments that helped in improving the paper, which is gratefully acknowledged.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Preliminaries

The theory in this paper follows closely the arguments in Lewis and Reinsel (1985) and its extension to the I(1) case in Saikkonen and Lütkepohl (1996). To this end consider the finite order VECM approximation:

Δ^{2} y_{t} = Φ y_{t - 1} + Ψ Δ y_{t - 1} + \sum_{j = 1}^{h} Π_{j} Δ^{2} y_{t - j} + e_{t} .

(A1)

The properties of the various estimators heavily use the following rewriting of the approximation using the triangular representation of

y_{t}

:

\begin{array}{rl} Δ^{2} y_{t} & = [Φ_{1}, Φ_{2}, Φ_{3}] [\begin{matrix} A Δ y_{3, t - 1} + u_{1, t - 1} \\ y_{2, t - 1} \\ y_{3, t - 1} \end{matrix}] + [Ψ_{1}, Ψ_{2}, Ψ_{3}] [\begin{matrix} A u_{3, t - 1} + Δ u_{1, t - 1} \\ u_{2, t - 1} \\ Δ y_{3, t - 1} \end{matrix}] \\ & \quad + \sum_{j = 1}^{h} [Π_{j, 1}, Π_{j, 2}, Π_{j, 3}] [\begin{matrix} A Δ u_{3, t - j} + Δ^{2} u_{1, t - j} \\ Δ u_{2, t - j} \\ u_{3, t - j} \end{matrix}] + e_{t} \\ & = Φ_{2} y_{2, t - 1} + Φ_{3} y_{3, t - 1} + Θ Δ y_{3, t - 1} + \sum_{j = 1}^{h} Ξ_{j} u_{t - j} \\ & \quad + [Ξ_{h + 1, 1}, Ξ_{h + 1, 2}] [\begin{matrix} u_{1, t - h - 1} \\ u_{2, t - h - 1} \end{matrix}] + Ξ_{h + 2, 1} {\tilde{u}}_{1, t - h - 2} + e_{t}, \end{array}

(A2)

where

{\tilde{u}}_{1, t - h - 2} : = u_{1, t - h - 2} - A u_{3, t - h - 1}

and

Φ_{2} = Φ_{3} = 0

,

Θ = Φ_{1} A + Ψ_{3} = 0

, and

\begin{matrix} Ξ_{1} & = & [Φ_{1} + Ψ_{1} + Π_{1, 1}, Ψ_{2} + Π_{1, 2}, (Ψ_{1} + Π_{1, 1}) A + Π_{1, 3}], \\ Ξ_{2} & = & [- Ψ_{1} + Π_{2, 1} - 2 Π_{1, 1}, Π_{2, 2} - Π_{1, 2}, (Π_{2, 1} - Π_{1, 1}) A + Π_{2, 3}], \\ Ξ_{j} & = & [Π_{j, 1} - 2 Π_{j - 1, 1} + Π_{j - 2, 1}, Π_{j, 2} - Π_{j - 1, 2}, (Π_{j, 1} - Π_{j - 1, 1}) A + Π_{j, 3}], j = 3, \dots, h, \\ Ξ_{h + 1, 1} & = & - 2 Π_{h, 1} + Π_{h - 1, 1}, Ξ_{h + 1, 2} = - Π_{h, 2}, Ξ_{h + 2, 1} = Π_{h, 1} . \end{matrix}

Furthermore, we can see that

\sum_{j = 1}^{h + 2} Ξ_{j, 1} = Φ_{1}

,

\sum_{j = 1}^{h + 1} Ξ_{j, 2} = Ψ_{2}

, and

\sum_{j = 1}^{h} Ξ_{j, 3} = Ψ_{1} A + \sum_{j = 1}^{h} Π_{j, 3}

. Finally

Ψ_{1} = - \sum_{j = 2}^{h + 2} (j - 1) Ξ_{j, 1}

.

Note that in the reparametrization (A2), the I(1) components,

y_{c, t} : = {(y_{2, t}^{'}, Δ y_{3, t}^{'})}^{'}

, as well as the I(2) components,

y_{3, t - 1}

, are isolated from the stationary ones,

u_{t - j}

, and have coefficients equal to zero, which facilitates the derivation of the asymptotic properties.

In the reparameterized setting define

\underset{̲}{Ξ} : = [Ξ_{1}, \dots, Ξ_{h}, Ξ_{h + 1, 1}, Ξ_{h + 1, 2}, Ξ_{h + 2, 1}], p \times (p h + 2 p_{1} + p_{2})

,

U_{t} : = {[u_{t - 1}^{'}, \dots, u_{t - h}^{'}, u_{1, t - h - 1}^{'}, u_{2, t - h - 1}^{'}, {\tilde{u}}_{1, t - h - 2}^{'}]}^{'}, (p h + 2 p_{1} + p_{2}) \times 1

,

\underset{̲}{Λ} : = [\underset{̲}{Ξ}, Φ_{2}, Θ, Φ_{3}] = [\underset{̲}{Ξ}, 0], p \times p (h + 2),

W_{t} : = {[U_{t}^{'}, y_{c, t - 1}^{'}, y_{3, t - 1}^{'}]}^{'}, p (h + 2) \times 1 .

With these definitions we have

Δ^{2} y_{t} = \underset{̲}{Λ} W_{t} + e_{t},

(A3)

and correspondingly,

Δ^{2} y_{t} = \hat{Λ} W_{t} + {\tilde{e}}_{t}

where

\hat{Λ} = [\hat{Ξ}, {\hat{Φ}}_{2}, \hat{Θ}, {\hat{Φ}}_{3}] = ⟨ Δ^{2} y_{t}, W_{t} ⟩ {⟨ W_{t}, W_{t} ⟩}^{- 1}

is the OLS estimator of

\underset{̲}{Λ}

. Here

⟨ X_{t}, Z_{t} ⟩ : = \sum_{t = h + 3}^{T} X_{t} Z_{t}^{'}

.

Note that

W_{t}

and the regressors in (A1) are in one-to-one correspondence. In the original Equation (A1), besides the nonstationary regressors

y_{c, t - 1}

and

y_{3, t - 1}

the regressor vector

{\tilde{X}}_{t} = {[y_{1, t - 1}^{'}, Δ y_{1, t - 1}^{'}, u_{2, t - 1}^{'}, Δ^{2} y_{t - 1}^{'}, \dots, Δ^{2} y_{t - h}^{'}]}^{'} \in R^{2 p_{1} + p_{2} + p h}

occurs which cointegrates with

Δ y_{3, t - 1}

such that

X_{t} = {\tilde{X}}_{t} - {[A^{'}, 0]}^{'} Δ y_{3, t - 1} = T_{h} U_{t}

(A4)

is stationary. Here the nonsingular matrix

T_{h} \in R^{(p h + 2 p_{1} + p_{2}) \times (p h + 2 p_{1} + p_{2})}

is defined as:

[\begin{array}{c} I_{p_{1}} \\ I_{p_{1}} & A & - I_{p_{1}} \\ I_{p_{2}} \\ I_{p_{1}} & A & - 2 I_{p_{1}} & - A & I_{p_{1}} \\ I_{p_{2}} & - I_{p_{2}} \\ I_{p_{3}} \\ I_{p_{1}} & A & - 2 I_{p_{1}} & - A & I_{p_{1}} \\ I_{p_{2}} & - I_{p_{2}} \\ I_{p_{3}} \\ ⋱ & ⋱ & ⋱ \\ I_{p_{1}} & A & - 2 I_{p_{1}} & - A & I_{p_{1}} \\ I_{p_{2}} & - I_{p_{2}} \\ I_{p_{3}} \\ I_{p_{1}} & A & - 2 I_{p_{1}} & I_{p_{1}} \\ I_{p_{2}} & - I_{p_{2}} \\ I_{p_{3}} \end{array}]

Let

\underset{̲}{Π} : = [Φ_{1}, Ψ_{1}, Ψ_{2}, Π_{1}, Π_{2}, \dots, Π_{h}]

, so that we have

\underset{̲}{Ξ} = \underset{̲}{Π} T_{h} .

(A5)

It can be verified that

T_{h}

is invertible. The asymptotic properties of

\hat{Λ} - \underset{̲}{Λ}

are clarified in the next lemma:

Lemma A1.

Under the assumptions of Theorem 1 using

N = T - h - 2

as the effective sample size

\begin{array}{rl} N^{\frac{1}{2}} (\hat{Ξ} - \underset{̲}{Ξ}) & = N^{\frac{1}{2}} ⟨ ε_{t}, U_{t} ⟩ {(E U_{t} U_{t}^{'})}^{- 1} + o_{P} (h^{\frac{1}{2}}), \\ [N {\hat{Φ}}_{2}, N \hat{Θ}, N^{2} {\hat{Φ}}_{3}] & \Rightarrow g (1) [\begin{matrix} \int_{0}^{1} d B B_{c}^{'} & \int_{0}^{1} d B H_{3}^{'} \end{matrix}] {[\begin{matrix} \int_{0}^{1} B_{c} B_{c}^{'} & \int_{0}^{1} B_{c} H_{3}^{'} \\ \int_{0}^{1} H_{3} B_{c}^{'} & \int_{0}^{1} H_{3} H_{3}^{'} \end{matrix}]}^{- 1} \end{array}

where

H_{3} (u) = \int_{0}^{u} B_{3} (s) d s

.

Proof.

The proof essentially shows that the coefficients corresponding to the stationary regressors and the ones corresponding to the integrated regressors asymptotically can be dealt with separately. Let

D_{T} : = diag [N^{- \frac{1}{2}} I_{p h + 2 p_{1} + p_{2}}, N^{- 1} I_{p_{2} + p_{3}}, N^{- 2} I_{p_{3}}]

. Note that

N^{\frac{1}{2}} (\hat{Ξ} - \underset{̲}{Ξ})

,

N [{\hat{Φ}}_{2}, \hat{Θ}]

, and

N^{2} {\hat{Φ}}_{3}

are the 1st, 2nd and 3rd column blocks of

(\hat{Λ} - \underset{̲}{Λ}) D_{T}^{- 1}

, respectively. Moreover, we have

(\hat{Λ} - \underset{̲}{Λ}) D_{T}^{- 1} = ⟨ e_{t}, W_{t} ⟩ D_{T} {(D_{T} ⟨ W_{t}, W_{t} ⟩ D_{T})}^{- 1} .
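This display follows in two steps from (A3) and the OLS formula for the estimator of the coefficient matrix. The short derivation below only spells out the intermediate algebra, using that the diagonal scaling matrix is invertible; it introduces no assumptions beyond those already made.

```latex
\begin{aligned}
\hat{\Lambda} - \underline{\Lambda}
  &= \langle \Delta^{2} y_{t}, W_{t} \rangle \langle W_{t}, W_{t} \rangle^{-1} - \underline{\Lambda}
   = \langle \underline{\Lambda} W_{t} + e_{t}, W_{t} \rangle \langle W_{t}, W_{t} \rangle^{-1} - \underline{\Lambda}
   = \langle e_{t}, W_{t} \rangle \langle W_{t}, W_{t} \rangle^{-1}, \\
(\hat{\Lambda} - \underline{\Lambda}) D_{T}^{-1}
  &= \langle e_{t}, W_{t} \rangle D_{T} \, D_{T}^{-1} \langle W_{t}, W_{t} \rangle^{-1} D_{T}^{-1}
   = \langle e_{t}, W_{t} \rangle D_{T} \, \bigl( D_{T} \langle W_{t}, W_{t} \rangle D_{T} \bigr)^{-1} .
\end{aligned}
```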

Let

\hat{R} : = D_{T} ⟨ W_{t}, W_{t} ⟩ D_{T},

and define

R : = diag [Γ_{u}, R_{2}],

where

Γ_{u} = E [U_{t} U_{t}^{'}]

, and

R_{2} : = [\begin{matrix} N^{- 2} ⟨ y_{c, t - 1}, y_{c, t - 1} ⟩ & N^{- 3} ⟨ y_{c, t - 1}, y_{3, t - 1} ⟩ \\ N^{- 3} ⟨ y_{3, t - 1}, y_{c, t - 1} ⟩ & N^{- 4} ⟨ y_{3, t - 1}, y_{3, t - 1} ⟩ \end{matrix}] .

Note that each block of the matrix

R_{2}

is of order

O_{p} (1)

, and moreover, both

R_{2}

and its limit are almost surely invertible, as there is no cointegration between

y_{c, t - 1}

and

y_{3, t - 1}

(see Lemma 3.1.1 in Chan and Wei (1988), and Sims et al. (1990)). Note that

(\hat{Λ} - \underset{̲}{Λ}) D_{T}^{- 1} - ⟨ ε_{t}, W_{t} ⟩ D_{T} R^{- 1} = \underbrace{⟨ e_{1 t}, W_{t} ⟩ D_{T} R^{- 1}}_{= : E_{1}} + \underbrace{⟨ e_{1 t}, W_{t} ⟩ D_{T} ({\hat{R}}^{- 1} - R^{- 1})}_{= : E_{2}} + \underbrace{⟨ ε_{t}, W_{t} ⟩ D_{T} ({\hat{R}}^{- 1} - R^{- 1})}_{= : E_{3}} .

Here

⟨ ε_{t}, W_{t} ⟩ D_{T} R^{- 1}

has the limits stated in the lemma since:

\begin{array}{rl} N^{- 1} ⟨ ε_{t}, y_{c, t - 1} ⟩ & \Rightarrow g (1) \int_{0}^{1} d B B_{c}^{'}, \\ N^{- 2} ⟨ ε_{t}, y_{3, t - 1} ⟩ & \Rightarrow g (1) \int_{0}^{1} d B H_{3}^{'}, \\ [\begin{matrix} N^{- 2} ⟨ y_{c, t - 1}, y_{c, t - 1} ⟩ & N^{- 3} ⟨ y_{c, t - 1}, y_{3, t - 1} ⟩ \\ N^{- 3} ⟨ y_{3, t - 1}, y_{c, t - 1} ⟩ & N^{- 4} ⟨ y_{3, t - 1}, y_{3, t - 1} ⟩ \end{matrix}] & \Rightarrow [\begin{matrix} \int_{0}^{1} B_{c} B_{c}^{'} & \int_{0}^{1} B_{c} H_{3}^{'} \\ \int_{0}^{1} H_{3} B_{c}^{'} & \int_{0}^{1} H_{3} H_{3}^{'} \end{matrix}] . \end{array}

The lemma therefore holds, if

E_{1} = [o_{P} (h^{1 / 2}), o_{P} (1), o_{P} (1)], E_{2} = o_{P} (1), E_{3} = o_{P} (1)

can be shown (where the blocks in

E_{1}

correspond to the partitioning of

W_{t}

into stationary, I(1) and I(2) components). For this it is sufficient to show:

(I): $∥ {\hat{R}}^{- 1} - R^{- 1} ∥_{1} = O_{P} (h / N^{\frac{1}{2}})$
(II): $∥ ⟨ e_{1 t}, W_{t} ⟩ D_{T} ∥ = o_{P} (h^{1 / 2})$ where $N^{- 1} ⟨ e_{1 t}, y_{c, t - 1} ⟩ = o_{P} (1)$ and $N^{- 2} ⟨ e_{1 t}, y_{3, t - 1} ⟩ = o_{P} (1)$
(III): $∥ ⟨ ε_{t}, W_{t} ⟩ D_{T} ∥ = O_{P} (h^{1 / 2})$ .

Here

{∥ . ∥}_{1}

denotes the spectral norm of a matrix while

∥ . ∥

denotes the Frobenius norm.

(I) To see

∥ {\hat{R}}^{- 1} - R^{- 1} ∥_{1} = O_{p} (h / N^{\frac{1}{2}})

, according to Lewis and Reinsel (1985), it is sufficient to show

∥ \hat{R} - R ∥_{1} = O_{p} (h / N^{\frac{1}{2}}), ∥ R^{- 1} ∥_{1} = O_{p} (1) .

Note that

\begin{array}{rl} \hat{R} - R & = [\begin{matrix} N^{- 1} ⟨ U_{t}, U_{t} ⟩ - Γ_{u} & N^{- \frac{3}{2}} ⟨ U_{t}, y_{c, t - 1} ⟩ & N^{- \frac{5}{2}} ⟨ U_{t}, y_{3, t - 1} ⟩ \\ N^{- \frac{3}{2}} ⟨ y_{c, t - 1}, U_{t} ⟩ & 0 & 0 \\ N^{- \frac{5}{2}} ⟨ y_{3, t - 1}, U_{t} ⟩ & 0 & 0 \end{matrix}] = : [\begin{matrix} \hat{Q} & {\hat{P}}_{12} & {\hat{P}}_{13} \\ {\hat{P}}_{21} & 0 & 0 \\ {\hat{P}}_{31} & 0 & 0 \end{matrix}], \end{array}

then we have

E ∥ \hat{R} - R ∥_{1}^{2} \leq E ∥ \hat{R} - R ∥^{2} = E ∥ \hat{Q} ∥^{2} + 2 (E ∥ {\hat{P}}_{12} ∥^{2} + E ∥ {\hat{P}}_{13} ∥^{2})

.

Now let

U_{t}^{o} : = {[u_{t - 1}^{'}, \dots, u_{t - h - 2}^{'}]}^{'}

, then there exists a transformation

T^{u}

of full row rank, such that

U_{t} = T^{u} U_{t}^{o}

, where

T^{u}

is a

(p h + 2 p_{1} + p_{2}) \times p (h + 2)

matrix:

\underbrace{[\begin{matrix} u_{t - 1} \\ ⋮ \\ u_{t - h} \\ u_{1, t - h - 1} \\ u_{2, t - h - 1} \\ {\tilde{u}}_{1, t - h - 2} \end{matrix}]}_{U_{t} : (p h + 2 p_{1} + p_{2}) \times 1} = \underbrace{[\begin{matrix} I_{p h + p_{1} + p_{2}} & 0 & 0 & 0 \\ 0 & - A & I_{p_{1}} & 0 \end{matrix}]}_{T^{u} : (p h + 2 p_{1} + p_{2}) \times p (h + 2)} \underbrace{[\begin{matrix} u_{t - 1} \\ ⋮ \\ u_{t - h} \\ u_{1, t - h - 1} \\ u_{2, t - h - 1} \\ u_{3, t - h - 1} \\ u_{1, t - h - 2} \\ u_{2, t - h - 2} \\ u_{3, t - h - 2} \end{matrix}]}_{U_{t}^{o} : p (h + 2) \times 1} .

Then, we have

\hat{Q} = T^{u} {\hat{Q}}^{o} T^{u^{'}}

, where

{\hat{Q}}^{o} = \frac{1}{N} ⟨ U_{t}^{o}, U_{t}^{o} ⟩ - E [U_{t}^{o} U_{t}^{o^{'}}]

; moreover,

{\hat{P}}_{1 i} = T^{u} {\hat{P}}_{1 i}^{o}

for

i = 2, 3

, where

{\hat{P}}_{12}^{o} = N^{- \frac{3}{2}} ⟨ U_{t}^{o}, y_{c, t - 1} ⟩, {\hat{P}}_{13}^{o} = N^{- \frac{5}{2}} ⟨ U_{t}^{o}, y_{3, t - 1} ⟩

. Since

∥ T^{u} ∥_{1} = O (1)

,

\hat{Q}

and

{\hat{P}}_{1 i}

have the same rate of convergence as

{\hat{Q}}^{o}

and

{\hat{P}}_{1 i}^{o}

, respectively. From Saikkonen (1991) Lemma A.2. we know

E ∥ {\hat{Q}}^{o} ∥^{2} = O (h^{2} / N)

and

E ∥ {\hat{P}}_{12}^{o} ∥^{2} = O (h / N)

by direct calculation.

For

{\hat{P}}_{13}^{o}

note that

E ∥ y_{3, t - 1} ∥^{2} = E ∥ \sum_{j = 1}^{t - 1} \sum_{i = 1}^{j} u_{3, i} ∥^{2} = E ∥ \sum_{i = 1}^{t - 1} i u_{3, t - 1 - i} ∥^{2} = O (t^{3}) .

Then calculations analogous to those for

{\hat{P}}_{12}^{o}

show that

E ∥ {\hat{P}}_{13}^{o} ∥^{2} = O (h / N)

. Concluding we obtain

E ∥ \hat{R} - R ∥_{1}^{2} = O (h^{2} / N)

such that

∥ \hat{R} - R ∥_{1} = O_{P} (h / N^{\frac{1}{2}})

.
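The cubic growth of the second moment of the doubly cumulated component can be made concrete in the simplest special case. The sketch below assumes scalar i.i.d. innovations with unit variance, which is stronger than what the paper assumes for the stationary component (a general stationary process would change the constant but not the order). In that special case the double sum of the first m innovations has weights m, m - 1, ..., 1, so its variance equals m(m + 1)(2m + 1)/6. The function name is hypothetical.

```python
import numpy as np

def double_sum_variance(m):
    """Variance of sum_{j=1}^{m} sum_{i=1}^{j} u_i for i.i.d. unit-variance scalar u_i."""
    S = np.tril(np.ones((m, m)))      # cumulative-sum operator
    weights = (S @ S)[-1]             # weights of u_1, ..., u_m in the double sum
    return float(weights @ weights)

ms = np.array([10, 20, 40, 80])
variances = np.array([double_sum_variance(m) for m in ms])
closed_form = ms * (ms + 1) * (2 * ms + 1) / 6
ratios = variances / ms**3            # approaches 1/3, i.e. the variance is O(m^3)
```

The entries of `ratios` decrease towards 1/3, which illustrates the cubic order used in the bound for the cross term with the I(2) regressor.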

To show

∥ R^{- 1} ∥_{1} = O_{P} (1)

note that

R^{- 1} = diag [Γ_{u}^{- 1}, R_{2}^{- 1}]

where

∥ Γ_{u}^{- 1} ∥_{1} = O (1)

(see Lewis and Reinsel (1985), p. 397) and

∥ R_{2}^{- 1} ∥_{1} = O_{P} (1)

, since

R_{2}

is a.s. invertible and converges in distribution to an almost surely nonsingular random matrix.

(II) With respect to

∥ ⟨ e_{1 t}, W_{t} ⟩ D_{T} ∥ = o_{P} (h^{1 / 2})

note that

∥⟨ e_{1 t}, W_{t} ⟩ D_{T}∥ \leq ∥N^{- \frac{1}{2}} ⟨ e_{1 t}, U_{t} ⟩∥ + ∥N^{- 1} ⟨ e_{1 t}, y_{c, t - 1} ⟩∥ + ∥N^{- 2} ⟨ e_{1 t}, y_{3, t - 1} ⟩∥ .

From Saikkonen (1991) Lemma A.5 we have

∥N^{- \frac{1}{2}} ⟨ e_{1 t}, U_{t} ⟩∥ = o_{P} (h^{\frac{1}{2}})

, and

∥ N^{- 1} ⟨ e_{1 t}, y_{c, t - 1} ⟩ ∥ = o_{P} (1)

. Then

E ∥ y_{3, t - 1} ∥^{2} = O (t^{3})

and

E ∥ e_{1 t} ∥^{2} = o (N^{- 1})

imply

E ∥ N^{- 2} ⟨ e_{1 t}, y_{3, t - 1} ⟩ ∥ \leq N^{- 2} \sum_{t = h + 3}^{T} {(E ∥ e_{1 t} ∥^{2} E {∥ y_{3, t - 1} ∥}^{2})}^{\frac{1}{2}} = o (N^{- 2} N N^{- 1 / 2} N^{3 / 2}) = o (1) .

(III) To show

∥ ⟨ ε_{t}, W_{t} ⟩ D_{T} ∥ = O_{P} (h^{1 / 2})

note that

N^{- \frac{1}{2}} ⟨ ε_{t}, U_{t} ⟩ = O_{P} (h^{1 / 2}), N^{- 1} ⟨ ε_{t}, y_{c, t - 1} ⟩ = O_{P} (1)

according to (A.7) of Saikkonen (1992). Moreover

N^{- 2} \sum_{t = h + 3}^{T} ε_{t} y_{3, t - 1}^{'} \Rightarrow g (1) \int_{0}^{1} d B H_{3}^{'}

implies

N^{- 2} ⟨ ε_{t}, y_{3, t - 1} ⟩ = O_{P} (1)

. □

Note that for the lemma to hold we only need

h^{3} / N \to 0

and

N^{1 / 2} \sum_{j = h + 1}^{\infty} ∥ Π_{j} ∥ = o (1)

.

Appendix B. Proof of Theorem 1

Appendix B.1. (A) Consistency

(i) Lemma A1 implies

{\hat{Φ}}_{2} \to 0 = Φ_{2}, {\hat{Φ}}_{3} \to 0 = Φ_{3}

. Furthermore, the reparameterization implies

Φ_{1} = \sum_{j = 1}^{h + 2} {\underset{̲}{Ξ}}_{j 1}

and thus

{\hat{Φ}}_{1} = \sum_{j = 1}^{h + 2} {\hat{Ξ}}_{j, 1}

leading to

\begin{array}{rl} ∥ {\hat{Φ}}_{1} - {\underset{̲}{Φ}}_{1} ∥ & \leq ∥ \sum_{j = 1}^{h + 2} {\hat{Ξ}}_{j, 1} - \sum_{j = 1}^{h + 2} {\underset{̲}{Ξ}}_{j, 1} ∥ \\ & \leq \sum_{j = 1}^{h + 2} ∥ {\hat{Ξ}}_{j, 1} - {\underset{̲}{Ξ}}_{j, 1} ∥ \leq ∥ \hat{Ξ} - \underset{̲}{Ξ} ∥ = O_{P} (h^{3 / 2} / N^{1 / 2}) \end{array}

where the last inequality holds due to

⟨ ε_{t}, u_{t - j} ⟩ = O_{P} (N^{1 / 2})

in combination with Lemma A1.

(ii) Note that

{\hat{Σ}}_{ϵ} = N^{- 1} ⟨ Δ^{2} y_{t} - \hat{Λ} W_{t}, Δ^{2} y_{t} - \hat{Λ} W_{t} ⟩ = N^{- 1} ⟨ e_{t} + (\underset{̲}{Λ} - \hat{Λ}) W_{t}, e_{t} + (\underset{̲}{Λ} - \hat{Λ}) W_{t} ⟩ .

Now

⟨ (\underset{̲}{Λ} - \hat{Λ}) W_{t}, (\underset{̲}{Λ} - \hat{Λ}) W_{t} ⟩ = (\underset{̲}{Λ} - \hat{Λ}) D_{T}^{- 1} D_{T} ⟨ W_{t}, W_{t} ⟩ D_{T} D_{T}^{- 1} {(\underset{̲}{Λ} - \hat{Λ})}^{'}

where

\hat{R} = D_{T} ⟨ W_{t}, W_{t} ⟩ D_{T}

such that

∥ \hat{R} ∥_{1} = O_{P} (1)

and

∥ (\underset{̲}{Λ} - \hat{Λ}) D_{T}^{- 1} ∥ = O_{P} (h^{1 / 2})

. Consequently

N^{- 1} ⟨ (\underset{̲}{Λ} - \hat{Λ}) W_{t}, (\underset{̲}{Λ} - \hat{Λ}) W_{t} ⟩ = O_{P} (h / N) \to 0 .

Next, from the definition of

e_{t}

, we can show that

N^{- 1} ⟨ ε_{t} + e_{1 t}, ε_{t} + e_{1 t} ⟩ = N^{- 1} ⟨ ε_{t}, ε_{t} ⟩ + o_{P} (1) = Σ_{ϵ} + o_{P} (1),

where the last equality follows from the law of large numbers and the first equality is implied by the fact that

∥ e_{1 t} ∥^{2} = o_{P} (T^{- 1})

and

∥ ε_{t} ∥^{2} = O_{P} (1)

.
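
The consistency of the residual covariance under a slowly growing lag length can be illustrated numerically. The following is only a minimal sketch under simplifying assumptions that are not part of the proof: a univariate stationary MA(1) process stands in for the data generating process, the MA coefficient `theta`, the sample size and the rule `h = int(T ** 0.25)` are hypothetical choices, and no unit roots are present.

```python
import numpy as np

def long_ar_residual_variance(y, h):
    """OLS fit of an AR(h) without intercept; returns the residual variance."""
    n_eff = len(y) - h                  # effective sample size
    # Row i holds the regressors (y_{t-1}, ..., y_{t-h}) for t = h + i.
    X = np.column_stack([y[h - j: len(y) - j] for j in range(1, h + 1)])
    target = y[h:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(resid @ resid / n_eff)

rng = np.random.default_rng(0)
T = 4000                                # hypothetical sample size
theta = 0.5                             # hypothetical MA coefficient (invertible)
eps = rng.standard_normal(T + 1)
y = eps[1:] + theta * eps[:-1]          # MA(1) with innovation variance 1
h = int(T ** 0.25)                      # slowly growing lag length, h^3 / T is small
sigma2_hat = long_ar_residual_variance(y, h)
print(h, sigma2_hat)                    # sigma2_hat should be close to 1
```

The autoregression is only an approximation here, since an MA(1) has an infinite order AR representation; the truncation error decays geometrically in h, which is why a short lag length already suffices in this toy setting.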

(iii) From (i) and (ii),

{\hat{Ω}}_{1 . c} = {({\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Φ}}_{1})}^{- 1} = {(Φ_{1}^{'} Σ_{ϵ}^{- 1} Φ_{1})}^{- 1} + o_{P} (1) = Ω_{1 . c} + o_{P} (1)

directly follows.

(iv) With respect to

\hat{Ψ}

recall that

Ψ_{1} = - \sum_{j = 2}^{h + 2} (j - 1) Ξ_{j, 1}, Ψ_{2} = \sum_{j = 1}^{h + 1} Ξ_{j, 2} .

Then Lemma A1 shows that each entry of

\hat{Ξ} - \underset{̲}{Ξ}

is of order

O_{P} (h^{1 / 2} / N^{1 / 2})

. Then

∥ {\hat{Ψ}}_{1} - Ψ_{1} ∥ \leq \sum_{j = 2}^{h + 2} (j - 1) ∥ {\hat{Ξ}}_{j, 1} - Ξ_{j, 1} ∥ = O_{P} (\sum_{j = 2}^{h + 2} (j - 1) h^{1 / 2} / N^{1 / 2}) = O_{P} (h^{5 / 2} / N^{1 / 2})

which converges to zero for

h^{5} / T \to 0

. Similarly

{\hat{Ψ}}_{2} - Ψ_{2} = O_{P} (h^{3 / 2} / N^{1 / 2})

.

For

{\hat{Ψ}}_{3}

note that

Θ = Φ_{1} A + Ψ_{3}

. Thus

{\hat{Ψ}}_{3} = \hat{Θ} - {\hat{Φ}}_{1} A

such that

{\hat{Ψ}}_{3} \to Ψ_{3}

from (i) and Lemma A1.

(v) is contained in Lemma A1.

(vi) From (6), and the definition

{\hat{Ω}}_{1 . c} = {({\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Φ}}_{1})}^{- 1}

, we have

\begin{array}{l} \hat{A} - A & = - {({\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Φ}}_{1})}^{- 1} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Ψ}}_{3} - A \\ = - {\hat{Ω}}_{1 . c} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Ψ}}_{3} - {\hat{Ω}}_{1 . c} {\hat{Ω}}_{1 . c}^{- 1} A = - {\hat{Ω}}_{1 . c} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Ψ}}_{3} - {\hat{Ω}}_{1 . c} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} {\hat{Φ}}_{1} A \\ = - {\hat{Ω}}_{1 . c} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} ({\hat{Ψ}}_{3} + {\hat{Φ}}_{1} A) = - {\hat{Ω}}_{1 . c} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} \hat{Θ} . \end{array}

Then (i)–(iii) and (v) yield the result.

Appendix B.2. (B) Asymptotic Distribution of Coefficients to Nonstationary Regressors

(i) The distribution of the coefficients due to the nonstationary components is contained in Lemma A1.

(ii) With respect to the cointegrating relation note that from (A) (vi) we have

\begin{array}{l} N (\hat{A} - A) & = - N {\hat{Ω}}_{1 . c} {\hat{Φ}}_{1}^{'} {\hat{Σ}}_{ϵ}^{- 1} \hat{Θ} = - Ω_{1 . c} Φ_{1}^{'} Σ_{ϵ}^{- 1} \cdot N \hat{Θ} + o_{P} (1) . \end{array}

Note that

N \hat{Θ} = [N {\hat{Φ}}_{2}, N \hat{Θ}, N^{2} {\hat{Φ}}_{3}] η

, where

η = {[0_{p_{3} \times p_{2}}, I_{p_{3}}, 0_{p_{3} \times p_{3}}]}^{'}

. Then by Lemma A1, we have

N (\hat{A} - A) \Rightarrow - Ω_{1 . c} Φ_{1}^{'} Σ_{ϵ}^{- 1} \cdot g (1) \int_{0}^{1} d B F^{'} {(\int_{0}^{1} F F^{'})}^{- 1} η = - Ω_{1 . c} Φ_{1}^{'} Σ_{ϵ}^{- 1} \cdot g (1) \int_{0}^{1} d B L^{'} {(\int_{0}^{1} L L^{'})}^{- 1} .

Note that

Φ_{1} = g (1) α

, and by definition

Ω = [\begin{matrix} Ω_{11} & Ω_{1 c} \\ Ω_{c 1} & Ω_{c c} \end{matrix}] = g {(1)}^{- 1} Σ_{ϵ} g {(1)}^{' - 1}

, we have

\begin{array}{l} - Ω_{1 . c} Φ_{1}^{'} Σ_{ϵ}^{- 1} g (1) B & = - Ω_{1 . c} α^{'} g {(1)}^{'} Σ_{ϵ}^{- 1} g (1) B = - Ω_{1 . c} [I_{p_{1}} 0] Ω^{- 1} B \\ = Ω_{1 . c} [{(Ω^{- 1})}_{11} {(Ω^{- 1})}_{1 c}] B = Ω_{1 . c} [Ω_{1 . c}^{- 1} - Ω_{1 . c}^{- 1} Ω_{1 c} Ω_{c c}^{- 1}] B \\ = [I_{p_{1}} - Ω_{1 c} Ω_{c c}^{- 1}] [\begin{matrix} B_{1} \\ B_{c} \end{matrix}] = B_{1} - Ω_{1 c} Ω_{c c}^{- 1} B_{c} = B_{1 . c} . \end{array}

Therefore, we have

N (\hat{A} - A) \Rightarrow \int_{0}^{1} d B_{1 . c} L^{'} {(\int_{0}^{1} L L^{'})}^{- 1} .
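
The partitioned-inverse step used in the derivation above, namely that Ω_{1.c} times the first block row of Ω^{-1} equals [I, -Ω_{1c} Ω_{cc}^{-1}], can be checked numerically. The sketch below uses a randomly generated positive definite matrix and hypothetical block sizes `p1` and `pc`; it illustrates only this algebraic identity, not the limit result itself.

```python
import numpy as np

rng = np.random.default_rng(1)
p1, pc = 2, 3                                   # hypothetical block sizes
M = rng.standard_normal((p1 + pc, p1 + pc))
Omega = M @ M.T + np.eye(p1 + pc)               # symmetric positive definite stand-in

O11, O1c = Omega[:p1, :p1], Omega[:p1, p1:]
Oc1, Occ = Omega[p1:, :p1], Omega[p1:, p1:]
# Schur complement Omega_{1.c} = Omega_11 - Omega_1c Omega_cc^{-1} Omega_c1
Omega_1c = O11 - O1c @ np.linalg.solve(Occ, Oc1)

# First block row of the inverse: [(Omega^{-1})_{11}, (Omega^{-1})_{1c}]
top_rows = np.linalg.inv(Omega)[:p1, :]
lhs = Omega_1c @ top_rows
rhs = np.hstack([np.eye(p1), -O1c @ np.linalg.inv(Occ)])
print(np.allclose(lhs, rhs))
```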

Appendix B.3. (C) Asymptotic Distribution of Coefficients to Stationary Regressors

Since the regressor vector

U_{t}

is stationary, the asymptotic distribution of

N^{1 / 2} L_{h}^{'} v e c (\hat{Ξ} - \underset{̲}{Ξ})

follows from Lewis and Reinsel (1985) in combination with uniform boundedness of the maximal and the minimal eigenvalue of

Γ_{u} = E U_{t} U_{t}^{'}

, see above. Analogously, the result for the coefficients corresponding to the regressor vector

X_{t}

follows, since

X_{t} = T_{h} U_{t}

for a nonsingular matrix

T_{h}

.

Appendix B.4. (D) Asymptotic Distribution of Wald Type Tests

For the Wald test in addition to (C) note that the variance

Γ_{E C M}

is replaced by an estimate

{\hat{Γ}}_{E C M}

. For

L_{h}^{'} (Γ_{E C M}^{- 1} \otimes Σ_{ϵ}) L_{h} - L_{h}^{'} ({\hat{Γ}}_{E C M}^{- 1} \otimes {\hat{Σ}}_{ϵ}) L_{h}

note that

{\hat{Σ}}_{ϵ} - Σ_{ϵ} = o_{P} (1)

due to (A) (ii). The regressor vectors

{\tilde{X}}_{t}

and

X_{t}

differ only in the first block where

y_{1, t - 1} = u_{1, t - 1} + A Δ y_{3, t - 1}

replaces

u_{1, t - 1}

. Regressing out

Δ y_{3, t - 1}

eliminates this difference. Then

∥ {\hat{Γ}}_{E C M} - Γ_{E C M} ∥_{1} = O_{P} (h / N^{1 / 2})

according to (Saikkonen and Lütkepohl 1996, p. 835, l. 3), where invertibility of

Γ_{E C M}

is also shown. Using Lemma A.2 of Saikkonen and Lütkepohl (1996) this implies

∥ {\hat{Γ}}_{E C M}^{- 1} - Γ_{E C M}^{- 1} ∥_{1} = O_{P} (h / N^{1 / 2})

.

The rest then follows as in the proof of Theorem 4 in Saikkonen and Lütkepohl (1996).

Appendix C. Proof of Theorem 2

Consistency follows directly from Theorem 1 as the general representation can be transformed into a triangular representation using the matrix

B = [β, β_{1}, β_{2}]

, see (4).

With respect to the asymptotic distribution, following the proof of Theorem 1, there exists a nonsingular transformation matrix

S_{h}

such that

W_{t} = S_{h} Z_{t, h}

. From

∥ {\hat{R}}^{- 1} - R^{- 1} ∥ = O_{P} (h / N^{1 / 2})

it follows that

{(N^{- 1} ⟨ W_{t}, W_{t} ⟩)}^{- 1} = [\begin{matrix} {(Γ_{u})}^{- 1} & 0 \\ 0 & 0 \end{matrix}] + o_{P} (h / N^{1 / 2}) .

Therefore it follows that the blocks corresponding to the nonstationary regressors do not contribute to the asymptotic distribution. Then standard arguments for the stationary part of the regressor vector can be used.

Appendix D. Proofs for Theorem 3

The proof combines the ideas of Saikkonen and Luukkonen (1997) (in the following S&L) with the asymptotics of 2SI2 of Paruolo (2000) (in the following P). In the proof we will work without loss of generality with the triangular representation.

The key to the asymptotic properties of the estimators obtained from the 2SI2 algorithm lies in Lemmas A.4 and A.5 in the appendix of P. These lemmas deal with the limits of various moment matrices of the form

N^{- a} ⟨ R_{i t}, R_{j t} ⟩

corrected for the stationary components

Δ^{2} y_{t - j}, j = 1, \dots, h - 2

. The correction involves a regressor vector growing in dimension with sample size. This is dealt with in S&L.

In this respect let

S_{t} = {[Δ^{2} y_{t - 1}^{'}, \dots, Δ^{2} y_{t - h + 2}^{'}]}^{'}

which according to (A4) is a linear function of

U_{t}

such that

S_{t} = T_{s} U_{t}

. The definition of

U_{t}

implies

\hat{Q} = N^{- 1} ⟨ U_{t}, U_{t} ⟩ - E U_{t} U_{t}^{'} = O_{P} (h / N^{1 / 2})

. On p. 543 in P the matrices

Σ_{i j}, i, j \in {Y, U, 0}

are defined as limits of second moment matrices. Here

'U'

refers to

β_{1}^{'} Δ y_{t - 1} = u_{2, t - 1}

in the triangular representation,

'Y'

refers to

β^{'} y_{t - 1} + δ β_{2}^{'} Δ y_{t - 1} = y_{1, t - 1} - A Δ y_{3, t - 1} = u_{1, t - 1}

and

'0'

refers to

Δ^{2} y_{t}

. These are all stationary processes and linear functions of

u_{t}, u_{t - 1}, u_{t - 2}

. In addition to

S_{t}

, the term

β^{'} Δ y_{t - 1} = Δ u_{1, t - 1} + A u_{3, t - 1}

is corrected for in the second stage.

The arguments on pp. 114 and 115 of S&L deal with terms of the form

N^{- 1} ⟨ u_{1, t - 1}, u_{1, t - 1} ⟩ - N^{- 1} ⟨ u_{1, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, u_{1, t - 1} ⟩ .

Analogous arguments to S&L(A.12) show that this equals (up to terms of order

o_{P} (1)

)

C_{11} = E u_{1, t - 1} u_{1, t - 1}^{'} - E u_{1, t - 1} S_{t}^{'} {(E S_{t} S_{t}^{'})}^{- 1} E S_{t} u_{1, t - 1}^{'} .

S&L state that this is bounded from above and bounded away from zero. The second claim is, in fact, incorrect. If

{(u_{1, t})}_{t \in Z}

is univariate white noise with unit variance then

C_{11} = \frac{1}{h}

is achieved by predicting

u_{1, t - 1}

by

\sum_{j = 1}^{h} \frac{h - j}{h} Δ u_{1, t - j} = u_{1, t - 1} - \frac{1}{h} \sum_{j = 1}^{h} u_{1, t - j}

which amounts to integrating the regressors through the summation. This does not change the remaining arguments in S&L; it only implies that the separation of the eigenvalues corresponding to the stationary regressors and the ones corresponding to the non-stationary ones is weaker.
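
The white noise counterexample can be verified with population covariances. The sketch below (function names are ours and purely illustrative) evaluates the residual variance of the specific predictor given above, and additionally computes the exact linear projection of u_{1,t-1} on m consecutive differences, which equals 1/(m+1) in this setting; both quantities decay like 1/h, so the residual variance is not bounded away from zero.

```python
import numpy as np

def difference_matrix(m):
    """Rows map (u_{t-1}, ..., u_{t-m-1}) to (Δu_{t-1}, ..., Δu_{t-m})."""
    D = np.zeros((m, m + 1))
    idx = np.arange(m)
    D[idx, idx] = 1.0
    D[idx, idx + 1] = -1.0
    return D

def text_predictor_residual_variance(h):
    """Variance of u_{t-1} minus sum_j ((h - j) / h) Δu_{t-j}, j = 1..h."""
    D = difference_matrix(h)
    weights = (h - np.arange(1, h + 1)) / h
    coef_on_u = weights @ D             # implied coefficients on u_{t-1}, ..., u_{t-h-1}
    e1 = np.zeros(h + 1)
    e1[0] = 1.0
    resid = e1 - coef_on_u              # white noise with unit variance: variance = sum of squares
    return float(resid @ resid)

def projection_residual_variance(m):
    """Residual variance of the best linear predictor of u_{t-1} from m differences."""
    D = difference_matrix(m)
    c = D[:, 0]                         # Cov(Δu_{t-j}, u_{t-1})
    G = D @ D.T                         # covariance matrix of the differences
    return float(1.0 - c @ np.linalg.solve(G, c))

for h in (5, 10, 50):
    print(h, text_predictor_residual_variance(h), projection_residual_variance(h))
```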

In the current case one can show that for

N^{- 1} ⟨ u_{1, t - 1}, u_{1, t - 1} ⟩ - N^{- 1} ⟨ u_{1, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, u_{1, t - 1} ⟩

where

S_{t}

contains

Δ u_{1, t - 1}

and

Δ^{2} u_{1, t - j}, j = 1, \dots, h

for the corresponding limit

C_{11}

the lower bound

h C_{11} \geq c I

holds for some

0 < c

. The order of the lower bound is achieved by including a double integration of the regressors. For

N^{- 1} ⟨ Δ u_{1, t - 1}, Δ u_{1, t - 1} ⟩ - N^{- 1} ⟨ Δ u_{1, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, Δ u_{1, t - 1} ⟩ = C_{Δ Δ} + o_{p} (1)

we have

h^{3} C_{Δ Δ} \geq c I

. Here the arguments from above can be applied to the process

{(Δ u_{t})}_{t \in Z}

. For a differenced process the smallest eigenvalue of the matrix

E δ U_{t} δ U_{t}^{'}, δ U_{t}^{'} = [Δ u_{t}^{'}, Δ u_{t - 1}^{'}, \dots, Δ u_{t - h}^{'}]

is of order

h^{- 2}

, compare Theorem 2 of Palma and Bondon (2003).
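
The h^{-2} rate for the smallest eigenvalue can be seen directly in the simplest case. The sketch below assumes, for illustration only, that u_t is scalar white noise with unit variance, so that the covariance matrix of (Δu_t, ..., Δu_{t-h}) is tridiagonal with 2 on the diagonal and -1 next to it; its smallest eigenvalue is then 2 - 2cos(π/(h+2)), which behaves like π²/h² for large h.

```python
import numpy as np

def smallest_eigenvalue_differenced_white_noise(h):
    """Smallest eigenvalue of Cov(Δu_t, ..., Δu_{t-h}) for unit-variance white noise u_t."""
    m = h + 1                           # number of stacked differences
    Gamma = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    return float(np.linalg.eigvalsh(Gamma)[0])

for h in (10, 20, 40, 80):
    lam = smallest_eigenvalue_differenced_white_noise(h)
    # lam * h^2 stays bounded and bounded away from zero, i.e. lam is of order h^{-2}
    print(h, lam, lam * h ** 2)
```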

Since

N^{- 1} ⟨ S_{t}, y_{c, t - 1} ⟩ = O_{P} (h^{1 / 2})

and

N^{- 2} ⟨ S_{t}, y_{3, t - 1} ⟩ = O_{P} (h^{1 / 2})

it follows that

\begin{array}{l} N^{- 1} (⟨ u_{1, t - 1}, y_{c, t - 1} ⟩ - ⟨ u_{1, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, y_{c, t - 1} ⟩) & = O_{P} (h^{1 / 2}), \\ N^{- 2} (⟨ y_{c, t - 1}, y_{c, t - 1} ⟩ - ⟨ y_{c, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, y_{c, t - 1} ⟩) & = N^{- 2} ⟨ y_{c, t - 1}, y_{c, t - 1} ⟩ + o_{P} ({(h / N)}^{1 / 2}) \end{array}

as well as

\begin{array}{l} N^{- 2} (⟨ u_{1, t - 1}, y_{3, t - 1} ⟩ - ⟨ u_{1, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, y_{3, t - 1} ⟩) & = O_{P} (h^{1 / 2}), \\ N^{- 3} (⟨ y_{c, t - 1}, y_{3, t - 1} ⟩ - ⟨ y_{c, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, y_{3, t - 1} ⟩) & = N^{- 3} ⟨ y_{c, t - 1}, y_{3, t - 1} ⟩ + o_{P} ({(h / N)}^{1 / 2}), \\ N^{- 4} (⟨ y_{3, t - 1}, y_{3, t - 1} ⟩ - ⟨ y_{3, t - 1}, S_{t} ⟩ {⟨ S_{t}, S_{t} ⟩}^{- 1} ⟨ S_{t}, y_{3, t - 1} ⟩) & = N^{- 4} ⟨ y_{3, t - 1}, y_{3, t - 1} ⟩ + o_{P} ({(h / N)}^{1 / 2}) . \end{array}

Therefore the limits of the moment matrices

M_{i j}

are not affected by the correction using stationary terms even if

h \to \infty

except for the terms involving the orders

O_{P} (h^{1 / 2})

. For all stationary terms we find convergence to the corresponding limits denoted

Σ_{i j}

in P.

The first step in the 2SI2 procedure then uses RRR in the equation

Δ^{2} y_{t} = Ψ Δ y_{t - 1} + α β^{'} y_{t - 1} + \underset{̲}{Π} S_{t} + e_{t} .

Then

R_{0 t}

denotes

Δ^{2} y_{t}

corrected for

S_{t}

,

R_{1, t}

denotes

Δ y_{t - 1}

corrected for

S_{t}

and

R_{2, t}

denotes

y_{t - 1}

corrected for

S_{t}

. Lemma A.4 of P derives the limits of different directions of

M_{i j . k}

defined as

M_{i j . k} = M_{i j} - M_{i k} M_{k k}^{- 1} M_{k j}, M_{i j} = N^{- 1} ⟨ R_{i, t}, R_{j, t} ⟩

where

i, j \in {0, 1, 2, ε, β}

. Here

R_{ε, t}

equals

e_{t}

corrected for

S_{t}

and

R_{β, t} = β^{'} R_{1, t}

. Furthermore, P uses the notation

A_{T} = [{\bar{β}}_{1}, T^{- 1} {\bar{β}}_{2}]

and

{\bar{β}}_{2, T} = {\bar{β}}_{2}

. Here and below we assume without loss of generality that

[β, β_{1}, β_{2}]

is an orthonormal matrix. Consequently

\bar{β} = β, {\bar{β}}_{1} = β_{1}, {\bar{β}}_{2} = β_{2}

. Then the results above imply all results of Lemma A.4. of P except that now

A_{T}^{'} M_{20.1} = O_{P} (h^{1 / 2})

.
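
The mechanics of the corrected moment matrices M_{ij.k} and of the resulting first-stage eigenvalue problem can be sketched as follows. This is only an illustration under strong simplifications: the arrays `R0`, `R1`, `R2` are artificial stationary Gaussian data standing in for the concentrated series (in the paper R_{2,t} is nonstationary), and the function names are ours, not those of P.

```python
import numpy as np

def moment(Ri, Rj):
    """M_ij = N^{-1} <R_i, R_j> for arrays with one row per time point."""
    return Ri.T @ Rj / Ri.shape[0]

def partial_moment(Ri, Rj, Rk):
    """M_ij.k = M_ij - M_ik M_kk^{-1} M_kj."""
    return moment(Ri, Rj) - moment(Ri, Rk) @ np.linalg.solve(moment(Rk, Rk), moment(Rk, Rj))

def first_stage_eigenvalues(R0, R1, R2):
    """Eigenvalues λ of M_22.1 v λ = M_20.1 M_00.1^{-1} M_02.1 v, in descending order."""
    M22 = partial_moment(R2, R2, R1)
    M20 = partial_moment(R2, R0, R1)
    M00 = partial_moment(R0, R0, R1)
    rhs = M20 @ np.linalg.solve(M00, M20.T)
    # Symmetrize the generalized problem with the Cholesky factor of M_22.1.
    L = np.linalg.cholesky(M22)
    Linv = np.linalg.inv(L)
    sym = Linv @ rhs @ Linv.T
    return np.sort(np.linalg.eigvalsh((sym + sym.T) / 2.0))[::-1]

rng = np.random.default_rng(2)
N, p = 500, 3                                   # hypothetical sample size and dimension
R1 = rng.standard_normal((N, p))
R2 = rng.standard_normal((N, p))
# R0 depends on one direction of R2 only, mimicking a reduced rank relation.
R0 = 0.5 * R2[:, :1] @ np.ones((1, p)) + R1 @ rng.standard_normal((p, p)) + rng.standard_normal((N, p))
eigs = first_stage_eigenvalues(R0, R1, R2)
print(eigs)                                     # squared partial canonical correlations
```

The eigenvalues are squared partial canonical correlations and therefore lie between 0 and 1; in this artificial example one of them is clearly separated from the rest, reflecting the rank-one dependence built into `R0`.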

In particular we obtain the following limits:

\begin{matrix} A_{T}^{'} M_{2 ε . 1} \overset{d}{\to} \int_{0}^{1} F_{†} {(d W)}^{'} & , & A_{T}^{'} M_{22.1} A_{T} \overset{d}{\to} \int_{0}^{1} F_{†} F_{†}^{'}, \\ β_{2}^{'} M_{1 ε . β} \overset{d}{\to} \int_{0}^{1} B_{3} {(d W)}^{'} & , & T^{- 1} β_{2}^{'} M_{11 . β} β_{2} \overset{d}{\to} \int_{0}^{1} B_{3} B_{3}^{'}, \\ β_{2}^{'} M_{1 ε . b} \overset{d}{\to} \int_{0}^{1} L {(d W)}^{'} & , & T^{- 1} β_{2}^{'} M_{11 . b} β_{2} \overset{d}{\to} \int_{0}^{1} L L^{'} . \end{matrix}

Here

W = g (1) B

denotes the Brownian motion corresponding to

{(ε_{t})}_{t \in Z}, F_{†}

denotes the Brownian motion corresponding to

R_{2 t}

(equaling

y_{t - 1}

corrected for

S_{t}

) corrected for

R_{1 t}

(

Δ y_{t - 1}

whose only nonstationary component equals

Δ y_{3, t - 1}

with corresponding Brownian motion

B_{3}

). Thus we obtain the following definitions (where L is as in Theorem 1):

\begin{array}{l} F_{a} (u) & = [\begin{matrix} B_{2} (u) \\ \int_{0}^{u} B_{3} (v) d v \end{matrix}], F_{†} (u) = F_{a} (u) - \int_{0}^{1} F_{a} B_{3}^{'} {(\int_{0}^{1} B_{3} B_{3}^{'})}^{- 1} B_{3} (u), \\ L (u) & = B_{3} (u) - \int_{0}^{1} B_{3} F_{a}^{'} {(\int_{0}^{1} F_{a} F_{a}^{'})}^{- 1} F_{a} (u) . \end{array}

The above arguments show that in the current setting

U_{t - 1} = u_{2, t - 1}

and

Y_{t - 1} = u_{1, t - 1}

are contained in the space spanned by

S_{t}

for

h \to \infty

. Therefore

Σ_{i j} = 0

for

i, j \in {U, Y}

. The subscript ’b’ refers to correcting for

β_{⊥}^{'} R_{2 t}

used in the second stage of 2SI2.

Let

{\tilde{Σ}}_{Y Y}

denote the limit of

h ⟨ Y_{t - 1}, Y_{t - 1} ⟩

and analogously define

{\tilde{Σ}}_{Y U}, {\tilde{Σ}}_{U U}, {\tilde{Σ}}_{0 Y}

and

{\tilde{Σ}}_{0 U}

. For the latter two note that

{\tilde{Σ}}_{0 Y}

denotes the limit of

h ⟨ Δ^{2} y_{t}, Y_{t - 1} ⟩ = h α ⟨ Y_{t - 1}, Y_{t - 1} ⟩ + h ⟨ ζ U_{t - 1}, Y_{t - 1} ⟩ + h ⟨ ζ_{2} β^{'} Δ y_{t - 1}, Y_{t - 1} ⟩ + h \underset{̲}{Π} ⟨ S_{t}, Y_{t - 1} ⟩ + h ⟨ e_{t}, Y_{t - 1} ⟩

corrected for

S_{t}

and

β^{'} Δ y_{t - 1}

. Since

Y_{t - 1}

is stationary the last term is of order

O_{P} ({(h^{3} / N)}^{1 / 2}) = o_{P} (1)

. Therefore it follows that

{\tilde{Σ}}_{0 Y} = α {\tilde{Σ}}_{Y Y} + ζ {\tilde{Σ}}_{U Y}

. Then the results of Lemma A.5 of P hold where in (A.11) and (A.14)

Σ_{i j}

can be replaced by

{\tilde{Σ}}_{i j}

.

The asymptotic analysis below will heavily use the Johansen approach of investigating the solutions to eigenvalue problems in order to maximize the pseudo-likelihood corresponding to the reduced rank regression problem. In order to use the corresponding local analysis one has to first clarify consistency for the various estimators as well as rates of convergence.

The main tool in this respect is Theorem A.1 of Johansen (1997) which establishes in the I(2) setting for the regression

y_{t} = θ^{'} Z_{t} + ε_{t}

(

Z_{t}

being composed of stationary, I(1) and I(2) components) where

D_{T} ⟨ Z_{t}, Z_{t} ⟩ D_{T} = O_{P} (1)

and

D_{T} ⟨ Z_{t}, ε_{t} ⟩ = o_{P} (1)

that

D_{T}^{- 1} (\hat{θ} - θ) = o_{P} (1)

where

\hat{θ}

denotes the pseudo likelihood estimator over some closed parameter set

Θ

.

It is straightforward to see that analogous results hold in the present setting when first concentrating out the stationary components: Consider

y_{t} = θ_{1}^{'} z_{t} + θ_{2}^{'} Z_{t} + e_{t}

. Then

{\hat{θ}}_{2} (θ_{1})

is obtained from the concentration step and the pseudo likelihood involves

⟨ R_{t, y} - θ_{1}^{'} R_{t, z}, R_{t, y} - θ_{1}^{'} R_{t, z} ⟩

where again the processes

R_{t, y}

and

R_{t, z}

denote the processes

y_{t}

and

z_{t}

with the corresponding stationary regressors

Z_{t}

regressed out. These concentrated quantities now can be used in the proof of Theorem A.1 of Johansen (1997) essentially without changes to show consistency for

{\hat{θ}}_{1}

. Consistency of

{\hat{θ}}_{2} ({\hat{θ}}_{1})

then follows from the unrestricted estimation as contained in Theorem 2. As shown above the rates of convergence as well as the limits are unchanged for the coefficients corresponding to the non-stationary components of the regressors for the long VAR case compared to the finite VAR case.

Note that these results hold for general closed parameter space

Θ

, thus including the unrestricted as well as the rank-reduced problem. This shows that we can always reduce the asymptotic analysis of the eigenvalue problems to a neighborhood of the true value as is done in P.

The first step in the proof of Theorem 4.1. of P consists in the investigation of the solutions to the equation (

\tilde{β} = β H + β_{1} H_{1} + β_{2} H_{2}

, letting

B_{T}^{'} = [\begin{matrix} β^{'} \\ T^{- 1 / 2} β_{1}^{'} \\ T^{- 3 / 2} β_{2}^{'} \end{matrix}]

)

B_{T}^{'} M_{22.1} B_{T} [\begin{matrix} H \\ T^{1 / 2} H_{1} \\ T^{3 / 2} H_{2} \end{matrix}] Λ = B_{T}^{'} M_{20.1} M_{00.1}^{- 1} M_{02.1} B_{T} [\begin{matrix} H \\ T^{1 / 2} H_{1} \\ T^{3 / 2} H_{2} \end{matrix}] .

(A6)

Now Lemma A.4 implies that the matrix

B_{T}^{'} M_{22.1} B_{T}

on the left hand side converges to

diag (Σ_{Y Y . U}, \int_{0}^{1} F_{†} F_{†}^{'})

.

B_{T}^{'} M_{20.1} = [\begin{matrix} Σ_{Y 0 . U} \\ 0 \end{matrix}] + O_{P} (h^{1 / 2} T^{- 1 / 2}), M_{00.1} = Σ_{00 . U} + O_{P} (T^{- 1 / 2})

. Multiplying the equation by

h^{2}

we obtain the limiting eigenvalue problem

[\begin{matrix} {\tilde{Σ}}_{Y Y . U} & O_{P} (T^{- 1 / 2} h^{3 / 2}) \\ O_{P} (T^{- 1 / 2} h^{3 / 2}) & h \int_{0}^{1} F_{†} F_{†}^{'} \end{matrix}] [\begin{matrix} H \\ T^{1 / 2} H_{1} \\ T^{3 / 2} H_{2} \end{matrix}] h Λ = [\begin{matrix} {\tilde{Σ}}_{Y 0 . U} Σ_{00 . U}^{- 1} {\tilde{Σ}}_{0 Y . U} & O_{P} (T^{- 1 / 2} h^{5 / 2}) \\ O_{P} (T^{- 1 / 2} h^{5 / 2}) & O_{P} (T^{- 1} h^{3}) \end{matrix}] [\begin{matrix} H \\ T^{1 / 2} H_{1} \\ T^{3 / 2} H_{2} \end{matrix}] .

Therefore asymptotically the first

p - r

eigenvalues of

h Λ

are positive, the remaining ones tending to zero. Likewise the eigenvectors converge at the same speed as the matrices. Thus

H_{1} = O_{P} (h^{5 / 2} / T), H_{2} = O_{P} (h^{5 / 2} / T^{2})

from which

β^{'} M_{22.1} β H Λ H^{- 1} = β^{'} M_{20.1} M_{00.1}^{- 1} M_{02.1} β + O_{P} (h^{4} / T)

and thus using (A.11)

H Λ H^{- 1} = {\tilde{Σ}}_{Y Y . U}^{- 1} {\tilde{Σ}}_{Y 0 . U} Σ_{00 . U}^{- 1} {\tilde{Σ}}_{0 Y . U} / h + O_{P} (h T^{- 1 / 2}) = α^{'} Σ_{00 . U}^{- 1} α {\tilde{Σ}}_{Y Y . U} / h + O_{P} (h T^{- 1 / 2})

follows. Then, as in P, we have^{4}

M_{22.1} \underset{̲}{\tilde{β}} = M_{20.1} (Σ_{00 . U}^{- 1} {\tilde{Σ}}_{0 Y . U} {(h H Λ H^{- 1})}^{- 1} + O_{P} (h T^{- 1 / 2})) = M_{22.1} β + M_{2 ε . 1} Σ_{ϵ}^{- 1} α {(α^{'} Σ_{ϵ}^{- 1} α)}^{- 1} + a_{1}

where

a_{1} = M_{20.1} O_{P} (h^{2} T^{- 1 / 2}) = o_{P} (1)

and

\underset{̲}{\tilde{β}} = \tilde{β} H^{- 1}

. Then the remaining arguments on p. 546 of P show that the asymptotic distribution of

{(T β_{1}, T^{2} β_{2})}^{'} (\underset{̲}{\tilde{β}} - β)

is the same in the long VAR case as in the finite VAR case.

From these arguments the distribution of the likelihood ratio test of

H_{r}

versus

H_{p}

can be shown: Define

S_{1} (λ) : = λ M_{22.1} - M_{20.1} M_{00.1}^{- 1} M_{02.1}

,

A_{T} : = (β_{1}, T^{- 1} β_{2})

and

{\tilde{B}}_{T} : = (β, A_{T}) = (β, β_{1}, T^{- 1} β_{2})

. Note that since

{\tilde{B}}_{T}

is of full rank, (11) is equivalent to

| {\tilde{B}}_{T}^{'} S_{1} (λ) {\tilde{B}}_{T} | = 0

; that is,

|(\begin{matrix} β^{'} \\ β_{1}^{'} \\ T^{- 1} β_{2}^{'} \end{matrix}) S_{1} (λ) (β, β_{1}, T^{- 1} β_{2})| = |β^{'} S_{1} (λ) β| \cdot |A_{T}^{'} (S_{1} (λ) - S_{1} (λ) β {(β^{'} S_{1} (λ) β)}^{- 1} β^{'} S_{1} (λ)) A_{T}| = 0 .

(A7)
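
The factorization in (A7) is the usual determinant formula for a partitioned matrix. As a quick numerical sanity check, the sketch below uses arbitrary stand-ins: a random symmetric positive definite matrix `S` in place of S_1(λ) (positive definiteness is assumed only to guarantee that β'Sβ is invertible; the identity itself needs no more than that), and random orthonormal columns in place of β and A_T.

```python
import numpy as np

rng = np.random.default_rng(3)
p, r = 5, 2                                     # hypothetical dimensions
M = rng.standard_normal((p, p))
S = M @ M.T + np.eye(p)                         # stand-in for S_1(λ)
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
beta, A_T = Q[:, :r], Q[:, r:]                  # stand-ins for β and A_T
B_T = np.hstack([beta, A_T])                    # full rank by construction

bSb = beta.T @ S @ beta
# A_T' (S - S β (β' S β)^{-1} β' S) A_T
schur = A_T.T @ (S - S @ beta @ np.linalg.solve(bSb, beta.T @ S)) @ A_T

lhs = np.linalg.det(B_T.T @ S @ B_T)
rhs = np.linalg.det(bSb) * np.linalg.det(schur)
print(lhs, rhs)
```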

Let

δ_{1} = T λ

, so that for every

δ_{1}

we have that

λ \to 0,

as

T \to \infty

. By the above arguments we have that

h^{2} |β^{'} S_{1} (λ) β| = |δ_{1} \frac{h^{2}}{T} β^{'} M_{22.1} β - h^{2} β^{'} M_{20.1} M_{00.1}^{- 1} M_{02.1} β| \overset{p}{\to} |- {\tilde{Σ}}_{Y 0 . U} Σ_{00 . U}^{- 1} {\tilde{Σ}}_{0 Y . U}| \neq 0,

which has no zero root. Moreover, we have

h A_{T}^{'} S_{1} (λ) β = h λ A_{T}^{'} M_{22.1} β - h A_{T}^{'} M_{20.1} M_{00.1}^{- 1} M_{02.1} β = - A_{T}^{'} M_{20.1} Σ_{00 . U}^{- 1} {\tilde{Σ}}_{0 Y . U} + o_{P} (1),

which yields that

\begin{array}{l} |A_{T}^{'} (S_{1} (λ) - S_{1} (λ) β {(β^{'} S_{1} (λ) β)}^{- 1} β^{'} S_{1} (λ)) A_{T}| \\ = & |(δ_{1} \frac{1}{T} A_{T}^{'} M_{22.1} A_{T} - A_{T}^{'} M_{20.1} M_{00.1}^{- 1} M_{02.1} A_{T}) - A_{T}^{'} S_{1} (λ) β {(β^{'} S_{1} (λ) β)}^{- 1} β^{'} S_{1} (λ) A_{T}| \\ = & |(δ_{1} \frac{1}{T} A_{T}^{'} M_{22.1} A_{T}) - A_{T}^{'} M_{20.1} (M_{00.1}^{- 1} - Σ_{00 . U}^{- 1} {\tilde{Σ}}_{0 Y . U} {({\tilde{Σ}}_{Y 0 . U} Σ_{00 . U}^{- 1} {\tilde{Σ}}_{0 Y . U})}^{- 1} {\tilde{Σ}}_{Y 0 . U} Σ_{00 . U}^{- 1} + o_{P} (1)) M_{02.1} A_{T}| \\ = & |(δ_{1} \frac{1}{T} A_{T}^{'} M_{22.1} A_{T}) - A_{T}^{'} M_{20.1} (Σ_{00 . U}^{- 1} - Σ_{00 . U}^{- 1} α {(α^{'} Σ_{ϵ}^{- 1} α)}^{- 1} α^{'} Σ_{00 . U}^{- 1} + o_{P} (1)) M_{02.1} A_{T}| \\ \overset{d}{\to} & |δ_{1} \int_{0}^{1} F_{†} F_{†}^{'} - \int_{0}^{1} F_{†} d W^{'} α_{⊥} {(α_{⊥}^{'} Σ_{ϵ} α_{⊥})}^{- 1} α_{⊥}^{'} \int_{0}^{1} d W F_{†}^{'}| = |δ_{1} \int_{0}^{1} F_{†} F_{†}^{'} - (\int_{0}^{1} F_{†} d W_{†}^{'}) (\int_{0}^{1} d W_{†} F_{†}^{'})| \end{array}

where

W_{†} = {(α_{⊥}^{'} Σ_{ε} α_{⊥})}^{- 1 / 2} α_{⊥}^{'} W

. Thus, the smallest

(p - r)

solutions of (11) converge in distribution to the solutions of

|δ_{1} \int_{0}^{1} F_{†} F_{†}^{'} - (\int_{0}^{1} F_{†} d W_{†}^{'}) (\int_{0}^{1} d W_{†} F_{†}^{'})| = 0

, which implies that the test statistic

Q_{r}

has the following limiting distribution,

Q_{r} = \sum_{i = r + 1}^{p} δ_{1, i} + o_{P} (1) \overset{d}{\to} t r ((\int_{0}^{1} d W_{†} F_{†}^{'}) {(\int_{0}^{1} F_{†} F_{†}^{'})}^{- 1} (\int_{0}^{1} F_{†} d W_{†}^{'})) .

For the second stage the arguments are very similar. The eigenvalue problem solved here is the following:

{\bar{\tilde{β}}}_{⊥}^{'} M_{11 . \tilde{β}} {\bar{\tilde{β}}}_{⊥} \tilde{η} Y = {\bar{\tilde{β}}}_{⊥}^{'} M_{1 {\tilde{α}}_{⊥} . \tilde{β}} M_{{\tilde{α}}_{⊥} {\tilde{α}}_{⊥} . \tilde{β}}^{- 1} M_{{\tilde{α}}_{⊥} 1 . \tilde{β}} {\bar{\tilde{β}}}_{⊥} \tilde{η} .

This formula uses

{\tilde{α}}_{⊥}

, the ortho-complement of

\tilde{α} = M_{02.1} \tilde{β} {({\tilde{β}}^{'} M_{22.1} \tilde{β})}^{- 1}

From the above results noting that

h {\tilde{β}}^{'} M_{22.1} \tilde{β} \to {\tilde{Σ}}_{Y Y . U}

and

h M_{02.1} \tilde{β} \to α {\tilde{Σ}}_{Y Y . U}

according to Lemma A.4 we have

\tilde{α} \to α

. Considering the order of convergence we obtain

\tilde{α} - α = O_{P} (h T^{- 1 / 2})

. As in P this implies

{\tilde{α}}_{⊥} - α_{⊥} = O_{P} (h T^{- 1 / 2})

. Using

\tilde{β} - β = O_{P} (h^{5 / 2} / T)

from stage 1 one observes that in the eigenvalue problem estimates can be replaced by true quantities introducing an error of order

o_{P} (h T^{- 1 / 2})

:

{\bar{β}}_{⊥}^{'} M_{11 . β} {\tilde{β}}_{1} Y = {\bar{β}}_{⊥}^{'} M_{1 α_{⊥} . β} M_{α_{⊥} α_{⊥} . β}^{- 1} M_{α_{⊥} 1 . β} {\tilde{β}}_{1} + o_{P} (h T^{- 1 / 2}) .

Then as in P consider

{\tilde{β}}_{1} = β H + β_{1} H_{1} + {\bar{β}}_{2} H_{2}

, reusing the symbols

H, H_{1}, H_{2}

here for

{\tilde{β}}_{1}

in place of

\tilde{β}

as before. Identical arguments as around (A6) show that

H_{1} = O_{P} (1)

and

H_{2} = O_{P} (h^{2} / T)

. Then combining the arguments around (A6) with the developments in P, p. 546 and 547 we obtain (A.21) of P:

{\bar{β}}_{⊥}^{'} M_{11 . β} ({\underset{̲}{\tilde{β}}}_{1} - β_{1}) = {\bar{β}}_{⊥}^{'} M_{1 ε . β} α_{⊥} Σ_{α_{⊥} α_{⊥}}^{- 1} ζ {(ζ^{'} Σ_{α_{⊥} α_{⊥}}^{- 1} ζ)}^{- 1} + o_{P} (1) .

The rest of the proof of (4.3a) and (4.3b) of P follows as in P.

With respect to the second likelihood ratio test consider

{\tilde{S}}_{2} (ρ) = ρ {\bar{\tilde{β}}}_{⊥}^{'} M_{11 . \tilde{β}} {\bar{\tilde{β}}}_{⊥} - {\bar{\tilde{β}}}_{⊥}^{'} M_{1 {\tilde{α}}_{⊥} . \tilde{β}} M_{{\tilde{α}}_{⊥} {\tilde{α}}_{⊥} . \tilde{β}}^{- 1} M_{{\tilde{α}}_{⊥} 1 . \tilde{β}} {\bar{\tilde{β}}}_{⊥} .

The results above imply that

{\tilde{S}}_{2} (ρ)

has uniformly in

| ρ | < C

(for every

0 < C < \infty

) distance to

S_{2} (ρ)

of order

O_{P} (h T^{- 1 / 2})

where

S_{2} (ρ) = ρ {\bar{β}}_{⊥}^{'} M_{11 . β} {\bar{β}}_{⊥} - {\bar{β}}_{⊥}^{'} M_{1 α_{⊥} . β} M_{α_{⊥} α_{⊥} . β}^{- 1} M_{α_{⊥} 1 . β} {\bar{β}}_{⊥} .

Note that since

(η, η_{⊥})

is of full rank, (16) is equivalent to

|(\begin{matrix} η^{'} \\ η_{⊥}^{'} \end{matrix}) S_{2} (ρ) (η, η_{⊥})| = |η^{'} S_{2} (ρ) η| \cdot |η_{⊥}^{'} (S_{2} (ρ) - S_{2} (ρ) η {(η^{'} S_{2} (ρ) η)}^{- 1} η^{'} S_{2} (ρ)) η_{⊥}| = 0 .

(A8)

Let

δ_{2} = T ρ

, so that

ρ \to 0,

as

T \to \infty

. As above it can be seen that

h^{2} |η^{'} S_{2} (\frac{δ_{2}}{T}) η| = h^{2} |\frac{δ_{2}}{T} β_{1}^{'} M_{11 . β} β_{1} - β_{1}^{'} M_{1 {\bar{α}}_{⊥} . β} M_{{\bar{α}}_{⊥} {\bar{α}}_{⊥} . β}^{- 1} M_{{\bar{α}}_{⊥} 1 . β} β_{1}| \overset{p}{\to} |- {\tilde{Σ}}_{U 0} α_{⊥} {(α_{⊥}^{'} Σ_{00} α_{⊥})}^{- 1} α_{⊥}^{'} {\tilde{Σ}}_{0 U}| \neq 0 .

This shows that the s larger roots of

S_{2} (ρ)

tend to zero slower than

O (1 / T)

. Moreover, we have

h η_{⊥}^{'} S_{2} (\frac{δ_{2}}{T}) η = h (\frac{δ_{2}}{T} β_{2}^{'} M_{11 . β} β_{1} - β_{2}^{'} M_{1 {\bar{α}}_{⊥} . β} M_{{\bar{α}}_{⊥} {\bar{α}}_{⊥} . β}^{- 1} M_{{\bar{α}}_{⊥} 1 . β} β_{1}) = - β_{2}^{'} M_{1 {\bar{α}}_{⊥} . β} {(α_{⊥}^{'} Σ_{00} α_{⊥})}^{- 1} α_{⊥}^{'} {\tilde{Σ}}_{0 U} + o_{P} (1),

which yields that (using

P_{M} : = {(α_{⊥}^{'} Σ_{00} α_{⊥})}^{- 1} α_{⊥}^{'} {\tilde{Σ}}_{0 U} {({\tilde{Σ}}_{U 0} α_{⊥} {(α_{⊥}^{'} Σ_{00} α_{⊥})}^{- 1} α_{⊥}^{'} {\tilde{Σ}}_{0 U})}^{- 1} {\tilde{Σ}}_{U 0} α_{⊥} {(α_{⊥}^{'} Σ_{00} α_{⊥})}^{- 1}

)

\begin{array}{l} |η_{⊥}^{'} (S_{2} (\frac{δ_{2}}{T}) - S_{2} (\frac{δ_{2}}{T}) η {(η^{'} S_{2} (\frac{δ_{2}}{T}) η)}^{- 1} η^{'} S_{2} (\frac{δ_{2}}{T})) η_{⊥}| \\ = & |(δ_{2} \frac{1}{T} β_{2}^{'} M_{11 . β} β_{2} - β_{2}^{'} M_{1 {\bar{α}}_{⊥} . β} M_{{\bar{α}}_{⊥} {\bar{α}}_{⊥} . β}^{- 1} M_{{\bar{α}}_{⊥} 1 . β} β_{2}) - h η_{⊥}^{'} S_{2} (\frac{δ_{2}}{T}) η {(h^{2} η^{'} S_{2} (\frac{δ_{2}}{T}) η)}^{- 1} h η^{'} S_{2} (\frac{δ_{2}}{T}) η_{⊥}| \\ = & |(δ_{2} \frac{1}{T} β_{2}^{'} M_{11 . β} β_{2} - β_{2}^{'} M_{1 α_{⊥} . β} {({α_{⊥}}^{'} Σ_{00} α_{⊥})}^{- 1} M_{α_{⊥} 1 . β} β_{2}) + β_{2}^{'} M_{1 α_{⊥} . β} P_{M} M_{α_{⊥} 1 . β} β_{2}| + o_{P} (1) \\ \overset{d}{\to} & |δ_{2} \int_{0}^{1} B_{3} B_{3}^{'} - \int_{0}^{1} B_{3} d W^{'} α_{2} {(α_{2}^{'} Σ_{ϵ} α_{2})}^{- 1} α_{2}^{'} \int_{0}^{1} d W B_{3}^{'}| = |δ_{2} \int_{0}^{1} B_{3} B_{3}^{'} - (\int_{0}^{1} B_{3} d W_{2}^{'}) (\int_{0}^{1} d W_{2} B_{3}^{'})| \end{array}

using the results of Lemma A.5 of P and (A.18) of Paruolo (1996) as an expression for

{(α_{⊥}^{'} Σ_{00} α_{⊥})}^{- 1} - P_{M}

where

W_{2} = {(α_{2}^{'} Σ_{ϵ} α_{2})}^{- 1 / 2} α_{2}^{'} W

.

Thus, the smallest

(p - r - s)

solutions of (16) converge in distribution to the solutions of

|δ_{2} \int_{0}^{1} B_{3} B_{3}^{'} - (\int_{0}^{1} B_{3} d W_{2}^{'}) (\int_{0}^{1} d W_{2} B_{3}^{'})| = 0,

which shows that the test statistic

Q_{r, s}

has the following limiting distribution,

Q_{r, s} = \sum_{i = s + 1}^{p - r} δ_{2, i} + o_{P} (1) \overset{d}{\to} t r (\int_{0}^{1} d W_{2} B_{3}^{'} {(\int_{0}^{1} B_{3} B_{3}^{'})}^{- 1} \int_{0}^{1} B_{3} d W_{2}^{'}) .
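
Functionals of this type are usually tabulated by simulation. The following rough Monte Carlo sketch discretizes the trace functional above under an assumption made here purely for illustration and not established in the text: B_3 and W_2 are taken to be the same m-dimensional standard Brownian motion (the functional is unchanged when B_3 is replaced by a nonsingular linear transformation of itself). The number of steps and replications are arbitrary choices.

```python
import numpy as np

def trace_limit_draw(m, n_steps, rng):
    """One discretized draw of tr( ∫dW B' (∫BB')^{-1} ∫B dW' ) with B = W."""
    e = rng.standard_normal((n_steps, m))           # Brownian increments (unscaled)
    W = np.cumsum(e, axis=0)
    W_lag = np.vstack([np.zeros((1, m)), W[:-1]])   # path value just before each increment
    S_bd = W_lag.T @ e                              # proportional to ∫ B dW'
    S_bb = W_lag.T @ W_lag                          # proportional to ∫ B B'
    # The scaling factors of S_bd and S_bb cancel in the trace statistic.
    return float(np.trace(S_bd.T @ np.linalg.solve(S_bb, S_bd)))

rng = np.random.default_rng(4)
draws = np.array([trace_limit_draw(1, 400, rng) for _ in range(2000)])
print(draws.mean(), np.quantile(draws, [0.5, 0.9, 0.95]))
```

Quantiles obtained this way are only approximations; finer discretization and more replications would be needed for tabulating critical values.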

It follows also that the sum

S_{r, s} = Q_{r} + Q_{r, s}

converges in distribution showing (C).

The rest of the proof of relations (4.3a, b) of P follows exactly as in P. In P (4.4) the order of convergence is replaced by

o_{P} (T^{- 1})

, in (4.5) the error term can be shown to be

o_{P} (T^{- 1 / 2})

and in (4.6) instead of the term

O_{P} (T^{- 2})

we achieve

o_{P} (1)

.

These terms show consistency for

\tilde{β}, \tilde{η}

. Using the results of Lemma A.4 of P then consistency for

\tilde{α}, \tilde{ζ}

follows.

Following the proof of Theorem 4.2. on pp. 548–549 of P we can show consistency for

\tilde{ψ}

of P. The only changes refer to the orders of convergence where our setting introduces orders of h into the arguments. Jointly this proves consistency of

\tilde{Ψ}

and

\tilde{Γ}

. Consistency for the coefficients to the stationary terms

Δ^{2} y_{t - j}

follows as usual from the consistency of the estimates for the coefficients to non-stationary regressors. This completes the proof of (D).

With respect to (E) note that the results above show that the asymptotics for the two eigenvalue problems to be solved converge to the same quantities as in the finite VAR case. This shows that the results of P in this respect hold also in the case of long VARs.

Finally for the matrices

Π_{j}

note that Theorem 4.3. of P shows that the asymptotic distributions of all quantities corresponding to stationary regressors are identical for every super-consistent estimator of the coefficients to the non-stationary components.

Appendix E. Proof of Theorem 4

From Theorem 3 it follows that

\hat{Φ} = \hat{α} {\hat{β}}^{'} \to Φ, \hat{Ψ} \to Ψ, {\hat{Π}}_{j} \to Π_{j}, j = 1, 2, \dots, 2 f - 1

. Therefore the Hankel matrix of impulse response coefficients

{\hat{Π}}_{j}

converges to the Hankel matrix corresponding to the

Π_{j}

. As

(\bar{A}, B)

is controllable,

(A, B, C)

is minimal and

\bar{A}

is nonsingular according to the assumptions, this Hankel matrix has rank n. This implies that the stochastic realisation algorithm of Appendix F provides consistent estimates

(\hat{\bar{A}}, \hat{B}, \hat{D}) \to (\bar{A}, B, D)

. This implies

\hat{a} (z) = {(1 - z)}^{2} I_{p} - \hat{Φ} z - \hat{Ψ} z (1 - z) - {(1 - z)}^{2} z \hat{D} {(I_{n} - z \hat{\bar{A}})}^{- 1} \hat{B} \to a (z) .

For details see Appendix F.

\hat{a} (z)

does not necessarily correspond to a rational transfer function of order n. It does so, however, if the additional restrictions (22) hold. Steps 3 and 4 of the proposed algorithm achieve this. Here step 3 ensures that solutions to the third equation exist. The second equation explicitly provides a solution

{\bar{α}}_{⊥}

for given

C_{†}

. This solution is not necessarily of full row rank. Since it has full row rank in the limit, the same holds for large enough T. The first equation always admits solutions. Thus, for large enough T, the set of all solutions is defined by polynomial restrictions. Adding the least squares distance to the estimated impulse response sequence then leads to a quadratic problem under non-linear differentiable constraints, which in the limit has a unique solution. Thus the solution is unique for large enough T.

Consistency of the estimates in combination with continuity of the solution of step 4 implies consistency for the system

(\hat{\bar{A}}, \hat{B}, \hat{C})

. This implies consistency for the inverse system

(\hat{A}, \hat{B}, \hat{C})

in the sense of converging impulse response coefficients and hence consistency for the transfer function estimator in the pointwise topology. The fulfillment of restrictions (22) ensures the structure of the corresponding matrix

\hat{A}

according to state space unit root structure

((0, (c, c + d)))

.

Appendix F. Stochastic Realization Using Overlapping Echelon Forms

This section describes the approximate realization of the first f coefficients

G_{j}, j = 1, \dots, 2 f

of an impulse response sequence using a rational transfer function of order n where

f \geq n

. More details can be found in Section 2.6 of Hannan and Deistler (1988).

Define the Hankel matrix

H_{f, f} = [\begin{matrix} G_{1} & G_{2} & G_{3} & \dots & G_{f} \\ G_{2} & G_{3} & \dots \\ G_{3} & \dots \\ ⋮ & ⋮ \\ G_{f} & G_{f + 1} & \dots & G_{2 f - 1} \end{matrix}] = [\begin{matrix} h (1, 1) \\ h (1, 2) \\ ⋮ \\ h (1, p) \\ h (2, 1) \\ ⋮ \\ h (f, p) \end{matrix}] .

Here

h (i, j)

denotes the j-th row in the i-th block row. Let

α = (n_{1}, \dots, n_{p})

define a nice selection of rows^{5} of

H

such that

H_{α} \in R^{n \times f p}

, the submatrix of

H

containing the rows

h (i, j), i \leq n_{j}

, is of full row rank. If the impulse response corresponds to a transfer function of order at least n, there exists such a nice selection

α

. Finally let

H_{α + 1} \in R^{n \times f p}

denote the matrix

H_{α}

shifted down one block row (that is, in each row where

H_{α}

contains

h (i, j), H_{α + 1}

contains

h (i + 1, j)

).
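The row bookkeeping above can be made concrete with a small sketch. The following Python fragment is illustrative only: the function names (is_nice, selection_from_alpha, row_index) and the example α = (2, 1) are assumptions introduced here, not notation from the paper. It maps each h(i, j) to its position in the stacked Hankel matrix and lists the rows entering H_{α} and H_{α + 1}.

```python
def is_nice(selection):
    """Footnote-5 property: h(i, j) selected implies h(l, j) selected for 0 < l < i."""
    chosen = set(selection)
    return all((l, j) in chosen for (i, j) in chosen for l in range(1, i))


def selection_from_alpha(alpha):
    """Rows h(i, j) with i <= n_j for alpha = (n_1, ..., n_p); (i, j) are 1-based."""
    return [(i, j) for j, n_j in enumerate(alpha, start=1) for i in range(1, n_j + 1)]


def row_index(i, j, p):
    """0-based position of h(i, j) in the stacked matrix with p rows per block row."""
    return (i - 1) * p + (j - 1)


alpha = (2, 1)  # hypothetical example: order n = 3, output dimension p = 2
p = len(alpha)
rows = selection_from_alpha(alpha)

# Row positions for H_alpha and for H_{alpha+1} (each row moved down one block row).
idx_alpha = [row_index(i, j, p) for i, j in rows]
idx_alpha_plus = [row_index(i + 1, j, p) for i, j in rows]

print(rows)            # [(1, 1), (2, 1), (1, 2)]
print(idx_alpha)       # [0, 2, 1]
print(idx_alpha_plus)  # [2, 4, 3]
```

For α = (2, 1) and p = 2 the selection consists of h(1, 1), h(2, 1) and h(1, 2); shifting down one block row replaces them by h(2, 1), h(3, 1) and h(2, 2). Any selection built from a multi-index α in this way satisfies the property in footnote 5 by construction.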

Hannan and Deistler (1988), Theorem 2.6.2, then show that if

G_{j}

corresponds to a transfer function

k (z) = \sum_{j = 1}^{\infty} G_{j} z^{- j}

of order exactly n such that the corresponding

H_{α}

is formed using a nice selection, then a system

(A, B, C)

can be defined using the following formulas

A H_{α} = H_{α + 1}, B = H_{α} [\begin{matrix} I_{p} \\ 0 \end{matrix}], C H_{α} = [\begin{matrix} G_{1} & G_{2} & \dots & G_{f} \end{matrix}]

(A9)

such that

G_{j} = C A^{j - 1} B, j = 1, 2, \dots

.
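A minimal numerical sketch of (A9), assuming NumPy and an illustrative system of order n = 3 with p = 2. The matrices A0, B0, C0, the selection α = (2, 1) and the helper names are hypothetical choices made here, not taken from the paper. The Hankel matrix is built with f + 1 block rows, using G_{1}, \dots, G_{2 f}, so that the shifted rows always exist. The equations for A and C are solved by least squares, which returns the exact solution when H_{α} has full row rank.

```python
import numpy as np


def impulse_responses(A, B, C, count):
    """Return [G_1, ..., G_count] with G_j = C A^(j-1) B."""
    out, power = [], np.eye(A.shape[0])
    for _ in range(count):
        out.append(C @ power @ B)
        power = power @ A
    return out


def realize(G, alpha, f):
    """Solve (A9) for a nice selection alpha = (n_1, ..., n_p); G holds G_1..G_2f."""
    p = G[0].shape[0]
    # f + 1 block rows and f block columns; block (r, c) is G_{r+c+1}.
    H = np.block([[G[r + c] for c in range(f)] for r in range(f + 1)])
    rows = [(i - 1) * p + (j - 1)
            for j, n_j in enumerate(alpha, start=1) for i in range(1, n_j + 1)]
    H_a = H[rows, :]                      # H_alpha
    H_a1 = H[[r + p for r in rows], :]    # H_{alpha+1}: one block row further down
    # A H_a = H_a1 and C H_a = [G_1 ... G_f], written as H_a' X = rhs' with X = A' or C'.
    A = np.linalg.lstsq(H_a.T, H_a1.T, rcond=None)[0].T
    C = np.linalg.lstsq(H_a.T, H[:p, :].T, rcond=None)[0].T
    B = H_a[:, :p]                        # B = H_alpha [I_p; 0]
    return A, B, C


# Hypothetical example system (assumption for illustration only).
A0 = np.diag([0.5, -0.3, 0.2])
B0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
C0 = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])

f = 3
G = impulse_responses(A0, B0, C0, 2 * f)
A, B, C = realize(G, alpha=(2, 1), f=f)

# Compare impulse responses, including lags beyond the 2f used for the realization.
G_true = impulse_responses(A0, B0, C0, 10)
G_check = impulse_responses(A, B, C, 10)
max_err = max(np.abs(a - b).max() for a, b in zip(G_check, G_true))
print(max_err)
```

The realized (A, B, C) generally differs from (A0, B0, C0) by a change of state basis, so the check compares impulse responses rather than the system matrices themselves.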

If the order of the transfer function is larger than n, then the equations for A and C can be solved using least squares. If a sequence of impulse responses satisfies

{\hat{G}}_{j} \to G_{j}, j = 1, \dots, 2 f - 1,

and the limit

G_{j}

corresponds to a transfer function for which the rank of

H_{α}

equals n, then the resulting systems satisfy

(\hat{A}, \hat{B}, \hat{C}) \to (A, B, C)

, since in this case the least squares solution depends continuously on the matrix

H

.
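The continuity argument can be spelled out as follows. This is a sketch under the stated full row rank assumption; the closed-form expressions are the standard least squares solutions of the equations in (A9) and are not quoted from the paper. Whenever {\hat{H}}_{α} has full row rank, minimizing the Frobenius norms of \hat{A} {\hat{H}}_{α} - {\hat{H}}_{α + 1} and \hat{C} {\hat{H}}_{α} - [{\hat{G}}_{1}, \dots, {\hat{G}}_{f}] gives

\hat{A} = {\hat{H}}_{α + 1} {\hat{H}}_{α}^{'} {({\hat{H}}_{α} {\hat{H}}_{α}^{'})}^{- 1}, \hat{B} = {\hat{H}}_{α} [\begin{matrix} I_{p} \\ 0 \end{matrix}], \hat{C} = [\begin{matrix} {\hat{G}}_{1} & {\hat{G}}_{2} & \dots & {\hat{G}}_{f} \end{matrix}] {\hat{H}}_{α}^{'} {({\hat{H}}_{α} {\hat{H}}_{α}^{'})}^{- 1} .

Since H_{α} H_{α}^{'} is nonsingular at the limit, matrix inversion is continuous in a neighborhood of it, so all three expressions are continuous functions of the entries of the Hankel matrix there and converge to (A, B, C).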

References

Banerjee, Anindya, Lynne Cockerell, and Bill Russell. 2001. An I(2) analysis of inflation and the markup. Journal of Applied Econometrics 16: 221–40.
Bauer, Dietmar, and Alex Maynard. 2012. Persistence-robust surplus-lag Granger causality testing. Journal of Econometrics 169: 293–300.
Bauer, Dietmar, and Martin Wagner. 2004. Autoregressive Approximations to MFI(1) Processes. Working Paper No. 174, Reihe Ökonomie/Economics Series, Vienna, Austria: Institut für Höhere Studien (IHS).
Bauer, Dietmar, and Martin Wagner. 2012. A state space canonical form for unit root processes. Econometric Theory 28: 1313–49.
Berk, Kenneth N. 1974. Consistent autoregressive spectral estimates. The Annals of Statistics 2: 489–502.
Boswijk, H. Peter, and Jurgen A. Doornik. 2004. Identifying, estimating and testing restricted cointegrated systems: An overview. Statistica Neerlandica 58: 440–65.
Boswijk, H. Peter, and Paolo Paruolo. 2017. Likelihood ratio tests of restrictions on common trends loading matrices in I(2) VAR systems. Econometrics 5: 28.
Chan, Ngai Hang, and Ching Zong Wei. 1988. Limiting distributions of least squares estimates of unstable autoregressive processes. The Annals of Statistics 16: 367–401.
Dolado, Juan J., and Helmut Lütkepohl. 1996. Making Wald tests work for cointegrated VAR systems. Econometric Reviews 15: 369–86.
Engle, Robert F., and Clive W.J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing. Econometrica 55: 251–76.
Georgoutsos, Dimitris A., and Georgios P. Kouretas. 2004. A Multivariate I(2) cointegration analysis of German hyperinflation. Applied Financial Economics 14: 29–41.
Hannan, Edward James, and Manfred Deistler. 1988. The Statistical Theory of Linear Systems. New York: John Wiley.
Hannan, Edward James, and Laimonis Kavalieris. 1986. Regression, autoregression models. Journal of Time Series Analysis 7: 27–49.
Inoue, Atsushi, and Lutz Kilian. 2020. The uniform validity of impulse response inference in autoregressions. Journal of Econometrics 215: 450–72.
Johansen, Søren, and Helmut Lütkepohl. 2005. A note on testing restrictions for the cointegration parameters of a VAR with I(2) variables. Econometric Theory 21: 653–58.
Johansen, Søren, Katarina Juselius, Roman Frydman, and Michael Goldberg. 2007. Testing hypotheses in an I(2) model with applications to the persistent long swings in the Dmk/$ rate. Journal of Econometrics 158: 1–35.
Johansen, Søren. 1992a. A representation of vector autoregressive processes integrated of order 2. Econometric Theory 8: 188–202.
Johansen, Søren. 1992b. Testing weak exogeneity and the order of cointegration in UK money demand data. Journal of Policy Modeling 14: 313–34.
Johansen, Søren. 1995. Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models. Oxford: Oxford University Press.
Johansen, Søren. 1997. Likelihood analysis of the I(2) model. Scandinavian Journal of Statistics 24: 433–62.
Juselius, Katarina, and Katrin Assenmacher. 2017. Real exchange rate persistence and the excess return puzzle: The case of Switzerland versus the US. Journal of Applied Econometrics 32: 1145–55.
Juselius, Katarina, and Josh R. Stillwagon. 2018. Are outcomes driving expectations or the other way around? An I(2) CVAR analysis of interest rate expectations in the dollar/pound market. Journal of International Money and Finance 83: 93–105.
Juselius, Katarina. 1994. On the duality between long-run relations and common trends in the I(1) versus I(2) model. An application to aggregate money holdings. Econometric Reviews 13: 151–78.
Juselius, Katarina. 2006. The Cointegrated VAR Model. Oxford: Oxford University Press.
Kurita, Takamitsu, Heino Bohn Nielsen, and Anders Rahbek. 2011. An I(2) cointegration model with piecewise linear trends. Econometrics Journal 14: 131–55.
Kurita, Takamitsu. 2012. Likelihood-based inference for weak exogeneity in I(2) cointegrated VAR models. Econometric Reviews 31: 325–60.
Lewis, Richard, and Gregory C. Reinsel. 1985. Prediction of multivariate time series by autoregressive model fitting. Journal of Multivariate Analysis 16: 393–411.
Lütkepohl, Helmut, and Holger Claessen. 1997. Analysis of cointegrated VARMA processes. Journal of Econometrics 80: 223–29.
Lütkepohl, Helmut, and Pentti Saikkonen. 1997. Impulse response analysis in infinite order cointegrated vector autoregressive processes. Journal of Econometrics 81: 127–57.
Mosconi, Rocco, and Paolo Paruolo. 2013. Identification of Cointegrating Relations in I(2) Vector Autoregressive Models. Rome: PRIN Workshop Forecasting Economic and Financial Time Series, Research Publications at Politecnico di Milano, pp. 1–35.
Mosconi, Rocco, and Paolo Paruolo. 2017. Identification conditions in simultaneous systems of cointegrating equations with integrated variables of higher order. Journal of Econometrics 198: 271–76.
Nielsen, Heino Bohn, and Anders Rahbek. 2007. The likelihood ratio test for cointegration ranks in the I(2) model. Econometric Theory 23: 615–37.
Palma, Wilfredo, and Pascal Bondon. 2003. On the eigenstructure of generalized fractional processes. Statistics and Probability Letters 65: 93–101.
Paruolo, Paolo, and Anders Rahbek. 1999. Weak exogeneity in I(2) VAR systems. Journal of Econometrics 93: 281–308.
Paruolo, Paolo. 1994. The role of the drift in I(2) systems. Journal of the Italian Statistical Society 3: 93–123.
Paruolo, Paolo. 1996. On the determination of integration indices in I(2) systems. Journal of Econometrics 72: 313–56.
Paruolo, Paolo. 2000. Asymptotic efficiency of the two stage estimator in I(2) systems. Econometric Theory 16: 524–50.
Paruolo, Paolo. 2006. Common trends and cycles in I(2) VAR systems. Journal of Econometrics 132: 143–68.
Rahbek, Anders, Hans Christian Kongsted, and Clara Jørgensen. 1999. Trend stationarity in the I(2) cointegration model. Journal of Econometrics 90: 265–89.
Saikkonen, Pentti, and Helmut Lütkepohl. 1996. Infinite-order cointegrated vector autoregressive processes: Estimation and inference. Econometric Theory 12: 814–44.
Saikkonen, Pentti, and Ritva Luukkonen. 1997. Testing cointegration in infinite order vector autoregressive processes. Journal of Econometrics 81: 93–126.
Saikkonen, Pentti. 1991. Asymptotically efficient estimation of cointegration regressions. Econometric Theory 7: 1–21.
Saikkonen, Pentti. 1992. Estimation and testing of cointegrated systems by an autoregressive approximation. Econometric Theory 8: 1–27.
Sims, Christopher A., James H. Stock, and Mark W. Watson. 1990. Inference in linear time series models with some unit roots. Econometrica 58: 113–44.
Stillwagon, Josh R. 2018. Are risk premia related to real exchange rate swings? Evidence from I(2) CVARs with survey expectations. Macroeconomic Dynamics 22: 255–78.

1.	Here, somewhat sloppily, we use the same symbols for processes and their realizations.
2.	Note that $α = {[I_{p_{1}}, 0]}^{'}$ , and thus $Ω_{1 . c} = {({[Ω^{- 1}]}_{11})}^{- 1} = {(α^{'} Ω^{- 1} α)}^{- 1} = {(α^{'} g {(1)}^{'} Σ_{ϵ}^{- 1} g (1) α)}^{- 1} = {(Φ_{1}^{'} Σ_{ϵ}^{- 1} Φ_{1})}^{- 1}$ .
3.	In this appendix, processes whose dimension depends on the choice of h are denoted by upper case letters; for simplicity, the dependence on h is otherwise suppressed in the notation.
4.	Contrary to the usual Johansen notation we use $Σ_{ϵ}$ as the noise covariance and $Ω$ as the variance of the Brownian motion corresponding to ${(u_{t})}_{t \in Z}$ . Thus some of the formulas in this part have an 'unusual' form.
5.	A nice selection is such that if $h (i, j)$ is contained in the selection, then $h (l, j)$ is also contained for all $0 < l < i$ .

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
