# Cointegration and Error Correction Mechanisms for Singular Stochastic Vectors

^{1} Università di Bologna, Department of Economics, 40126 Bologna, Italy

^{2} Einaudi Institute for Economics and Finance, 00187 Roma, Italy

^{3} Federal Reserve Board of Governors, Washington, DC 20551, USA

^{*} Author to whom correspondence should be addressed.

Received: 28 March 2018 / Revised: 30 December 2019 / Accepted: 7 January 2020 / Published: 4 February 2020

(This article belongs to the Special Issue Celebrated Econometricians: Katarina Juselius and Søren Johansen)

Large-dimensional dynamic factor models and dynamic stochastic general equilibrium models, both widely used in empirical macroeconomics, deal with singular stochastic vectors, i.e., vectors of dimension r which are driven by a q-dimensional white noise, with $q<r$. The present paper studies cointegration and error correction representations for an $I\left(1\right)$ singular stochastic vector ${\mathbf{y}}_{t}$. It is easily seen that ${\mathbf{y}}_{t}$ is necessarily cointegrated with cointegrating rank $c\ge r-q$. Our contributions are: (i) we generalize Johansen’s proof of the Granger representation theorem to $I\left(1\right)$ singular vectors under the assumption that ${\mathbf{y}}_{t}$ has rational spectral density; (ii) using recent results on singular vectors by Anderson and Deistler, we prove that for generic values of the parameters the autoregressive representation of ${\mathbf{y}}_{t}$ has a finite-degree polynomial. The relationship between the cointegration of the factors and the cointegration of the observable variables in a large-dimensional factor model is also discussed.

## 1. Introduction

An r-dimensional stochastic vector ${\mathbf{y}}_{t}$ such that ${\mathbf{y}}_{t}={\mathbf{A}}_{0}{\mathbf{u}}_{t}+{\mathbf{A}}_{1}{\mathbf{u}}_{t-1}+\cdots ,$ where the matrices ${\mathbf{A}}_{j}$ are $r\times q$ and ${\mathbf{u}}_{t}$ is a q-dimensional white noise, with $q<r$, is said to be singular. Singular stochastic vectors have been systematically analyzed in a number of papers starting with (Anderson and Deistler 2008a, 2008b). A motivation for studying the consequences of singularity, as argued by these authors, is that the factors’ vector in large-dimensional dynamic factor models (DFM), such as those introduced in Forni et al. (2000); Forni and Lippi (2001), (Stock and Watson 2002a, 2002b), is typically singular. Singularity is also an important feature of dynamic stochastic general equilibrium models (DSGE), see e.g., Sargent (1989), Canova (2007), pp. 230–2. Singularity as it arises in DFMs is presented in some detail below.

DFMs are based on the idea that all the observed variables in an economic system are driven by a few common (macroeconomic) shocks and by idiosyncratic components which may result from measurement errors and sectoral or regional shocks. Formally, each variable in the n-dimensional dataset ${x}_{it},\phantom{\rule{4pt}{0ex}}i=1,2,\dots ,n$, $t=1,2,\dots ,T$, is decomposed into the sum of a common component ${\chi}_{it}$, and an idiosyncratic component ${\epsilon}_{it}$: ${x}_{it}={\chi}_{it}+{\epsilon}_{it}$, where ${\chi}_{it}$ and ${\epsilon}_{js}$ are orthogonal for all $i,j,t,s$. In the standard version of the DFM the common components are linear combinations of an r-dimensional vector of common factors ${\mathbf{F}}_{t}={({F}_{1t}\phantom{\rule{4pt}{0ex}}{F}_{2t}\phantom{\rule{4pt}{0ex}}\cdots \phantom{\rule{4pt}{0ex}}{F}_{rt})}^{\prime}$,

$${\chi}_{it}={\lambda}_{i1}{F}_{1t}+{\lambda}_{i2}{F}_{2t}+\cdots +{\lambda}_{ir}{F}_{rt}={\mathit{\lambda}}_{i}{\mathbf{F}}_{t}.\tag{1}$$

Now suppose that the observable variables ${x}_{it}$ and the common factors ${\mathbf{F}}_{t}$ are $I\left(1\right)$ and that

$$(1-L){\mathbf{F}}_{t}=\mathbf{C}\left(L\right){\mathbf{u}}_{t},\tag{2}$$

where ${\mathbf{u}}_{t}$ is a nonsingular q-dimensional white-noise vector^{1}, the common shocks. A number of papers analyzing macroeconomic databases find strong empirical support for the assumption that the vector ${\mathbf{F}}_{t}$ is singular, i.e., that $q<r$. See, for US datasets, Giannone et al. (2005); Amengual and Watson (2007); Forni and Gambetti (2010); Luciani (2015). For a Euro-area dataset, see Barigozzi et al. (2014).

Such results are easily understood by observing that the static Equation (1) is usually just a convenient representation derived from a “primitive” set of dynamic equations linking the common components ${\chi}_{it}$ to the common shocks ${\mathbf{u}}_{t}$. As a simple example, suppose that the variables ${x}_{it}$ are driven by a common one-dimensional cyclical process ${f}_{t}$, such that $(1-\alpha L){f}_{t}={u}_{t}$, where ${u}_{t}$ is scalar white noise, and that the variables ${x}_{it}$ load ${f}_{t}$ dynamically:

$${x}_{it}={a}_{i0}{f}_{t}+{a}_{i1}{f}_{t-1}+{\epsilon}_{it}.\tag{3}$$

In this case we can set ${F}_{1t}={f}_{t}$, ${F}_{2t}={f}_{t-1}={F}_{1,t-1}$, ${\lambda}_{i1}={a}_{i0}$, ${\lambda}_{i2}={a}_{i1}$, so that Equations (1) and (2) take the form

$${x}_{it}={\lambda}_{i1}{F}_{1t}+{\lambda}_{i2}{F}_{2t}+{\epsilon}_{it}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\mathrm{and}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\left(\begin{array}{c}{F}_{1t}\\ {F}_{2t}\end{array}\right)=\left(\begin{array}{c}{(1-\alpha L)}^{-1}\\ L{(1-\alpha L)}^{-1}\end{array}\right){u}_{t},$$

respectively. Here $r=2$ and $q=1$ so that ${\mathbf{F}}_{t}$ is singular. For a general analysis of the relationship between representation (1) and “deeper” dynamic representations like (3), see e.g., Forni et al. (2009); Stock and Watson (2016).
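The singularity in this two-factor example can be checked numerically. The following is a minimal sketch, not taken from the paper: it simulates $(1-\alpha L)f_t=u_t$, stacks $\mathbf{F}_t=(f_t\ f_{t-1})'$, and computes the residuals of the first-order autoregression $\mathbf{F}_t=\mathbf{A}\mathbf{F}_{t-1}+\mathbf{e}_t$ with $\mathbf{A}$ having rows $(\alpha, 0)$ and $(1, 0)$. The function names, the value $\alpha=0.5$, the Gaussian shocks, the start-up value and the sample size are all illustrative assumptions.

```python
import random

def simulate_factors(alpha=0.5, n=200, seed=0):
    """Simulate (1 - alpha L) f_t = u_t and stack F_t = (f_t, f_{t-1})."""
    rng = random.Random(seed)
    f_prev = 0.0  # assumed start-up value f_{-1} = 0
    factors, shocks = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        f = alpha * f_prev + u
        factors.append((f, f_prev))
        shocks.append(u)
        f_prev = f
    return factors, shocks

def var1_residuals(factors, alpha):
    """Residuals F_t - A F_{t-1} with A = [[alpha, 0], [1, 0]]."""
    residuals = []
    for (f1, f2), (g1, _g2) in zip(factors[1:], factors[:-1]):
        residuals.append((f1 - alpha * g1, f2 - g1))
    return residuals

ALPHA = 0.5
factors, shocks = simulate_factors(alpha=ALPHA)
residuals = var1_residuals(factors, ALPHA)

# The second residual is identically zero: the 2-dimensional vector F_t
# is driven by the single shock u_t (r = 2, q = 1).
second_component_is_zero = all(e2 == 0.0 for _, e2 in residuals)
print(second_component_is_zero)
```

The first residual reproduces $u_t$ up to rounding, while the second vanishes, so the innovation covariance matrix of $\mathbf{F}_t$ has rank $1=q$ rather than $2=r$.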

Now suppose that the factors ${\mathbf{F}}_{t}$ have been estimated. Obtaining ${\mathbf{u}}_{t}$ and the impulse-response functions of the variables ${x}_{it}$ with respect to ${\mathbf{u}}_{t}$ (or structural shocks obtained by a linear transformation of ${\mathbf{u}}_{t}$) requires the estimation of a VAR for the singular $I\left(1\right)$ vector ${\mathbf{F}}_{t}$. On the other hand, the latter is necessarily cointegrated with cointegration rank c at least equal to $r-q$ (the rank of the spectral density of $(1-L){\mathbf{F}}_{t}$ does not exceed q at all frequencies and, therefore, at frequency zero).

Singular vectors of factors in an $I\left(1\right)$ DFM and $I\left(1\right)$ singular vectors in DSGE models provide strong motivation for studying singular $I\left(1\right)$ vectors in a general time-series context. The main contributions of the paper are:

- (I) A generalization of Johansen’s proof of the Granger Representation Theorem (from MA to AR); this is Proposition 2. Consider an $I\left(1\right)$ singular vector ${\mathbf{y}}_{t}$, with dimension r, rank $q<r$, and cointegrating rank $c\ge r-q$. Assuming that $(1-L){\mathbf{y}}_{t}$ has an ARMA structure, $\mathbf{S}\left(L\right)(1-L){\mathbf{y}}_{t}=\mathbf{B}\left(L\right){\mathbf{u}}_{t}$, and that some simple additional conditions hold, ${\mathbf{y}}_{t}$ has a representation as a vector error correction mechanism (VECM) with c error correction terms:$$\mathbf{A}\left(L\right){\mathbf{y}}_{t}={\mathbf{A}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}+\mathit{\alpha}({\mathit{\beta}}^{\prime}{\mathbf{y}}_{t-1}-\mathbf{w})=\mathbf{B}\left(0\right){\mathbf{u}}_{t}.\tag{4}$$
- (II) Assuming that the parameters of $\mathbf{S}\left(L\right)$ and $\mathbf{B}\left(L\right)$ may vary in an open subset of ${\mathbb{R}}^{\lambda}$ (see Section 3.2 for the definition of $\lambda $), we show in Proposition 3 that all the assumptions used to obtain (4), and also the assumption that unity is the only possible zero of $\mathbf{B}\left(L\right)$, hold for generic values of the parameters. This implies that the matrices $\mathbf{A}\left(L\right)$ and ${\mathbf{A}}^{*}\left(L\right)$ are generically of finite degree, which is obviously not the case for nonsingular vectors.^{2}

The paper is organized as follows. Section 2 is preliminary. We first recall recent results for stationary singular stochastic vectors with rational spectral density, see (Anderson and Deistler 2008a, 2008b). We then discuss cointegration and the cointegrating rank for $I\left(1\right)$ singular stochastic vectors.

In Section 3 we prove our main results. We also obtain the permanent-transitory shock representation in the singular case: ${\mathbf{y}}_{t}$ is driven by $r-c$ permanent shocks, i.e., r minus the cointegrating rank, the usual result. However, the number of transitory shocks is $c-(r-q)$, not c as in the nonsingular case.

Section 3 also contains an exercise carried out with simulated singular $I\left(1\right)$ vectors. We compare the results obtained by estimating an unrestricted VAR in the levels and a VECM. Though limited to a simple example, the results confirm what has been found for nonsingular vectors, that under cointegration the long-run features of impulse-response functions are better estimated using a VECM rather than an unrestricted VAR in the levels (Phillips 1998).

In Section 4 we analyse cointegration of the observable variables ${x}_{it}$ in a DFM. Our results on cointegration of the singular vector ${\mathbf{F}}_{t}$ have the implication that p-dimensional subvectors of the n-dimensional common-component vector ${\mathit{\chi}}_{t}$, with $p>r-c$, are cointegrated. As a consequence, stationarity of the idiosyncratic components would imply that all p-dimensional subvectors of the n-dimensional dataset ${\mathbf{x}}_{t}$ are cointegrated if $p>r-c$. For example, if $q=3$ and $d=1$, where $d=c-(r-q)$ so that $r-c=q-d=2$, then all 3-dimensional subvectors in the dataset are cointegrated, a kind of regularity that we do not observe in actual large macroeconomic datasets. This suggests that an estimation strategy robust to the possibility that the idiosyncratic components are $I\left(1\right)$ is to be preferred; for this aspect we refer to Barigozzi et al. (2019). Section 5 concludes. Some proofs, a discussion of some non-uniqueness problems arising with singularity and details on the simulations are collected in the Appendix.

## 2. Stationary and $\mathbf{I}\left(\mathbf{1}\right)$ Singular Vectors

#### 2.1. Stationary Singular Vectors

As this paper only considers representation issues, it is convenient to assume that all stochastic processes are defined for $t\in \mathbb{Z}$. Accordingly, the lag operator L is defined as $L{\mathbf{y}}_{t}={\mathbf{y}}_{t-1}$ for $t\in \mathbb{Z}$ (Bauer and Wagner (2012) also study $I\left(1\right)$ and cointegrated processes for $t\in \mathbb{Z}$).

We start by introducing results on singular vectors with an ARMA structure from (Anderson and Deistler 2008a, 2008b). Some preliminary definitions are needed.

**Definition 1.** **(Zeros and Poles)**

(A) When considering matrices $\mathbf{V}\left(z\right)$ whose entries are rational functions of $z\in \mathbb{C}$ we always assume that numerator and denominator of each entry have no common roots. If $\mathbf{V}\left(z\right)$ is an $r\times q$ matrix of rational functions, we say that ${z}^{*}$ is a pole of $\mathbf{V}\left(z\right)$ if it is a pole of some entry of $\mathbf{V}\left(z\right)$.

(B) Suppose that $\mathbf{V}\left(z\right)$ is an $r\times q$ matrix whose entries are polynomial functions of $z\in \mathbb{C}$, with $q\le r$. We say that ${z}^{*}\in \mathbb{C}$ is a zero of $\mathbf{V}\left(z\right)$ if $\mathrm{rank}\left(\mathbf{V}\left({z}^{*}\right)\right)<q$, and that $\mathbf{V}\left(z\right)$ is zeroless if it has no zeros, i.e., $\mathrm{rank}\left(\mathbf{V}\left(z\right)\right)=q$ for all $z\in \mathbb{C}$.

With a minor abuse of language, we may speak of zeros and poles of the corresponding matrix $\mathbf{V}\left(L\right)$. When an $r\times r$ polynomial matrix $\mathbf{S}\left(L\right)$ has all its zeros outside the unit circle we say that $\mathbf{S}\left(L\right)$ is stable.

All the stationary vector processes considered have an ARMA structure. Precisely, the r-dimensional process ${\mathbf{y}}_{t}$ has an ARMA structure with rank q, $q\le r$, if there exist

- (i) a non-singular q-dimensional white-noise process ${\mathbf{u}}_{t}$,
- (ii) an $r\times r$ stable polynomial matrix $\mathbf{S}\left(z\right)$, with $\mathbf{S}\left(0\right)={\mathbf{I}}_{r}$,
- (iii) an $r\times q$ matrix $\mathbf{B}\left(z\right)$ whose rank is q for all z with the exception of a finite subset of $\mathbb{C}$,

such that

$${\mathbf{y}}_{t}=\mathbf{V}\left(L\right){\mathbf{u}}_{t},\qquad \mathbf{V}\left(L\right)=\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right).\tag{5}$$

Suppose that ${\mathbf{y}}_{t}$ also has the representation ${\mathbf{y}}_{t}=\tilde{\mathbf{S}}{\left(L\right)}^{-1}\tilde{\mathbf{B}}\left(L\right){\tilde{\mathbf{u}}}_{t}$, where ${\tilde{\mathbf{u}}}_{t}$ is a $\tilde{q}$-dimensional nonsingular white noise. Denoting by ${\mathbf{\Sigma}}_{y}\left(\theta \right)$ the spectral density of ${\mathbf{y}}_{t}$,

$${\mathbf{\Sigma}}_{y}\left(\theta \right)={\left(2\pi \right)}^{-1}\mathbf{V}\left({e}^{-i\theta}\right){\mathbf{\Sigma}}_{u}{\mathbf{V}}^{\prime}\left({e}^{i\theta}\right),$$

so that the rank of ${\mathbf{\Sigma}}_{y}\left(\theta \right)$ is q for all $\theta $, with the exception of a finite subset of $[-\pi ,\phantom{\rule{4pt}{0ex}}\pi ]$. As the spectral density is independent of the ARMA representation, $q=\tilde{q}$ and $\tilde{\mathbf{B}}\left(z\right)$ has rank q except for a finite subset of $\mathbb{C}$.

**Remark 1.** Let us recall that the equation

$$\mathbf{S}\left(L\right){\mathit{\zeta}}_{t}=\mathbf{B}\left(L\right){\mathbf{u}}_{t},$$

in the unknown vector process ${\mathit{\zeta}}_{t}$, where $\mathbf{S}\left(L\right)$ is stable, has only one stationary solution, and this is ${\mathbf{y}}_{t}=\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}.$ Thus the ARMA process ${\mathbf{y}}_{t}$ can also be defined as the stationary solution of $\mathbf{S}\left(L\right){\mathit{\zeta}}_{t}=\mathbf{B}\left(L\right){\mathbf{u}}_{t}$.

**Definition 2.** **(Genericity)** Suppose that a statement Q depends on $\mathbf{p}\in \mathcal{A}$, where $\mathcal{A}$ is an open subset of ${\mathbb{R}}^{\lambda}$. We say that Q holds generically in $\mathcal{A}$, or that Q holds for generic values of $\mathbf{p}\in \mathcal{A}$, if the subset $\mathcal{N}$ of $\mathcal{A}$ where it does not hold is nowhere dense in $\mathcal{A}$, i.e., the closure of $\mathcal{N}$ in $\mathcal{A}$ has no internal points.

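For example, assuming that $\mathbf{p}\in \mathcal{A}=\mathbb{R}$, the statement “The roots of the polynomial ${x}^{2}+\mathbf{p}x+1$ are distinct” holds generically in $\mathcal{A}$: the roots coincide only when the discriminant ${\mathbf{p}}^{2}-4$ vanishes, i.e., for $\mathbf{p}=\pm 2$, and a two-point set is nowhere dense in $\mathbb{R}$.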

**Definition 3.** **(Rational reduced-rank family of filters)** Assume that $r>q$ and let $\mathcal{G}$ be a set of ordered couples $\left(\mathbf{S}\left(L\right),\mathbf{B}\left(L\right)\right)$, where:

- (i) $\mathbf{B}\left(L\right)$ is an $r\times q$ polynomial matrix of degree ${s}_{1}\ge 0$.
- (ii) $\mathbf{S}\left(L\right)$ is an $r\times r$ polynomial matrix of degree ${s}_{2}\ge 0$, with $\mathbf{S}\left(0\right)={\mathbf{I}}_{r}$.
- (iii) Denoting by $\mathbf{p}$ the vector containing the $\lambda =rq({s}_{1}+1)+{r}^{2}{s}_{2}$ coefficients of the entries of $\mathbf{B}\left(L\right)$ and $\mathbf{S}\left(L\right)$, we assume that $\mathbf{p}\in \Pi $, where Π is an open subset of ${\mathbb{R}}^{\lambda}$ such that for $\mathbf{p}\in \Pi $, (1) $\mathbf{S}\left(z\right)$ is stable, (2) $\mathrm{rank}\left(\mathbf{B}\left(z\right)\right)=q$ with the exception of a finite subset of $\mathbb{C}$.

We say that $\mathcal{G}$ is a rational reduced-rank family of filters with parameter set Π.

The notation ${\mathbf{S}}^{\mathbf{p}}\left(L\right)$, ${\mathbf{B}}^{\mathbf{p}}\left(L\right)$, though more rigorous, would be heavy and not really necessary. We use it only in Appendix A.1.

**Proposition 1.** Assume that $r>q$.

- (I) Suppose that $\mathbf{V}\left(L\right)$ is an $r\times q$ matrix polynomial in L. If $\mathbf{V}\left(z\right)$ is zeroless then $\mathbf{V}\left(L\right)$ has an $r\times r$ finite-degree stable left inverse, i.e., there exists a finite-degree polynomial $r\times r$ matrix $\mathbf{W}\left(L\right)$ such that: (a) $\mathbf{W}\left(0\right)={\mathbf{I}}_{r}$, (b) $\det \left(\mathbf{W}\left(z\right)\right)=0$ implies $\left|z\right|>1$, (c) $\mathbf{W}\left(L\right)\mathbf{V}\left(L\right)=\mathbf{V}\left(0\right)$. Let ${\mathbf{y}}_{t}$ be the stationary solution of $\mathbf{S}\left(L\right){\mathit{\zeta}}_{t}=\mathbf{B}\left(L\right){\mathbf{u}}_{t}$ and suppose that $\mathbf{B}\left(L\right)$ is zeroless. Then ${\mathbf{y}}_{t}$ has a finite vector autoregressive representation (VAR) $\mathbf{A}\left(L\right){\mathbf{y}}_{t}=\mathbf{B}\left(0\right){\mathbf{u}}_{t}$, where $\mathbf{A}\left(L\right)=\mathbf{N}\left(L\right)\mathbf{S}\left(L\right)$ and $\mathbf{N}\left(L\right)$ is a finite-degree left inverse of $\mathbf{B}\left(L\right)$.
- (II) Assume that ${\mathbf{y}}_{t}$ is the stationary solution of $\mathbf{S}\left(L\right){\mathit{\zeta}}_{t}=\mathbf{B}\left(L\right){\mathbf{u}}_{t}$, where $\left(\mathbf{S}\left(L\right),\mathbf{B}\left(L\right)\right)$ belongs to a rational reduced-rank family of filters with parameter set Π. For generic values of the parameters in Π, $\mathbf{B}\left(L\right)$ is zeroless so that ${\mathbf{y}}_{t}$ has a finite VAR representation.

For statement (I) see Anderson and Deistler (2008a), Theorem 3. Statement (II) is a modified version of their Theorem 2; for a proof see Forni et al. (2009), p. 1327.
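Statement (I) can be illustrated on a small hand-worked case. The sketch below is an assumption-laden illustration, not the construction used by Anderson and Deistler: it takes the $2\times 1$ polynomial matrix $\mathbf{V}(L)=(1\ \ L)'$, which is zeroless because its rank is 1 for every complex $z$, and verifies that the hand-picked candidate $\mathbf{W}(L)$ with rows $(1, 0)$ and $(-L, 1)$ satisfies conditions (a) and (c). Condition (b) holds trivially since $\det \mathbf{W}(z)=1$ has no roots. The helper `polymat_mul` and the list-of-coefficient-matrices encoding are hypothetical choices made for this example.

```python
def polymat_mul(a, b):
    """Multiply polynomial matrices stored as lists of coefficient matrices.

    a[k] and b[k] are the (list-of-rows) coefficients of L**k.
    """
    rows, inner, cols = len(a[0]), len(b[0]), len(b[0][0])
    out = [[[0] * cols for _ in range(rows)] for _ in range(len(a) + len(b) - 1)]
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            for r in range(rows):
                for c in range(cols):
                    out[i + j][r][c] += sum(a_i[r][k] * b_j[k][c] for k in range(inner))
    return out

# V(L) = (1, L)': coefficient of L**0 is (1, 0)', coefficient of L**1 is (0, 1)'.
V = [[[1], [0]], [[0], [1]]]
# Candidate left inverse W(L) = I + W1 * L with W1 = [[0, 0], [-1, 0]].
W = [[[1, 0], [0, 1]], [[0, 0], [-1, 0]]]

product = polymat_mul(W, V)
# Only the L**0 coefficient survives, and it equals V(0) = (1, 0)'.
print(product)
```

All coefficients of positive powers of $L$ cancel, so $\mathbf{W}(L)\mathbf{V}(L)=\mathbf{V}(0)$ with $\mathbf{W}(L)$ of degree one.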

#### 2.2. Fundamentalness

Assume that the r-dimensional vector ${\mathbf{y}}_{t}$ has an ARMA structure, rank q and the moving average representation (5). If $\mathrm{rank}\left(\mathbf{B}\left(z\right)\right)=q$ for $\left|z\right|<1$, then ${\mathbf{u}}_{t}$ belongs to the space spanned by ${\mathbf{y}}_{t-k}$, with $k\ge 0$, and representation (5), as well as ${\mathbf{u}}_{t}$, is called fundamental (for these definitions and results see e.g., Rozanov (1967), pp. 43–7). Note that if (5) is fundamental, then $\mathrm{rank}\left(\mathbf{B}\left(0\right)\right)=q.$ Note also that when $q=r$, the condition that $\mathrm{rank}\left(\mathbf{B}\left(z\right)\right)=q$ for $\left|z\right|<1$ becomes $\det \left(\mathbf{B}\left(z\right)\right)\ne 0$ for $\left|z\right|<1$.

**Remark 2.** Note that in Proposition 1, part (II), we do not assume that ${\mathbf{u}}_{t}$ is fundamental for ${\mathbf{y}}_{t}$. However, Proposition 1, (II), states that for generic values of $\mathbf{p}\in \Pi $ the matrix $\mathbf{B}\left(L\right)$ is zeroless and therefore ${\mathbf{u}}_{t}$ is fundamental for ${\mathbf{y}}_{t}$.

#### 2.3. $I\left(1\right)$ Singular Vectors

To analyze cointegration and the autoregressive representations of singular non-stationary vectors let us first recall the definitions of $I\left(0\right)$, $I\left(1\right)$ and cointegrated vectors. This requires some preliminary definitions and results.

We denote by ${L}_{2}(\Omega ,\mathcal{F},P)$ the space of the square-integrable functions on the probability space $(\Omega ,\mathcal{F},P)$. Let ${\mathbf{z}}_{t}={({z}_{1t}\phantom{\rule{4pt}{0ex}}{z}_{2t}\phantom{\rule{4pt}{0ex}}\cdots \phantom{\rule{4pt}{0ex}}{z}_{rt})}^{\prime}$, ${z}_{ht}\in {L}_{2}(\Omega ,\mathcal{F},P)$, be an r-dimensional stochastic process and consider the difference equation

$$(1-L){\mathit{\zeta}}_{t}={\mathbf{z}}_{t},\tag{6}$$

in the unknown r-dimensional process ${\mathit{\zeta}}_{t}$. A solution of (6) is

$${\tilde{\mathit{\psi}}}_{t}=\begin{cases}{\mathbf{z}}_{1}+{\mathbf{z}}_{2}+\cdots +{\mathbf{z}}_{t}, & \text{for } t>0,\\ \mathbf{0}, & \text{for } t=0,\\ -({\mathbf{z}}_{0}+{\mathbf{z}}_{-1}+\cdots +{\mathbf{z}}_{t+1}), & \text{for } t<0,\end{cases}$$

see e.g., Gregoir (1999), p. 439, Franchi and Paruolo (2019). All the solutions of (6) are ${\mathit{\psi}}_{t}={\tilde{\mathit{\psi}}}_{t}+{\mathit{\varphi}}_{t}$, where ${\mathit{\varphi}}_{t}={({\varphi}_{1t}\phantom{\rule{4pt}{0ex}}{\varphi}_{2t}\phantom{\rule{4pt}{0ex}}\cdots \phantom{\rule{4pt}{0ex}}{\varphi}_{rt})}^{\prime}$, ${\varphi}_{ht}\in {L}_{2}(\Omega ,\mathcal{F},P)$, is a solution of the homogeneous equation $(1-L){\mathit{\zeta}}_{t}=\mathbf{0}$, so that ${\mathit{\varphi}}_{t}=\mathbf{K}$, for some r-dimensional stochastic vector $\mathbf{K}$, for all $t\in \mathbb{Z}$. We say that the process ${\mathit{\varphi}}_{t}=\mathbf{K}$ is a constant stochastic process. Obviously a constant stochastic process ${\mathit{\varphi}}_{t}=\mathbf{K}$ is weakly stationary. Its spectral measure has the jump ${\mathbf{\Sigma}}_{K}$ at frequency zero. Thus ${\mathit{\varphi}}_{t}$ has a spectral density (has an absolutely continuous spectral measure) if and only if ${\mathbf{\Sigma}}_{K}=\mathbf{0}$, i.e., if and only if ${\mathit{\varphi}}_{t}\left(\omega \right)=\mathbf{k}$, where $\mathbf{k}\in {\mathbb{R}}^{r}$, for $\omega $ almost everywhere in $\Omega $.
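The two-sided cumulation that defines the particular solution ${\tilde{\mathit{\psi}}}_{t}$ can be sketched in a few lines. This is only an illustration for the scalar case (the vector case works component by component); the function name `psi_tilde` and the representation of the process as a callable `z(s)` are assumptions made for the example.

```python
def psi_tilde(z, t):
    """Particular solution of (1 - L) psi_t = z_t on all integers t.

    z is a callable returning z_s for an integer s (hypothetical interface).
    """
    if t > 0:
        return sum(z(s) for s in range(1, t + 1))      # z_1 + ... + z_t
    if t < 0:
        return -sum(z(s) for s in range(t + 1, 1))     # -(z_0 + ... + z_{t+1})
    return 0

def z_example(s):
    """An arbitrary deterministic sequence used only for checking."""
    return s * s - 3

# psi_t - psi_{t-1} reproduces z_t for positive, zero and negative t.
differences_match = all(
    psi_tilde(z_example, t) - psi_tilde(z_example, t - 1) == z_example(t)
    for t in range(-6, 7)
)
print(differences_match)
```

The normalization ${\tilde{\mathit{\psi}}}_{0}=\mathbf{0}$ is what pins down this particular solution; any other solution differs from it by a constant process.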

**Definition 4.** **(I(0), I(1) and Cointegrated vectors)**

**I(0).** An r-dimensional ARMA process ${\mathbf{y}}_{t}$ with spectral density ${\mathbf{\Sigma}}_{y}\left(\theta \right)$ is $I\left(0\right)$ if ${\mathbf{\Sigma}}_{y}\left(0\right)\ne \mathbf{0}$.

**I(1).** The r-dimensional vector stochastic process ${\mathbf{y}}_{t}$ is $I\left(1\right)$ if it is a solution of $(1-L){\mathit{\zeta}}_{t}={\mathbf{z}}_{t}$, where ${\mathbf{z}}_{t}$ is an r-dimensional $I\left(0\right)$ process. The rank of ${\mathbf{y}}_{t}$ is defined as the rank of ${\mathbf{z}}_{t}$.

**Cointegration.** Assume that the r-dimensional stochastic vector ${\mathbf{y}}_{t}$ is $I\left(1\right)$ and denote by ${\mathbf{\Sigma}}_{\Delta y}\left(\theta \right)$ the spectral density of $(1-L){\mathbf{y}}_{t}$. The vector ${\mathbf{y}}_{t}$ is cointegrated with cointegrating rank c, with $0<c<r$, if $\mathrm{rank}\left({\mathbf{\Sigma}}_{\Delta y}\left(0\right)\right)=r-c$.

If q is the rank of ${\mathbf{y}}_{t}$ and $r\ge q$, then $c=r-q+d$, where $q>d\ge 0$, because the rank of ${\mathbf{\Sigma}}_{\Delta y}\left(0\right)$ cannot exceed q. Thus in the singular case, $r>q$, ${\mathbf{y}}_{t}$ is necessarily cointegrated with cointegrating rank at least equal to $r-q$.

If ${\mathbf{y}}_{t}$ is $I\left(1\right)$ and cointegrated with cointegrating rank c, there exist c linearly independent $r\times 1$ vectors ${\mathbf{c}}_{j}$, $j=1,\dots ,c$, such that the spectral density of ${\mathbf{c}}_{j}^{\prime}(1-L){\mathbf{y}}_{t}$ vanishes at frequency zero. The vectors ${\mathbf{c}}_{j}$ are called cointegrating vectors and the set ${\mathbf{c}}_{j}$, $j=1,\dots ,c$, a complete set of cointegrating vectors. Of course a complete set of cointegrating vectors ${\mathbf{c}}_{j}$, $j=1,\dots ,c$, can be replaced by the set ${\mathbf{d}}_{j}$, $j=1,\dots ,c$, where the vectors ${\mathbf{d}}_{j}$ are c independent linear combinations of the vectors ${\mathbf{c}}_{j}$.

**Lemma 1.** (I) Assume that ${\mathbf{y}}_{t}$ has an ARMA structure and has the rational representation (5): ${\mathbf{y}}_{t}=\mathbf{V}\left(L\right){\mathbf{u}}_{t}$. Then ${\mathbf{y}}_{t}$ is $I\left(0\right)$ if and only if $\mathbf{V}\left(1\right)\ne \mathbf{0}$.

(II) Assume $(1-L){\mathbf{y}}_{t}$ has an ARMA structure and has the rational representation

$$(1-L){\mathbf{y}}_{t}=\mathbf{V}\left(L\right){\mathbf{u}}_{t}.\tag{7}$$

The process ${\mathbf{y}}_{t}$ is $I\left(1\right)$ if and only if $\mathbf{V}\left(1\right)\ne \mathbf{0}$.

(III) If ${\mathbf{y}}_{t}$ is $I\left(1\right)$, cointegrated and has representation (7), the cointegrating rank of ${\mathbf{y}}_{t}$ is c if and only if the rank of $\mathbf{V}\left(1\right)$ is $r-c$. Moreover $\mathbf{c}$ is a cointegrating vector for ${\mathbf{y}}_{t}$ if and only if ${\mathbf{c}}^{\prime}\mathbf{V}\left(1\right)=\mathbf{0}$.

(IV) Assume that ${\mathbf{y}}_{t}$ is $I\left(1\right)$. $\mathbf{c}$ is a cointegrating vector for ${\mathbf{y}}_{t}$ if and only if a scalar stochastic variable $w\in {L}_{2}(\Omega ,\mathcal{F},P)$ can be determined such that ${\mathbf{c}}^{\prime}{\mathbf{y}}_{t}-w$ is stationary with an ARMA structure.

**Proof.** (I) is an immediate consequence of ${\mathbf{\Sigma}}_{y}\left(0\right)={\left(2\pi \right)}^{-1}\mathbf{V}\left(1\right){\mathbf{\Gamma}}_{u}\mathbf{V}{\left(1\right)}^{\prime}$, where ${\mathbf{\Gamma}}_{u}$ is the nonsingular covariance matrix of ${\mathbf{u}}_{t}$. (II) and (III) are obtained in the same way from ${\mathbf{\Sigma}}_{\Delta y}\left(0\right)={\left(2\pi \right)}^{-1}\mathbf{V}\left(1\right){\mathbf{\Gamma}}_{u}\mathbf{V}{\left(1\right)}^{\prime}$.

(IV) The process ${\mathbf{y}}_{t}$ solves (6) with ${\mathbf{z}}_{t}=\mathbf{V}\left(L\right){\mathbf{u}}_{t}$, so that, defining

$${\mathit{\mu}}_{t}=\begin{cases}{\mathbf{u}}_{1}+{\mathbf{u}}_{2}+\cdots +{\mathbf{u}}_{t}, & \text{for } t>0,\\ \mathbf{0}, & \text{for } t=0,\\ -({\mathbf{u}}_{0}+{\mathbf{u}}_{-1}+\cdots +{\mathbf{u}}_{t+1}), & \text{for } t<0,\end{cases}\tag{8}$$

we have

$${\mathbf{y}}_{t}=\mathbf{V}\left(L\right){\mathit{\mu}}_{t}+\mathbf{K}=\left[\mathbf{V}\left(1\right)+(1-L)\frac{\mathbf{V}\left(L\right)-\mathbf{V}\left(1\right)}{1-L}\right]{\mathit{\mu}}_{t}+\mathbf{K}=\mathbf{V}\left(1\right){\mathit{\mu}}_{t}+{\mathbf{V}}^{*}\left(L\right){\mathbf{u}}_{t}+\mathbf{K},$$

where (i) the entries of ${\mathbf{V}}^{*}\left(L\right)=(\mathbf{V}\left(L\right)-\mathbf{V}\left(1\right))/(1-L)$ are rational functions of L with no poles of modulus less than or equal to unity, (ii) $\mathbf{K}$ is a constant r-dimensional stochastic process. We have:

$${\mathbf{c}}^{\prime}{\mathbf{y}}_{t}={\mathbf{c}}^{\prime}\mathbf{V}\left(1\right){\mathit{\mu}}_{t}+{\mathbf{c}}^{\prime}{\mathbf{V}}^{*}\left(L\right){\mathbf{u}}_{t}+{\mathbf{c}}^{\prime}\mathbf{K}.$$

If $\mathbf{c}$ is a cointegrating vector of ${\mathbf{y}}_{t}$ we have ${\mathbf{c}}^{\prime}\mathbf{V}\left(1\right)=\mathbf{0}$, so that

$${\mathbf{c}}^{\prime}{\mathbf{y}}_{t}={\mathbf{c}}^{\prime}{\mathbf{V}}^{*}\left(L\right){\mathbf{u}}_{t}+{\mathbf{c}}^{\prime}\mathbf{K}.$$

Setting $w={\mathbf{c}}^{\prime}\mathbf{K},$ the process ${\mathbf{c}}^{\prime}{\mathbf{y}}_{t}-w={\mathbf{c}}^{\prime}{\mathbf{V}}^{*}\left(L\right){\mathbf{u}}_{t}$ has the desired properties. Note that w has the equivalent definition $w={\mathbf{c}}^{\prime}{\mathbf{y}}_{0}-{\mathbf{c}}^{\prime}{\mathbf{V}}^{*}\left(L\right){\mathbf{u}}_{0}$. Conversely, suppose that w is such that ${\mathbf{c}}^{\prime}{\mathbf{y}}_{t}-w$ has an ARMA structure. By (9),

$${\mathbf{c}}^{\prime}{\mathbf{y}}_{t}-w={\mathbf{c}}^{\prime}\mathbf{V}\left(1\right){\mathit{\mu}}_{t}+{\mathbf{c}}^{\prime}{\mathbf{V}}^{*}\left(L\right){\mathbf{u}}_{t}+{\mathbf{c}}^{\prime}\mathbf{K}-w,$$

so that, by the triangle inequality in ${L}_{2}(\Omega ,\mathcal{F},P)$,

$$\sqrt{\mathrm{E}{({\mathbf{c}}^{\prime}{\mathbf{y}}_{t}-w)}^{2}}+\sqrt{\mathrm{E}{\left({\mathbf{c}}^{\prime}{\mathbf{V}}^{*}\left(L\right){\mathbf{u}}_{t}\right)}^{2}}+\sqrt{\mathrm{E}{({\mathbf{c}}^{\prime}\mathbf{K}-w)}^{2}}\ge \sqrt{{\mathbf{c}}^{\prime}\mathbf{V}\left(1\right){\mathbf{\Sigma}}_{{\mu}_{t}}{\mathbf{V}}^{\prime}\left(1\right)\mathbf{c}}.$$

The three terms on the left-hand side are finite and independent of t. As ${\mathbf{\Sigma}}_{{\mu}_{t}}=\left|t\right|{\mathbf{\Sigma}}_{u}$ and ${\mathbf{\Sigma}}_{u}$ is positive definite, the right-hand side diverges for $\left|t\right|\to \infty $ unless ${\mathbf{c}}^{\prime}\mathbf{V}\left(1\right)=\mathbf{0}$. □
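The decomposition $\mathbf{V}(L)=\mathbf{V}(1)+(1-L){\mathbf{V}}^{*}(L)$ used in the proof can be made concrete when $\mathbf{V}(L)$ is a polynomial. The sketch below treats a single scalar entry $v(L)=\sum_{k}v_{k}L^{k}$, which is a simplifying assumption (the paper allows rational entries); in that case the coefficients of $v^{*}(L)$ are $v^{*}_{j}=-\sum_{k>j}v_{k}$. The function names `split_at_one` and `recombine` are hypothetical.

```python
def split_at_one(v):
    """Return (v(1), v_star) such that v(L) = v(1) + (1 - L) v_star(L).

    v lists the coefficients of L**0, L**1, ... of a scalar polynomial.
    """
    value_at_one = sum(v)
    v_star = [-sum(v[j + 1:]) for j in range(len(v) - 1)]
    return value_at_one, v_star

def recombine(value_at_one, v_star):
    """Expand v(1) + (1 - L) v_star(L) back into a coefficient list."""
    out = [0] * (len(v_star) + 1)
    out[0] += value_at_one
    for j, s in enumerate(v_star):
        out[j] += s        # s * L**j
        out[j + 1] -= s    # -s * L**(j + 1)
    return out

v = [2, -3, 5, 1]                      # v(L) = 2 - 3L + 5L^2 + L^3
value_at_one, v_star = split_at_one(v)
rebuilt = recombine(value_at_one, v_star)
print(value_at_one, v_star, rebuilt)
```

Applying the same split entry by entry to a polynomial matrix gives $\mathbf{V}(1)$, which multiplies the random walk ${\mathit{\mu}}_{t}$, and ${\mathbf{V}}^{*}(L)$, which multiplies the white noise ${\mathbf{u}}_{t}$.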

Lemma 1 shows that our definitions of $I\left(0\right)$ and $I\left(1\right)$ processes are equivalent to Definitions 3.2 and 3.3 in Johansen (1995), p. 35, with two minor differences: (i) our assumption of rational spectral density, (ii) the time span of the stochastic processes is $t=0,1,\dots $ in Johansen’s book, $t\in \mathbb{Z}$ in the present paper. Also, under the assumption that $(1-L){\mathbf{y}}_{t}$ has an ARMA structure, our definition of cointegration is equivalent to that in Johansen (1995), p. 37.
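Lemma 1 (III) reduces the computation of the cointegrating rank to the rank of $\mathbf{V}(1)$. As a hedged numerical sketch, the code below uses a made-up $3\times 2$ matrix standing in for $\mathbf{V}(1)$ (so $r=3$, $q=2$), computes its rank by exact Gaussian elimination, and recovers $c=r-\mathrm{rank}\,\mathbf{V}(1)$ and $d=c-(r-q)$. The matrix entries, the candidate vectors and the helper names `rank` and `is_cointegrating` are all illustrative assumptions.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a small matrix by Gaussian elimination in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

def is_cointegrating(vec, v1):
    """Check the condition c' V(1) = 0 of Lemma 1 (III)."""
    return all(
        sum(vec[i] * v1[i][j] for i in range(len(vec))) == 0
        for j in range(len(v1[0]))
    )

V1 = [[1, 2], [2, 4], [0, 0]]   # hypothetical V(1), r = 3, q = 2
r, q = 3, 2
coint_rank = r - rank(V1)       # cointegrating rank c
d = coint_rank - (r - q)        # excess over the minimum r - q
print(coint_rank, d)
```

Here $\mathbf{V}(1)$ has rank 1, so $c=2$ exceeds the minimum $r-q=1$ by $d=1$, and the vectors $(2,-1,0)'$ and $(0,0,1)'$ form a complete set of cointegrating vectors.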

## 3. Representation Theory for Singular $\mathbf{I}\left(\mathbf{1}\right)$ Vectors

In Section 3.1 we prove our generalization to singular vectors of the Granger representation theorem (from MA to AR). We closely follow the proof in Johansen (1995), Theorem 4.5, pp. 55–57. In Section 3.2 we show that, under a suitable parameterization, the matrix of the autoregressive representation is generically of finite degree.

#### 3.1. The Granger Representation Theorem (MA to AR)

Suppose that $r\ge q$, $c>0$ and $r>c\ge r-q.$ Let $\mathbf{B}\left(L\right)$ be an $r\times q$ polynomial matrix of degree ${s}_{1}\ge 0$ and $\mathbf{S}\left(L\right)$ an $r\times r$ polynomial matrix of degree ${s}_{2}\ge 0$ with $\mathbf{S}\left(0\right)={\mathbf{I}}_{r}$.

**Assumption 1.** $\mathbf{S}\left(L\right)$ is stable.

**Assumption 2.** If ${z}^{*}$ is a zero of $\mathbf{B}\left(z\right)$ (i.e., $\mathrm{rank}\left(\mathbf{B}\left({z}^{*}\right)\right)<q$) then either ${z}^{*}=1$ or $|{z}^{*}|>1$.

Assumption 2 implies that the rank of $\mathbf{B}\left(0\right)$ is q. The next is a stronger version of Assumption 2:

**Assumption 3.** If ${z}^{*}$ is a zero of $\mathbf{B}\left(z\right)$ then ${z}^{*}=1$.

**Assumption 4.** $\mathrm{rank}\left(\mathbf{B}\left(1\right)\right)=r-c$.

Under Assumption 1, let ${\mathbf{y}}_{t}$ be a solution of the equation

$$(1-L){\mathit{\zeta}}_{t}=\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}.\tag{10}$$

We have

$${\mathbf{y}}_{t}=\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathit{\mu}}_{t}+\mathbf{K},\tag{11}$$

where ${\mathit{\mu}}_{t}$ is defined in (8) and $\mathbf{K}$ is a constant stochastic process. By Assumption 4, $\mathbf{S}{\left(1\right)}^{-1}\mathbf{B}\left(1\right)\ne \mathbf{0}$, so that ${\mathbf{y}}_{t}$ is $I\left(1\right)$ with cointegrating rank c, see Lemma 1, (II) and (III).

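Consider the finite Taylor expansion of $\mathbf{B}\left(z\right)$ around $z=1$, where ${\mathbf{B}}^{\prime}\left(1\right)$ and ${\mathbf{B}}^{\prime \prime}\left(1\right)$ denote the first and second derivatives of $\mathbf{B}\left(z\right)$ at $z=1$ (not transposes):

$$\mathbf{B}\left(z\right)=\mathbf{B}\left(1\right)-(1-z){\mathbf{B}}^{\prime}\left(1\right)+\frac{1}{2}{(1-z)}^{2}{\mathbf{B}}^{\prime \prime}\left(1\right)-\cdots .$$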

Assumption 4 implies that

$$\mathbf{B}\left(1\right)=\mathit{\xi}{\mathit{\eta}}^{\prime},$$

where $\mathit{\xi}$ is $r\times (r-c)$ of rank $r-c$, $\mathit{\eta}$ is $q\times (r-c)$ of rank $r-c$, see Lancaster and Tismenetsky (1985, p. 97, Proposition 3). The Taylor expansion above can be rewritten as

$$\mathbf{B}\left(z\right)=\mathit{\xi}{\mathit{\eta}}^{\prime}+(1-z){\mathbf{B}}^{*}+{(1-z)}^{2}\mathbf{E}\left(z\right),\tag{12}$$

where ${\mathbf{B}}^{*}=-{\mathbf{B}}^{\prime}\left(1\right)$ and $\mathbf{E}\left(z\right)$ is a polynomial matrix.

Let ${\mathit{\xi}}_{\perp}$ be an $r\times c$ matrix of full column rank whose columns are orthogonal to all columns of $\mathit{\xi}$. Then (i) the columns of ${\mathit{\xi}}_{\perp}$ are a complete set of cointegrating vectors for $\mathbf{B}\left(L\right){\mathbf{u}}_{t}$, (ii) the columns of the matrix ${\mathbf{S}}^{\prime}\left(1\right){\mathit{\xi}}_{\perp}$ are a complete set of cointegrating vectors for ${\mathbf{y}}_{t}$. Regarding (i), using (11) and (12), we have

$${\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}={\mathit{\xi}}_{\perp}^{\prime}\mathbf{B}\left(L\right){\mathit{\mu}}_{t}+{\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(1\right)\mathbf{K}=({\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}+(1-L){\mathit{\xi}}_{\perp}^{\prime}\mathbf{E}\left(L\right)){\mathbf{u}}_{t}+{\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(1\right)\mathbf{K},$$

so that ${\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}-{\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(1\right)\mathbf{K}$ has an ARMA structure. Regarding (ii), see the proof of Proposition 2.

**Assumption**

**5.**

$\mathrm{rank}\left[\left(\begin{array}{c}{\xi}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\eta}^{\prime}\end{array}\right)\right]=\mathrm{rank}\left[\left(\begin{array}{c}{\xi}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\xi}^{\prime}\xi {\eta}^{\prime}\end{array}\right)\right]=q.$

Define ${\mathbf{S}}^{*}\left(L\right)=\frac{{\displaystyle \mathbf{S}\left(L\right)-\mathbf{S}\left(1\right)}}{{\displaystyle 1-L}}$.

**Assumption**

**6.**

${\xi}_{\perp}^{\prime}({\mathbf{B}}^{*}-{\mathbf{S}}^{*}\left(1\right)\mathbf{S}{\left(1\right)}^{-1}\mathit{\xi}{\mathit{\eta}}^{\prime})\ne \mathbf{0}$.

**Remark**

**3.**

Let ${\mathbf{y}}_{t}$ be a solution of (10) so that $(1-L){\mathbf{y}}_{t}$ is stationary and $\mathbf{S}\left(L\right)\left[(1-L){\mathbf{y}}_{t}\right]=\mathbf{B}\left(L\right){\mathbf{u}}_{t}$. Assumption 2, and therefore also Assumption 3, implies that ${\mathbf{u}}_{t}$ is fundamental for $(1-L){\mathbf{y}}_{t}$, see Section 2.2.

We are now ready for our main representation result.

**Proposition**

**2.**

(I) Weak form. Suppose that Assumptions 1, 2, 4, 5 and 6 hold and let ${\mathbf{y}}_{t}$ be a solution of the difference Equation (10), so that ${\mathbf{y}}_{t}=\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathit{\mu}}_{t}+\mathbf{K}$, with ${\mathit{\mu}}_{t}$ defined in (8) and $\mathbf{K}$ a constant stochastic process. Set $\mathit{\beta}=\mathbf{S}{\left(1\right)}^{\prime}{\mathit{\xi}}_{\perp}$. Then a c-dimensional stochastic vector $\mathbf{w}$ can be determined such that (i) ${\mathit{\beta}}^{\prime}{\mathbf{y}}_{t}-\mathbf{w}$ is $I\left(0\right)$, (ii) ${\mathbf{y}}_{t}$ has the error correction representation

$$\mathbf{A}\left(L\right){\mathbf{y}}_{t}={\mathbf{A}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}+\mathit{\alpha}({\mathit{\beta}}^{\prime}{\mathbf{y}}_{t-1}-\mathbf{w})=\mathbf{B}\left(0\right){\mathbf{u}}_{t},$$

where $\mathbf{A}\left(L\right)$ is a rational $r\times r$ matrix with no poles in or on the unit circle, $\mathbf{A}\left(0\right)={\mathbf{I}}_{r}$, ${\mathbf{A}}^{*}\left(L\right)=(\mathbf{A}\left(L\right)-\mathbf{A}\left(1\right)L){(1-L)}^{-1}$, $\mathit{\alpha}$ is $r\times c$ and full rank, $\mathit{\alpha}{\mathit{\beta}}^{\prime}=\mathbf{A}\left(1\right)$.

(II) Strong form. Under Assumptions 1, 3, 4, 5 and 6, statement (I) holds with an $r\times r$ stable, finite-degree matrix polynomial $\mathbf{A}\left(L\right)$.

**Proof.**

Multiply both sides of $(1-L)\mathbf{S}\left(L\right){\mathbf{y}}_{t}=\mathbf{B}\left(L\right){\mathbf{u}}_{t}$ by the $r\times r$ invertible matrix $\Xi =\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\\ {\mathit{\xi}}^{\prime}\end{array}\right)$. We obtain

$$\begin{array}{c}(1-L)\Xi \mathbf{S}\left(L\right){\mathbf{y}}_{t}=\Xi \mathbf{B}\left(L\right){\mathbf{u}}_{t}\hfill \\ =\left\{\left(\begin{array}{c}{\mathbf{0}}_{c\times q}\\ {\mathit{\xi}}^{\prime}\mathit{\xi}{\mathit{\eta}}^{\prime}\end{array}\right)+(1-L)\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\mathit{\xi}}^{\prime}{\mathbf{B}}^{*}\end{array}\right)+{(1-L)}^{2}\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\mathbf{E}\left(L\right)\\ {\mathit{\xi}}^{\prime}\mathbf{E}\left(L\right)\end{array}\right)\right\}{\mathbf{u}}_{t}\hfill \\ =\left(\begin{array}{cc}(1-L){\mathbf{I}}_{c}& \mathbf{0}\\ \mathbf{0}& {\mathbf{I}}_{r-c}\end{array}\right)\left\{\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\mathit{\xi}}^{\prime}\mathit{\xi}{\mathit{\eta}}^{\prime}\end{array}\right)+(1-L)\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\mathbf{E}\left(L\right)\\ {\mathit{\xi}}^{\prime}{\mathbf{B}}^{*}\end{array}\right)+{(1-L)}^{2}\left(\begin{array}{c}{\mathbf{0}}_{c\times q}\\ {\mathit{\xi}}^{\prime}\mathbf{E}\left(L\right)\end{array}\right)\right\}{\mathbf{u}}_{t}.\hfill \end{array}$$

Taking the first c rows in (15),

$$(1-L){\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}=(1-L)\left({\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}+(1-L){\mathit{\xi}}_{\perp}^{\prime}\mathbf{E}\left(L\right)\right){\mathbf{u}}_{t}.$$

This implies that

$${\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}=\left({\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}+(1-L){\mathit{\xi}}_{\perp}^{\prime}\mathbf{E}\left(L\right)\right){\mathbf{u}}_{t}+\mathbf{w},$$

where $\mathbf{w}$ is a c-dimensional constant stochastic vector. Comparing with (13), $\mathbf{w}={\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(1\right)\mathbf{K}$. On the other hand,

$$\begin{array}{cc}\hfill {\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(1\right){\mathbf{y}}_{t}-\mathbf{w}& =({\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}-\mathbf{w})-({\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}-{\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(1\right){\mathbf{y}}_{t})\hfill \\ \hfill \phantom{\rule{1.em}{0ex}}& =({\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}-\mathbf{w})-{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{S}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}\hfill \\ \hfill \phantom{\rule{1.em}{0ex}}& =({\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(L\right){\mathbf{y}}_{t}-\mathbf{w})-{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{S}}^{*}\left(L\right)\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}\hfill \\ \hfill \phantom{\rule{1.em}{0ex}}& =\left\{{\mathit{\xi}}_{\perp}^{\prime}({\mathbf{B}}^{*}-{\mathbf{S}}^{*}\left(1\right)\mathbf{S}{\left(1\right)}^{-1}\mathit{\xi}{\mathit{\eta}}^{\prime})+(1-L)\mathcal{H}\left(L\right)\right\}{\mathbf{u}}_{t},\hfill \end{array}$$

where the last equality has been obtained using (16) and $\mathcal{H}\left(L\right)$ is a suitable polynomial matrix. Thus ${\mathit{\beta}}^{\prime}{\mathbf{y}}_{t}-\mathbf{w}={\mathit{\xi}}_{\perp}^{\prime}\mathbf{S}\left(1\right){\mathbf{y}}_{t}-\mathbf{w}$ has an ARMA structure. Moreover, by Assumption 6, ${\mathit{\beta}}^{\prime}{\mathbf{y}}_{t}-\mathbf{w}$ is $I\left(0\right)$.

Joining (16) with the last $r-c$ rows of (15),

$$\left(\begin{array}{cc}{\mathbf{I}}_{c}& \mathbf{0}\\ \mathbf{0}& (1-L){\mathbf{I}}_{r-c}\end{array}\right)\Xi \mathbf{S}\left(L\right){\mathbf{y}}_{t}-\left(\begin{array}{c}{\mathbf{I}}_{c}\\ {\mathbf{0}}_{(r-c)\times c}\end{array}\right)\mathbf{w}=\mathbf{M}\left(L\right){\mathbf{u}}_{t},$$

where

$$\mathbf{M}\left(L\right)=\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\mathit{\xi}}^{\prime}\mathit{\xi}{\mathit{\eta}}^{\prime}\end{array}\right)+(1-L)\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\mathbf{E}\left(L\right)\\ {\mathit{\xi}}^{\prime}{\mathbf{B}}^{*}\end{array}\right)+{(1-L)}^{2}\left(\begin{array}{c}{\mathbf{0}}_{c\times q}\\ {\mathit{\xi}}^{\prime}\mathbf{E}\left(L\right)\end{array}\right).$$

By (15) and (19),

$$\mathbf{B}\left(L\right)={\Xi}^{-1}\left(\begin{array}{cc}(1-L){\mathbf{I}}_{c}& \mathbf{0}\\ \mathbf{0}& {\mathbf{I}}_{r-c}\end{array}\right)\mathbf{M}\left(L\right){.}^{3}$$

By Assumption 5, $\mathbf{M}\left(z\right)$ has no zero at $z=1$, see (19). On the other hand, (i) if ${z}^{*}$ is a zero of $\mathbf{M}\left(z\right)$ then ${z}^{*}$ is a zero of $\mathbf{B}\left(z\right)$, (ii) if ${z}^{*}$ is a zero of $\mathbf{B}\left(z\right)$, ${z}^{*}\ne 1$, then ${z}^{*}$ is a zero of $\mathbf{M}\left(z\right)$. Therefore, Assumption 3 implies that $\mathbf{M}\left(z\right)$ is zeroless and vice versa. Under Assumption 2, the zeros of $\mathbf{M}\left(z\right)$ lie outside the unit circle. In order to conclude the proof we need to invert $\mathbf{M}\left(L\right)$ in (18).

(II) Strong form. Under Assumption 3, Proposition 1, part (I), states that there exists an $r\times r$ stable, finite-degree polynomial matrix $\mathbf{N}\left(L\right)={\mathbf{I}}_{r}+{\mathbf{N}}_{1}L+\cdots +{\mathbf{N}}_{p}{L}^{p},$ for some p, such that: (i) $\mathbf{N}\left(0\right)={\mathbf{I}}_{r}$, (ii) $\mathbf{N}\left(L\right)\mathbf{M}\left(L\right)=\mathbf{M}\left(0\right)$.

(I) Weak form. Under Assumption 2, by a standard procedure we remove all the zeros of $\mathbf{M}\left(z\right)$ which lie outside the unit circle$^{4}$, then use Proposition 1, part (I), to left-invert the residual zeroless polynomial, thus obtaining an $r\times r$ rational matrix $\mathbf{N}\left(L\right)$ such that (i) $\mathbf{N}\left(L\right)$ has no poles in or on the unit circle (possible poles of $\mathbf{N}\left(L\right)$ are the zeros of $\mathbf{M}\left(L\right)$, which lie outside the unit circle), (ii) $\mathbf{N}\left(0\right)={\mathbf{I}}_{r}$, (iii) $\mathbf{N}\left(L\right)\mathbf{M}\left(L\right)=\mathbf{M}\left(0\right)$. See also Deistler et al. (2010).

Defining

$$\mathbf{A}\left(L\right)={\Xi}^{-1}\mathbf{N}\left(L\right)\left(\begin{array}{cc}{\mathbf{I}}_{c}& \mathbf{0}\\ \mathbf{0}& (1-L){\mathbf{I}}_{r-c}\end{array}\right)\Xi \mathbf{S}\left(L\right)={\Xi}^{-1}\mathbf{N}\left(L\right)\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\\ (1-L){\mathit{\xi}}^{\prime}\end{array}\right)\mathbf{S}\left(L\right)$$

and using $\mathbf{M}\left(0\right)=\Xi \mathbf{B}\left(0\right)$, we have

$$\mathbf{A}\left(L\right){\mathbf{y}}_{t}-{\Xi}^{-1}\mathbf{N}\left(1\right)\left(\begin{array}{c}{\mathbf{I}}_{c}\\ {\mathbf{0}}_{(r-c)\times c}\end{array}\right)\mathbf{w}=\mathbf{B}\left(0\right){\mathbf{u}}_{t},$$

with $\mathbf{A}\left(0\right)={\mathbf{I}}_{r}.$ Defining ${\mathbf{A}}^{*}\left(L\right)=(\mathbf{A}\left(L\right)-\mathbf{A}\left(1\right)L){(1-L)}^{-1}$, we obtain

$${\mathbf{A}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}+\mathbf{A}\left(1\right){\mathbf{y}}_{t-1}-{\Xi}^{-1}\mathbf{N}\left(1\right)\left(\begin{array}{c}{\mathbf{I}}_{c}\\ {\mathbf{0}}_{(r-c)\times c}\end{array}\right)\mathbf{w}=\mathbf{B}\left(0\right){\mathbf{u}}_{t}.$$

Defining

$$\mathit{\alpha}={\Xi}^{-1}\mathbf{N}\left(1\right)\left(\begin{array}{c}{\mathbf{I}}_{c}\\ {\mathbf{0}}_{(r-c)\times c}\end{array}\right),$$

we see that $\mathbf{A}\left(1\right)=\mathit{\alpha}{\mathit{\beta}}^{\prime}$ and

$${\mathbf{A}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}+\mathit{\alpha}({\mathit{\beta}}^{\prime}{\mathbf{y}}_{t-1}-\mathbf{w})=\mathbf{B}\left(0\right){\mathbf{u}}_{t}.$$

□
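The key step of the proof is the finite-degree left inversion of a tall zeroless polynomial matrix. As a toy illustration only, the Python sketch below constructs such a left inverse by hand for a hypothetical $2\times 1$ matrix $\mathbf{M}\left(L\right)={\mathbf{M}}_{0}+{\mathbf{M}}_{1}L$; the numerical values and the direct linear-algebra construction are assumptions made for the example, not the general procedure behind Proposition 1, and stability of $\mathbf{N}\left(L\right)$ is not checked.

```python
import numpy as np

# Hypothetical tall polynomial matrix M(L) = M0 + M1 L with r = 2, q = 1.
a, b = 0.5, -0.3               # a != b makes M(z) zeroless
M0 = np.array([[1.0], [1.0]])
M1 = np.array([[a], [b]])

# Seek N(L) = I + N1 L with N(L) M(L) = M(0), i.e.
#   N1 M0 = -M1   and   N1 M1 = 0.
# Stack the two conditions: N1 [M0 M1] = [-M1 0].
stacked = np.hstack([M0, M1])                  # 2 x 2, invertible iff a != b
target = np.hstack([-M1, np.zeros_like(M1)])
N1 = target @ np.linalg.inv(stacked)

# Coefficients of N(L) M(L) at orders 0, 1 and 2 in L.
coef0 = M0
coef1 = M1 + N1 @ M0
coef2 = N1 @ M1
print(np.allclose(coef1, 0), np.allclose(coef2, 0))
```

When $a=b\ne 0$ the two entries of $\mathbf{M}\left(z\right)$ share the zero $z=-1/a$, the stacked matrix becomes singular and no finite-degree left inverse of this form exists, which mirrors the distinction between the strong and the weak form.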

Some remarks are in order.

**Remark**

**4.**

(I) Under our assumption of an ARMA structure, Assumption 1 corresponds to Definition 3.1 in Johansen’s book, see p. 34. Assumption 2 is Johansen’s Assumption 1 (see p. 14), adapted for singularity. Assumption 3 has no counterpart in Johansen’s nonsingular framework. In Section 3.2 we show that under the parameterization adopted in Definition 5, Assumption 3 holds generically.

(II) Simplifying the model by taking $\mathbf{S}\left(L\right)={\mathbf{I}}_{r}$, Assumption 5 generalizes to the singular case Johansen’s assumption that ${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{C}}^{*}{\mathit{\eta}}_{\perp}$ is full rank (see Theorem 4.5, p. 55; ${\mathbf{C}}^{*}$ corresponds to our ${\mathbf{B}}^{*}$). For, assuming that $r=q$ and post-multiplying the matrix in Assumption 5 by the nonsingular matrix $\left(\begin{array}{cc}{\mathit{\eta}}_{\perp}& \mathit{\eta}\end{array}\right)$, we obtain that Assumption 5 holds if and only if ${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}{\mathit{\eta}}_{\perp}$ is full rank. Assumption 5 is used in the proof of Proposition 2 to invert the matrix $\mathbf{M}\left(L\right)$, which remains on the right-hand side after the removal of the unit roots, see Equation (18); Johansen’s assumption plays the same rôle in his proof.

(III) Under $\mathbf{S}\left(L\right)={\mathbf{I}}_{r}$, Assumption 6 simplifies to ${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}\ne \mathbf{0}$. If $d>0$, Assumption 6 is a consequence of Assumption 5. For, if $d>0$ then $r-c=q-d<q$. On the other hand, $r-c$ is the number of rows of ${\mathit{\eta}}^{\prime}$, so that Assumption 5 holds only if Assumption 6 holds. In particular, if $r=q$ and $c=d>0$, Assumption 6 is redundant. However, if $r>q$ and $d=0$, so that the rank of ${\mathit{\eta}}^{\prime}$ is q, then Assumption 5 holds even if ${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}=\mathbf{0}$. Assumption 6 is necessary in Proposition 2 to prove that the error correction term is $I\left(0\right)$, not only stationary.
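The equivalence stated in part (II) for the nonsingular case can be checked numerically. The following Python sketch, with hypothetical dimensions $r=q=3$, $c=1$ and randomly drawn $\mathit{\xi}$, $\mathit{\eta}$ and ${\mathbf{B}}^{*}$, compares the rank condition of Assumption 5 with the full-rank condition on ${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}{\mathit{\eta}}_{\perp}$; the helper `orth_complement` is introduced only for this illustration.

```python
import numpy as np
from numpy.linalg import matrix_rank

rng = np.random.default_rng(2)
r = q = 3
c = 1                                   # so xi and eta are 3 x 2

xi = rng.standard_normal((r, r - c))
eta = rng.standard_normal((q, r - c))
B_star = rng.standard_normal((r, q))

def orth_complement(A):
    """Orthonormal basis of the orthogonal complement of col(A)."""
    U, _, _ = np.linalg.svd(A)          # full U is square
    return U[:, A.shape[1]:]            # assumes A has full column rank

xi_perp = orth_complement(xi)           # r x c
eta_perp = orth_complement(eta)         # q x c (here d = c since r = q)

stacked = np.vstack([xi_perp.T @ B_star, eta.T])   # matrix in Assumption 5
johansen = xi_perp.T @ B_star @ eta_perp           # c x c

assumption5 = matrix_rank(stacked) == q
johansen_full_rank = matrix_rank(johansen) == c
print(assumption5, johansen_full_rank)
```

For randomly drawn inputs both conditions hold; replacing `B_star` by a zero matrix makes both fail, in line with the equivalence.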

**Remark**

**5.**

Uniqueness issues arise with autoregressive representations of singular vectors. For example, suppose that $c=r-q>0$, so that $d=0$. Representation (14) has an $(r-q)$-dimensional error correction term ${\mathit{\beta}}^{\prime}{\mathbf{y}}_{t}-\mathbf{w}$. On the other hand, in this case $\mathbf{B}\left(1\right)$ has full rank q, so that Proposition 1(I) applies and, in spite of cointegration, ${\mathbf{y}}_{t}$ has an autoregressive representation in differences

$$\mathbf{D}\left(L\right)\mathbf{S}\left(L\right)(1-L){\mathbf{y}}_{t}=\mathbf{B}\left(0\right){\mathbf{u}}_{t}.$$

In Appendix B.1 we sketch a proof of the statement that in general, ${\mathbf{y}}_{t}$ has VECM representations with a number of error correction terms ranging from d to c. However, as we show in Appendix B.2, different autoregressive representations of ${\mathbf{y}}_{t}$ produce the same impulse-response functions. Both in this and the companion paper Barigozzi et al. (2019) the number of error correction terms in the error correction representation for reduced-rank I(1) vectors is always the maximum c. It is worth reporting that, in our experiments with simulated data, the best results in estimation of singular VECMs are obtained using c as the number of error correction terms.

**Remark**

**6.**

Assume for simplicity that $\mathbf{S}\left(L\right)={\mathbf{I}}_{r}$. From Equation (17):

$${\mathbf{e}}_{t}={\mathit{\beta}}^{\prime}{\mathbf{y}}_{t}-\mathbf{w}={\mathit{\xi}}_{\perp}^{\prime}{\mathbf{y}}_{t}-\mathbf{w}=\left\{{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}+(1-L)\mathcal{H}\left(L\right)\right\}{\mathbf{u}}_{t}.$$

If $r=q$, Assumption 5 implies that ${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}$ has rank c, so that no c-dimensional vector $\mathbf{d}\ne \mathbf{0}$ can be determined such that ${\mathbf{d}}^{\prime}{\mathbf{e}}_{t}$ is stationary but not $I\left(0\right)$. Thus, according to the definition introduced in Franchi and Paruolo (2019), p. 1181, the error term ${\mathbf{e}}_{t}$ is a “non-cointegrated $I\left(0\right)$ process.” When $r>q$ and $c\le q$, i.e., $r\le 2q-d$, elementary examples can be produced in which ${\mathbf{e}}_{t}$ is an $I\left(0\right)$ but not a non-cointegrated $I\left(0\right)$ process (one is given in Appendix A.2). Thus Assumption 6 only implies that ${\mathbf{e}}_{t}$ is $I\left(0\right)$. Of course, under $c\le q$, the assumption that ${\mathit{\xi}}_{\perp}^{\prime}({\mathbf{B}}^{*}-{\mathbf{S}}^{*}\left(1\right)\mathbf{S}{\left(1\right)}^{-1}\mathit{\xi}{\mathit{\eta}}^{\prime})$ has rank c, an enhancement of Assumption 6, implies that ${\mathbf{e}}_{t}$ is a non-cointegrated $I\left(0\right)$ process. On the other hand, if $c>q$, i.e., $r>2q-d$, ${\mathbf{e}}_{t}$ cannot be a non-cointegrated $I\left(0\right)$ process.

#### 3.2. Generically, $\mathbf{A}\left(L\right)$ Is a Finite-Degree Polynomial

Suppose that the couple $\left(\mathbf{S}(L),\mathbf{B}(L)\right)$ is parameterized as in Definition 3. It is easy to see that $\mathbf{B}\left(1\right)$ has generically rank q, so that generically the cointegrating rank of ${\mathbf{y}}_{t}$ is $r-q$. In particular, if $r=q$ cointegration is non-generic.

It is quite easy to see that this paradoxical result only depends on the choice of a parameter set that is unfit to study cointegration. Our starting point here is that a specific value of c between $r-q$ and $r-1$ has a motivation in economic theory or in statistical inference, and must therefore be built into the parameter set. Thus in Definition 5 below the family of filters is redefined so that generically the cointegrating rank is equal to a given c between $r-q$ and $r-1$.

**Definition**

**5.**

**(Rational reduced-rank family of filters with cointegrating rank c)** Assume that $r>q$, $c>0$ and $r>c\ge r-q$. Let $\mathcal{G}$ be a set of couples $\left(\mathbf{S}(L),\mathbf{B}(L)\right)$, where:

- (i)
- The matrix $\mathbf{B}\left(L\right)$ has the parameterization$$\mathbf{B}\left(L\right)=\mathit{\xi}{\mathit{\eta}}^{\prime}+(1-L){\mathbf{B}}^{*}+{(1-L)}^{2}\mathbf{E}\left(L\right),$$where $\mathit{\xi}$ and $\mathit{\eta}$ are $r\times (r-c)$ and $q\times (r-c)$ respectively, ${\mathbf{B}}^{*}$ is $r\times q$, and $\mathbf{E}\left(L\right)$ is an $r\times q$ polynomial matrix of degree ${s}_{1}\ge 0$.
- (ii)
- $\mathbf{S}\left(L\right)$ is an $r\times r$ polynomial matrix of degree ${s}_{2}\ge 0$, with $\mathbf{S}\left(0\right)={\mathbf{I}}_{r}$.
- (iii)
- Denoting by $\mathbf{p}$ the vector containing the $\lambda =(r-c)(r+q)+rq({s}_{1}+2)+{r}^{2}{s}_{2}$ coefficients of the matrices $\mathbf{S}\left(L\right)$, $\mathit{\xi}$, $\mathit{\eta}$, ${\mathbf{B}}^{*}$ and $\mathbf{E}\left(L\right)$, we assume that $\mathbf{p}\in \Pi $, where Π is an open subset of ${\mathbb{R}}^{\lambda}$ such that for $\mathbf{p}\in \Pi $: (1) $\mathbf{S}\left(z\right)$ is stable, (2) $\mathrm{rank}\left(\mathbf{B}(z)\right)=q$ with the exception of a finite subset of $\mathbb{C}$, (3) $\mathrm{rank}\left(\mathbf{B}\left(1\right)\right)=\mathrm{rank}\left(\mathit{\xi}{\mathit{\eta}}^{\prime}\right)=r-c$.

We say that $\mathcal{G}$ is a rational reduced-rank family of filters with cointegrating rank c.
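The parameter count $\lambda$ in part (iii) can be reproduced term by term. The short Python helper below is an illustration only; the function name `n_params` is hypothetical, and the breakdown assumes that $\mathbf{E}\left(L\right)$ has degree ${s}_{1}$, as the formula for $\lambda$ suggests.

```python
def n_params(r: int, q: int, c: int, s1: int, s2: int) -> int:
    """Number of free coefficients lambda in Definition 5.

    xi is r x (r-c), eta is q x (r-c), B* is r x q,
    E(L) has s1 + 1 coefficient matrices of size r x q (assumed degree s1),
    and S(L) has s2 free r x r matrices because S(0) = I_r.
    """
    if not (r > q and c > 0 and r > c >= r - q):
        raise ValueError("need r > q, c > 0 and r > c >= r - q")
    return (r - c) * (r + q) + r * q * (s1 + 2) + r * r * s2

# Example: r = 4, q = 3, c = 3, s1 = s2 = 1 gives 7 + 36 + 16 coefficients.
print(n_params(r=4, q=3, c=3, s1=1, s2=1))  # 59
```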

**Proposition**

**3.**

Assume that $r>q$. Let ${\mathbf{y}}_{t}$ be an $I\left(1\right)$ solution of Equation (10), where $\left(\mathbf{S}(L),\mathbf{B}(L)\right)$ belongs to a rational reduced-rank family of filters with cointegrating rank c. For generic values of the parameters in Π, Assumptions 1, 3, 4, 5 and 6 hold. Thus the Strong Form of Proposition 2 holds and ${\mathbf{y}}_{t}$ has an error correction representation
$$\mathbf{A}\left(L\right){\mathbf{y}}_{t}={\mathbf{A}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}+\mathit{\alpha}({\mathit{\beta}}^{\prime}{\mathbf{y}}_{t-1}-\mathbf{w})=\mathbf{B}\left(0\right){\mathbf{u}}_{t},$$
where $\mathbf{A}\left(L\right)$ is a finite-degree polynomial matrix.

**Proof.**

Part (iii) of Definition 5 implies that Assumptions 1 and 4 hold for all $\mathbf{p}\in \Pi $. The sets where Assumptions 5 and 6 do not hold are the intersections of the open set $\Pi $ with the algebraic varieties

$$\left(\mathrm{a}\right)\phantom{\rule{4pt}{0ex}}\mathrm{rank}\left[\left(\begin{array}{c}{\xi}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\eta}^{\prime}\end{array}\right)\right]<q,\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\left(\mathrm{b}\right)\phantom{\rule{4pt}{0ex}}{\xi}_{\perp}^{\prime}({\mathbf{B}}^{*}-{\mathbf{S}}^{*}\left(1\right)\mathbf{S}{\left(1\right)}^{-1}\mathit{\xi}{\mathit{\eta}}^{\prime})=\mathbf{0}$$

(the variety described by (a) is obtained by equating to zero the determinants of all the $q\times q$ submatrices of the $r\times q$ matrix between brackets). It is easy to see that the varieties (a) and (b) are not trivial, i.e., that their dimension is lower than $\lambda $. Thus Assumptions 5 and 6 hold generically. The same result holds for Assumption 3. The points of $\Pi $ where it is not fulfilled belong to a lower-dimensional algebraic variety. This is proved in Appendix A.1, see in particular Lemma A4. □

**Remark**

**7.**

It is easy to see that, assuming that $c\le q$, $\mathrm{rank}\left({\mathit{\xi}}_{\perp}^{\prime}({\mathbf{B}}^{*}-{\mathbf{S}}^{*}\left(1\right)\mathbf{S}{\left(1\right)}^{-1}\mathit{\xi}{\mathit{\eta}}^{\prime})\right)=c$ holds generically in Π. Thus, in that case, the error term ${\mathit{\beta}}^{\prime}{\mathbf{y}}_{t}-\mathbf{w}$ is generically a non-cointegrated $I\left(0\right)$ process, see Remark 6.

**Remark**

**8.**

A general comment on genericity results is in order. Theorems like Proposition 3 or Proposition 1, part (II), show that the subset where some statement does not hold belongs to some algebraic variety of lower dimension (see the proof of Proposition 3 in particular), and is therefore negligible from a topological point of view. This suggests the working hypothesis that such a subset is negligible from an economic or statistical point of view as well. If, for example, economic theory produces a singular vector ${\mathbf{y}}_{t}$ with cointegrating rank c, we may find it reasonable to conclude that ${\mathbf{y}}_{t}$ has representation (14) with a finite autoregressive polynomial. However, a greater degree of certainty is obtained by checking that the parameters of $\left(\mathbf{S}(L),\mathbf{B}(L)\right)$, that are implicit in the theory, do not necessarily lie in one of the three algebraic varieties described in the proof of Proposition 3.

Definition 5 does not assume that $\mathbf{B}\left(L\right)$ has no zeros inside the unit circle. Thus we have not assumed that ${\mathbf{u}}_{t}$ is fundamental for $(1-L){\mathbf{y}}_{t}$, see Section 2.2. However, Proposition 3 shows that for generic values of the parameters in $\Pi $, the assumptions of Proposition 2, strong form, hold, Assumption 3 in particular, so that $\mathbf{B}\left(L\right)$ has no zeros other than possibly $z=1$, and therefore none inside the unit circle. Thus:

**Proposition**

**4.**

Assume that $r>q$. Let ${\mathbf{y}}_{t}$ be a solution of Equation (10), where $\left(\mathbf{S}(L),\mathbf{B}(L)\right)$ belongs to a rational reduced-rank family of filters with cointegrating rank c. For generic values of the parameters in Π, ${\mathbf{u}}_{t}$ is fundamental for $(1-L){\mathbf{y}}_{t}$.

**Remark**

**9.**

Note that Propositions 3 and 4 do not hold in the nonsingular case, where no genericity argument can be used to rule out non-unit zeros of $\mathbf{B}\left(L\right)$, either inside or outside the unit circle. In particular, fundamentalness of ${\mathbf{u}}_{t}$ for $(1-L){\mathbf{y}}_{t}$ is not generic if $r=q$.

#### 3.3. Permanent and Transitory Shocks

Let ${\mathit{\eta}}_{\perp}$ be a $q\times d$ matrix whose columns are independent and orthogonal to the columns of $\mathit{\eta}$, and let

$$\overline{\mathit{\eta}}=\mathit{\eta}{\left({\mathit{\eta}}^{\prime}\mathit{\eta}\right)}^{-1},\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}{\overline{\mathit{\eta}}}_{\perp}={\mathit{\eta}}_{\perp}{\left({\mathit{\eta}}_{\perp}^{\prime}{\mathit{\eta}}_{\perp}\right)}^{-1}.$$

Defining ${\mathbf{v}}_{1t}={\mathit{\eta}}_{\perp}^{\prime}{\mathbf{u}}_{t}$, and ${\mathbf{v}}_{2t}={\mathit{\eta}}^{\prime}{\mathbf{u}}_{t}$, we have

$${\mathbf{u}}_{t}={\overline{\mathit{\eta}}}_{\perp}{\mathbf{v}}_{1t}+\overline{\mathit{\eta}}{\mathbf{v}}_{2t}=\left(\begin{array}{cc}{\overline{\mathit{\eta}}}_{\perp}& \overline{\mathit{\eta}}\end{array}\right)\left(\begin{array}{c}{\mathbf{v}}_{1t}\\ {\mathbf{v}}_{2t}\end{array}\right)$$

We have

$$\mathbf{B}\left(L\right){\mathbf{u}}_{t}=\left[\mathbf{B}\left(L\right)\left({\overline{\mathit{\eta}}}_{\perp}\phantom{\rule{4pt}{0ex}}\overline{\mathit{\eta}}\right)\right]\left(\begin{array}{c}{\mathbf{v}}_{1t}\\ {\mathbf{v}}_{2t}\end{array}\right)=(1-L){\mathbf{G}}_{1}\left(L\right){\mathbf{v}}_{1t}+\left(\mathit{\xi}+(1-L){\mathbf{G}}_{2}\left(L\right)\right){\mathbf{v}}_{2t},$$

where ${\mathbf{G}}_{1}\left(L\right)=\left({\mathbf{B}}^{*}+(1-L)\mathbf{E}\left(L\right)\right){\overline{\mathit{\eta}}}_{\perp}$, and ${\mathbf{G}}_{2}\left(L\right)=\left({\mathbf{B}}^{*}+(1-L)\mathbf{E}\left(L\right)\right)\overline{\mathit{\eta}}$. All the solutions of the difference equation $(1-L){\mathbf{y}}_{t}=\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}$ are

$${\mathbf{y}}_{t}=\mathbf{S}{\left(L\right)}^{-1}\left[{\mathbf{G}}_{1}\left(L\right){\mathbf{v}}_{1t}+{\mathbf{G}}_{2}\left(L\right){\mathbf{v}}_{2t}+{\mathbf{T}}_{t}\right]+\mathbf{K},$$

where $\mathbf{K}$ is a constant stochastic process, and

$${\mathbf{T}}_{t}=\left\{\begin{array}{c}\mathit{\xi}({\mathbf{v}}_{2,1}+{\mathbf{v}}_{2,2}+\cdots +{\mathbf{v}}_{2,t}),\phantom{\rule{4.pt}{0ex}}\mathrm{for}\phantom{\rule{4.pt}{0ex}}t>0\hfill \\ \mathbf{0},\phantom{\rule{4.pt}{0ex}}\mathrm{for}\phantom{\rule{4.pt}{0ex}}t=0\hfill \\ -\mathit{\xi}({\mathbf{v}}_{2,0}+{\mathbf{v}}_{2,-1}+\cdots +{\mathbf{v}}_{2,t+1}),\phantom{\rule{4.pt}{0ex}}\mathrm{for}\phantom{\rule{4.pt}{0ex}}t<0.\hfill \end{array}\right.$$

As $\mathit{\xi}$ is full rank, we see that ${\mathbf{y}}_{t}$ is driven by the $q-d=r-c$ permanent shocks ${\mathbf{v}}_{2t}$, and by the d transitory shocks ${\mathbf{v}}_{1t}$. In representation (21), the component ${\mathbf{T}}_{t}$ is the common trend of Stock and Watson (1988). Note that the number of permanent shocks is obtained as r minus the cointegrating rank, as usual. However, the number of transitory shocks is only $d=c-(r-q)$, as though $r-q$ transitory shocks had a zero coefficient.
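The split of ${\mathbf{u}}_{t}$ into permanent and transitory shocks relies only on linear algebra, which the following Python sketch verifies on hypothetical inputs ($r=4$, $q=3$, $d=1$, random $\mathit{\xi}$ and $\mathit{\eta}$). It is an illustration, not the authors' code: it checks that ${\mathbf{u}}_{t}$ is recovered from $({\mathbf{v}}_{1t},{\mathbf{v}}_{2t})$ and that the long-run matrix $\mathbf{B}\left(1\right)=\mathit{\xi}{\mathit{\eta}}^{\prime}$ loads only on ${\mathbf{v}}_{2t}$.

```python
import numpy as np

rng = np.random.default_rng(3)
r, q, d = 4, 3, 1                    # hypothetical; c = r - q + d = 2
k = q - d                            # number of permanent shocks, r - c

xi = rng.standard_normal((r, k))
eta = rng.standard_normal((q, k))

# eta_perp: q x d basis of the orthogonal complement of col(eta).
U, _, _ = np.linalg.svd(eta)
eta_perp = U[:, k:]

eta_bar = eta @ np.linalg.inv(eta.T @ eta)
eta_perp_bar = eta_perp @ np.linalg.inv(eta_perp.T @ eta_perp)

u = rng.standard_normal(q)           # one draw of the white noise u_t
v1 = eta_perp.T @ u                  # d transitory shocks
v2 = eta.T @ u                       # q - d permanent shocks

# u_t is recovered from (v1, v2), and B(1) = xi eta' loads only on v2.
B1 = xi @ eta.T
print(np.allclose(eta_perp_bar @ v1 + eta_bar @ v2, u))
print(np.allclose(B1 @ eta_perp_bar, 0), np.allclose(B1 @ eta_bar, xi))
```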

#### 3.4. VECMs and Unrestricted VARs in The Levels

Several papers have addressed the issue of whether and when an error correction model or an unrestricted VAR in the levels should be used for estimation in the case of nonsingular cointegrated vectors: Sims et al. (1990) have shown that the parameters of a cointegrated VAR are consistently estimated using an unrestricted VAR in the levels; on the other hand, Phillips (1998) shows that if the variables are cointegrated, the long-run features of the impulse-response functions are consistently estimated only if the unit roots are explicitly taken into account, that is within a VECM specification. The simulation exercise described below provides evidence in favour of the VECM specification in the singular case.

- (I)
- We generate ${\mathbf{y}}_{t}$ using a specification of (14) with $r=4$, $q=3$, $d=2$, so that $c=r-q+d=3$. The $4\times 4$ matrix $\mathbf{A}\left(L\right)$ is of degree 2. The impulse-response functions are identified by assuming that the upper $3\times 3$ submatrix of $\mathbf{B}\left(0\right)$ is lower triangular (see Appendix C for details). We replicate the generation of ${\mathbf{y}}_{t}$ 1000 times for $T=100,\phantom{\rule{4pt}{0ex}}500,\phantom{\rule{4pt}{0ex}}1000,\phantom{\rule{4pt}{0ex}}5000.$
- (II)
- For each replication, we estimate a (misspecified) VAR in differences (DVAR), a VAR in the levels (LVAR) and a VECM, as in Johansen (1988, 1991), assuming known c, the degree of $\mathbf{A}\left(L\right)$ and that of ${\mathbf{A}}^{*}\left(L\right)$. For the VAR in differences the impulse-response functions for $(1-L){\mathbf{y}}_{t}$ are cumulated to obtain impulse-response functions for ${\mathbf{y}}_{t}$. The root mean square error between estimated and actual impulse-response functions is computed for each replication using all 12 impulse-responses and averaged over all replications.
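Two bookkeeping steps of the exercise above, cumulating the DVAR responses and computing the RMSE, can be sketched as follows. This is a minimal Python illustration under assumptions: the function names, the array layout and the synthetic inputs are hypothetical, the estimation step itself is omitted, and reporting the RMSE separately for each lag is one reading of the description.

```python
import numpy as np

def cumulate_irf(irf_diff: np.ndarray) -> np.ndarray:
    """Turn responses of (1-L)y_t into responses of y_t.

    irf_diff has shape (horizons, r, q); the sum runs over horizons.
    """
    return np.cumsum(irf_diff, axis=0)

def irf_rmse(est: np.ndarray, true: np.ndarray) -> np.ndarray:
    """RMSE over all r*q responses per replication, averaged over replications.

    est has shape (replications, horizons, r, q); true has (horizons, r, q).
    Returns one value per horizon.
    """
    per_rep = np.sqrt(((est - true[None, ...]) ** 2).mean(axis=(2, 3)))
    return per_rep.mean(axis=0)

# Tiny synthetic check with r = 4, q = 3 (12 responses), 5 horizons.
rng = np.random.default_rng(4)
true_diff = rng.standard_normal((5, 4, 3))
true_level = cumulate_irf(true_diff)
est_level = true_level[None, ...] + 0.1 * rng.standard_normal((20, 5, 4, 3))
print(irf_rmse(est_level, true_level))
```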

The results are shown in Table 1. We see that the RMSE of both the VECM and the LVAR decreases as T increases. However, for all values of T, the RMSE of the VECM stabilizes as the lag increases, whereas it deteriorates for the LVAR, in line with the claim that the long-run responses of the variables are better estimated with the VECM. The performance of the misspecified DVAR is uniformly poor with the exception of lag zero.

## 4. Cointegration of the Observable Variables in a DFM

Consider again the factor model ${x}_{it}={\chi}_{it}+{\epsilon}_{it}$, rewritten here as

$${\mathbf{x}}_{t}={\mathit{\chi}}_{t}+{\mathit{\epsilon}}_{t},\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}{\mathit{\chi}}_{t}=\mathbf{\Lambda}{\mathbf{F}}_{t},$$

where $\mathbf{\Lambda}$ is $n\times r$, with $n>r$. The relationship between cointegration of the factors ${\mathbf{F}}_{t}$ and cointegration of the variables ${x}_{it}$ is now considered.

Let us recall that the common factors ${\mathbf{F}}_{jt}$ are assumed to be orthogonal to the idiosyncratic components ${\epsilon}_{is}$ for all $i,j,t,s$, i.e., $\mathrm{E}{\mathit{\chi}}_{t}{\mathit{\epsilon}}_{s}^{\prime}={\mathbf{0}}_{n\times n}$ for all $t,s$, see the Introduction. The other assumptions on model (22) are asymptotic, see e.g., Forni et al. (2000); Forni and Lippi (2001); (Stock and Watson 2002a, 2002b), and put no restriction on the matrix $\mathbf{\Lambda}$ and the vector ${\mathit{\epsilon}}_{t}$ for a given finite n. In particular, the first r eigenvalues of the matrix $\mathbf{\Lambda}{\mathbf{\Lambda}}^{\prime}$ must diverge as $n\to \infty $, but this has no implications on the rank of the matrix $\mathbf{\Lambda}$ corresponding to, say, $n=10$. However, as we see in Proposition 5 (iii), if the idiosyncratic components are $I\left(0\right)$, then, independently of $\mathbf{\Lambda}$, all p-dimensional subvectors of ${\mathbf{x}}_{t}$ are cointegrated for $p>q-d$, which is at odds with what is observed in the macroeconomic datasets analyzed in the empirical Dynamic Factor Model literature. This motivates assuming that ${\mathit{\epsilon}}_{t}$ is $I\left(1\right)$. In that case, see Proposition 5 (i), cointegration of ${\mathbf{x}}_{t}$ requires that both the common and the idiosyncratic components are cointegrated. Some results are collected in the statement below.

**Proposition**

**5.**

Let ${\mathbf{x}}_{t}^{\left(p\right)}={\mathit{\chi}}_{t}^{\left(p\right)}+{\mathit{\epsilon}}_{t}^{\left(p\right)}={\mathbf{\Lambda}}^{\left(p\right)}{\mathbf{F}}_{t}+{\mathit{\epsilon}}_{t}^{\left(p\right)}$ be a p-dimensional subvector of ${\mathbf{x}}_{t}$, $p\le n$. Denote by ${c}_{\chi}^{p}$ and ${c}_{\epsilon}^{p}$ the cointegrating ranks of ${\mathit{\chi}}_{t}^{\left(p\right)}$ and ${\mathit{\epsilon}}_{t}^{\left(p\right)}$ respectively. Both range from p, stationarity, to 0, no cointegration.

- (i)
- ${\mathbf{x}}_{t}^{\left(p\right)}$ is cointegrated only if ${\mathit{\chi}}_{t}^{\left(p\right)}$ and ${\mathit{\u03f5}}_{t}^{\left(p\right)}$ are both cointegrated.
- (ii)
- If $p>q-d$ then ${\mathit{\chi}}_{t}^{\left(p\right)}$ is cointegrated. If $p\le q-d$ and rank$\left({\mathbf{\Lambda}}^{\left(p\right)}\right)<p$ then ${\mathit{\chi}}_{t}^{\left(p\right)}$ is cointegrated.
- (iii)
- Let ${V}^{\chi}\subseteq {\mathbb{R}}^{p}$ and ${V}^{\epsilon}\subseteq {\mathbb{R}}^{p}$ be the cointegrating spaces of ${\mathit{\chi}}_{t}^{\left(p\right)}$ and ${\mathit{\epsilon}}_{t}^{\left(p\right)}$ respectively. The vector ${\mathbf{x}}_{t}^{\left(p\right)}$ is cointegrated if and only if the intersection of ${V}^{\chi}$ and ${V}^{\epsilon}$ contains non-zero vectors. In particular, (a) if $p>q-d$ and ${c}_{\epsilon}^{p}>q-d$ then ${\mathbf{x}}_{t}^{\left(p\right)}$ is cointegrated, (b) if $p>q-d$ and ${\mathit{\epsilon}}_{t}^{\left(p\right)}$ is stationary then ${\mathbf{x}}_{t}^{\left(p\right)}$ is cointegrated.

**Proof.**

Because ${\chi}_{it}$ and ${\epsilon}_{js}$ are orthogonal for all $i,j,t,s$, the spectral densities of $(1-L){\mathbf{x}}_{t}^{\left(p\right)}$, $(1-L){\mathit{\chi}}_{t}^{\left(p\right)}$ and $(1-L){\mathit{\epsilon}}_{t}^{\left(p\right)}$ fulfill:

$${\mathbf{\Sigma}}_{\Delta x}^{\left(p\right)}\left(\theta \right)={\mathbf{\Sigma}}_{\Delta \chi}^{\left(p\right)}\left(\theta \right)+{\mathbf{\Sigma}}_{\Delta \epsilon}^{\left(p\right)}\left(\theta \right),\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\theta \in [-\pi ,\pi ].$$

Now, (23) implies that

$${\lambda}_{p}\left({\mathbf{\Sigma}}_{\Delta x}^{\left(p\right)}\left(0\right)\right)\ge {\lambda}_{p}\left({\mathbf{\Sigma}}_{\Delta \chi}^{\left(p\right)}\left(0\right)\right)+{\lambda}_{p}\left({\mathbf{\Sigma}}_{\Delta \epsilon}^{\left(p\right)}\left(0\right)\right),$$

where ${\lambda}_{p}\left(A\right)$ denotes the smallest eigenvalue of the Hermitian matrix A; this is one of Weyl’s inequalities, see Franklin (2000), p. 157, Theorem 1. Because the spectral density matrices are non-negative definite, both terms on the right-hand side of (24) are non-negative, so that the left-hand side vanishes only if both of them vanish, i.e., the spectral density of $\Delta {\mathbf{x}}_{t}^{\left(p\right)}$ is singular at zero only if the spectral densities of $\Delta {\mathit{\chi}}_{t}^{\left(p\right)}$ and $\Delta {\mathit{\epsilon}}_{t}^{\left(p\right)}$ are both singular at zero. By Definition 4, (i) is proved.
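The eigenvalue inequality used for statement (i) is easy to check numerically. The Python sketch below draws two random Hermitian non-negative definite matrices that merely stand in for the spectral densities at frequency zero (an assumption made for illustration; nothing is estimated from data) and verifies that the smallest eigenvalue of the sum is bounded below by the sum of the smallest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 4

def random_psd(cols: int) -> np.ndarray:
    """Random p x p Hermitian non-negative definite matrix of rank min(p, cols)."""
    Z = rng.standard_normal((p, cols)) + 1j * rng.standard_normal((p, cols))
    return Z @ Z.conj().T

def lam_min(A: np.ndarray) -> float:
    """Smallest eigenvalue of a Hermitian matrix."""
    return float(np.linalg.eigvalsh(A)[0])   # eigvalsh sorts ascending

S_chi = random_psd(2)            # singular: stands in for the common part
S_eps = random_psd(p + 4)        # nonsingular: stands in for the idiosyncratic part
S_x = S_chi + S_eps

print(lam_min(S_x) >= lam_min(S_chi) + lam_min(S_eps) - 1e-9)
print(lam_min(S_x) > 0)          # the sum is nonsingular
```

In this example one summand is singular and the other is not, and the sum is nonsingular: a singular sum requires both summands to be singular.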

Without loss of generality we can assume that $\mathbf{S}\left(L\right)={\mathbf{I}}_{r}$. By substituting (21) in (22), we obtain

$${\mathbf{x}}_{t}=\mathbf{\Lambda}\left[\left({\mathbf{G}}_{1}\left(L\right){\mathbf{v}}_{1t}+{\mathbf{G}}_{2}\left(L\right){\mathbf{v}}_{2t}+{\mathbf{T}}_{t}\right)+\mathbf{K}\right]+{\mathit{\epsilon}}_{t},$$

where on the right hand side the only non-stationary terms are ${\mathbf{T}}_{t}$ and possibly ${\mathit{\epsilon}}_{t}$. By recalling that ${\mathbf{T}}_{t}=\mathit{\xi}{\sum}_{s=1}^{t}{\mathbf{v}}_{2s}$ where $\mathit{\xi}$ is of dimension $r\times (q-d)$ and rank $q-d$, and by defining ${G}_{t}=\mathbf{\Lambda}[{\mathbf{G}}_{1}\left(L\right){\mathbf{v}}_{1t}+{\mathbf{G}}_{2}\left(L\right){\mathbf{v}}_{2t}+\mathbf{K}]$ and ${T}_{t}={\sum}_{s=1}^{t}{\mathbf{v}}_{2s}$, we can rewrite (25) as

$${\mathbf{x}}_{t}=\mathbf{\Lambda}\mathit{\xi}{T}_{t}+{G}_{t}+{\mathit{\epsilon}}_{t}.$$

For ${\mathbf{x}}_{t}^{\left(p\right)}$:

$${\mathbf{x}}_{t}^{\left(p\right)}={\mathit{\chi}}_{t}^{\left(p\right)}+{\mathit{\epsilon}}_{t}^{\left(p\right)}={\mathbf{\Lambda}}^{\left(p\right)}\mathit{\xi}{T}_{t}+{G}_{t}^{\left(p\right)}+{\mathit{\epsilon}}_{t}^{\left(p\right)},$$

where ${\mathbf{\Lambda}}^{\left(p\right)}$ and ${G}_{t}^{\left(p\right)}$ have an obvious definition. Of course cointegration of the common components ${\mathit{\chi}}_{t}^{\left(p\right)}$ is equivalent to cointegration of ${\mathbf{\Lambda}}^{\left(p\right)}\mathit{\xi}{T}_{t}$, which in turn is equivalent to $\mathrm{rank}\left({\mathbf{\Lambda}}^{\left(p\right)}\mathit{\xi}\right)<p$. Statement (ii) follows from

$$\mathrm{rank}\left({\mathbf{\Lambda}}^{\left(p\right)}\mathit{\xi}\right)\le \min \left(\mathrm{rank}\left({\mathbf{\Lambda}}^{\left(p\right)}\right),\mathrm{rank}\left(\mathit{\xi}\right)\right).$$

The first part of (iii) is obvious. Assume now that $p>q-d$. If ${c}_{\chi}^{p}+{c}_{\epsilon}^{p}=\mathrm{dim}\left({V}^{\chi}\right)+\mathrm{dim}\left({V}^{\epsilon}\right)=p-(q-d)+{c}_{\epsilon}^{p}>p$, i.e., if ${c}_{\epsilon}^{p}>q-d$, then the intersection between ${V}^{\chi}$ and ${V}^{\epsilon}$ is non-trivial, so that ${\mathbf{x}}_{t}^{\left(p\right)}$ is cointegrated. □
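
The counting argument in the last step of the proof can be made explicit. What follows is only an illustrative sketch: the dimension formula is the standard one for subspaces of ${\mathbb{R}}^{p}$, while the numerical values $p=6$, $q-d=2$ and ${c}_{\epsilon}^{p}=3$ are hypothetical and are not taken from the paper.

$$\mathrm{dim}\left({V}^{\chi}\cap {V}^{\epsilon}\right)=\mathrm{dim}\left({V}^{\chi}\right)+\mathrm{dim}\left({V}^{\epsilon}\right)-\mathrm{dim}\left({V}^{\chi}+{V}^{\epsilon}\right)\ge \left[p-(q-d)\right]+{c}_{\epsilon}^{p}-p={c}_{\epsilon}^{p}-(q-d).$$

$$p=6,\quad q-d=2,\quad {c}_{\epsilon}^{p}=3\quad \Rightarrow \quad \mathrm{dim}\left({V}^{\chi}\cap {V}^{\epsilon}\right)\ge 4+3-6=1.$$

In this hypothetical configuration at least one non-zero vector is cointegrating for both the common and the idiosyncratic components, and hence for the observable vector.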

## 5. Summary and Conclusions

The paper studies representation theory for singular $I\left(1\right)$ stochastic vectors, the factors of an $I\left(1\right)$ Dynamic Factor Model in particular. Singular $I\left(1\right)$ vectors are cointegrated, with a cointegrating rank c equal to $r-q$, the dimension of ${\mathbf{y}}_{t}$ minus its rank, plus $d,$ with $0\le d<q$.

If $(1-L){\mathbf{y}}_{t}$ has rational spectral density, under assumptions that generalize to the singular case those in Johansen (1995), we show that ${\mathbf{y}}_{t}$ has an error correction representation with c error terms, thus generalizing the Granger representation theorem (from MA to AR) to the singular case. Important consequences of singularity are that generically: (i) the autoregressive matrix polynomial of the error correction representation is of finite degree, (ii) the white noise vector driving $(1-L){\mathbf{y}}_{t}$ is fundamental.

We find that ${\mathbf{y}}_{t}$ is driven by $r-c$ permanent shocks and $d=c-(r-q)$ transitory shocks, not c as in the nonsingular case.

Simulated data generated by a simple singular VECM confirm previous results, obtained for nonsingular vectors, showing that under cointegration the long-run features of impulse-response functions are better estimated using a VECM rather than a VAR in the levels.

In Section 4 we argue that stationarity of the idiosyncratic components in a DFM produces an amount of cointegration for the observable variables ${x}_{it}$ that is not observed in the datasets that are standard in the applied Dynamic Factor Model literature. Thus the idiosyncratic vector in those datasets is likely to be $I\left(1\right)$, so that an estimation strategy robust to the assumption that some of the idiosyncratic variables ${\epsilon}_{it}$ are $I\left(1\right)$ should be preferred.

The results in this paper are the basis for estimation of $I\left(1\right)$ Dynamic Factor Models with cointegrated factors, which is developed in the companion paper (Barigozzi et al. 2019).

## Author Contributions

All authors contributed equally to the paper. All authors have read and agreed to the published version of the manuscript.

## Funding

This research received no external funding.

## Acknowledgments

Dietmar Bauer, Manfred Deistler, Massimo Franchi, Martin Wagner, three anonymous referees and the Editors of this Special Issue gave important suggestions for improvements. We also thank the participants in the Workshop on Estimation and Inference Theory for Cointegrated Processes in the State Space Representation, Technische Universität Dortmund, January 2016. Part of this paper was written while Matteo Luciani was chargé de recherches F.R.S.-F.N.R.S., and he gratefully acknowledges their financial support. Of course we are responsible for any remaining errors.

## Disclaimer

The views expressed in this paper are those of the authors and do not necessarily reflect those of the Board of Governors or the Federal Reserve System.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix A. Proofs

#### Appendix A.1. Assumption 3 Holds Generically

Proving that Assumption 3 holds generically is equivalent to proving that $\mathbf{M}\left(z\right)$ is generically zeroless, see the argument below Equation (20).

We need some preliminary results. Lemma A1, though quite easy, is not completely standard and is therefore carefully stated and proved below. Regarding notation, to avoid possible misunderstandings, let us recall that vectors and matrices are always denoted by boldface symbols, while light symbols denote scalars, see Lemmas A1 and A2 in particular.

**Lemma A1.**

Let ${A}_{j}$, $j=1,\dots ,s$, be scalar polynomials defined on ${\mathbb{R}}^{\lambda}$, let $\mathbf{p}\in {\mathbb{R}}^{\lambda}$ and $Q\left(\mathbf{p}\right)$ be the statement

$${A}_{j}\left(\mathbf{p}\right)=0,\qquad \mathrm{for}\ j=1,\dots ,s,$$

for example the statement that all the $q\times q$ minors of $\mathbf{M}\left(1\right)$ vanish, i.e., that $\mathrm{rank}\left(\mathbf{M}\left(1\right)\right)<q$. Let Π be an open subset of ${\mathbb{R}}^{\lambda}$. If Q is false for one point ${\mathbf{p}}^{*}\in {\mathbb{R}}^{\lambda}$, then Q is generically false in Π.

**Proof.**

Let $\mathcal{N}$ be the closure in $\Pi $ (in the topology of $\Pi $) of the subset of $\Pi $ where Q is true. Suppose that Q is not generically false in $\Pi $. Then the interior of $\mathcal{N}$ in $\Pi $, call it ${\mathcal{N}}^{\circ}$, is not empty. As $\Pi $ is open, ${\mathcal{N}}^{\circ}$ is open both in the topology of $\Pi $ and of ${\mathbb{R}}^{\lambda}$. On the other hand a polynomial function defined on ${\mathbb{R}}^{\lambda}$ vanishes on a non-empty open set if and only if it vanishes on the whole ${\mathbb{R}}^{\lambda}$, which contradicts the existence of a point in ${\mathbb{R}}^{\lambda}$ where Q is false. □

**Lemma A2.**

Consider the scalar polynomials

$$A\left(z\right)={a}_{0}{z}^{n}+{a}_{1}{z}^{n-1}+\cdots +{a}_{n},\qquad B\left(z\right)={b}_{0}{z}^{m}+{b}_{1}{z}^{m-1}+\cdots +{b}_{m},$$

with ${a}_{0}\ne 0$ and ${b}_{0}\ne 0$, and let ${\alpha}_{i}$, $i=1,\dots ,n$ and ${\beta}_{j}$, $j=1,\dots ,m$, be the roots of A and B, respectively. Then: (i)

$${a}_{0}^{m}{b}_{0}^{n}\prod _{i,j}({\alpha}_{i}-{\beta}_{j})=R({a}_{0},{a}_{1},\dots ,{a}_{n};{b}_{0},{b}_{1},\dots ,{b}_{m}),$$

where R is a polynomial function which is called the resultant of A and B. (ii) The resultant vanishes if and only if A and B have a common root. (iii) Suppose that the coefficients ${a}_{i}$ and ${b}_{j}$ are polynomial functions of $\mathbf{p}\in \Pi $, where Π is an open subset of ${\mathbb{R}}^{\lambda}$. If there exists a point ${\mathbf{p}}^{*}\in {\mathbb{R}}^{\lambda}$ such that ${a}_{0}\left({\mathbf{p}}^{*}\right)\ne 0$, ${b}_{0}\left({\mathbf{p}}^{*}\right)\ne 0$, and $R\left({\mathbf{p}}^{*}\right)\ne 0$, then generically in Π the polynomials A and B have no common roots.

**Proof.**

For (i) and (ii) see van der Waerden (1953, pp. 83-8). Statement (iii) is an obvious consequence of (ii) and Lemma A1. □
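
Statements (i) and (ii) can be checked numerically. The sketch below is an illustration only, not part of the original argument: it computes the resultant as the determinant of the Sylvester matrix (a standard equivalent characterization of the resultant), using exact rational arithmetic. The function names `sylvester`, `det` and `resultant` and the example polynomials are our own choices.

```python
from fractions import Fraction

def sylvester(a, b):
    """Sylvester matrix of A(z) = a[0] z^n + ... + a[n] and B(z) = b[0] z^m + ... + b[m]."""
    n, m = len(a) - 1, len(b) - 1
    size = n + m
    rows = []
    for i in range(m):   # m shifted copies of the coefficients of A
        rows.append([0] * i + list(a) + [0] * (size - n - 1 - i))
    for i in range(n):   # n shifted copies of the coefficients of B
        rows.append([0] * i + list(b) + [0] * (size - m - 1 - i))
    return rows

def det(mat):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in mat]
    n, sign, prod = len(a), 1, Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] != 0), None)
        if piv is None:
            return Fraction(0)   # a zero column means the matrix is singular
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            sign = -sign
        prod *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * prod

def resultant(a, b):
    """Resultant of A and B, equal to a_0^m b_0^n prod_{i,j} (alpha_i - beta_j)."""
    return det(sylvester(a, b))

A = [1, -3, 2]            # (z - 1)(z - 2)
B_shared = [1, -5, 6]     # (z - 2)(z - 3): shares the root 2 with A
B_disjoint = [1, -7, 12]  # (z - 3)(z - 4): no root in common with A

print(resultant(A, B_shared) == 0)  # True
print(resultant(A, B_disjoint))     # 12
```

The value 12 agrees with the product formula in (i): $(1-3)(1-4)(2-3)(2-4)=12$.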

**Lemma A3.**

Recall that a zero of $\mathbf{M}\left(z\right)$ is a complex number ${z}^{*}$ such that $\mathrm{rank}\left(\mathbf{M}\left({z}^{*}\right)\right)<q$. If $\mathbf{M}\left(z\right)$ has two $q\times q$ submatrices whose determinants have no common roots, then $\mathbf{M}\left(z\right)$ is zeroless.

**Proof.**

If ${z}^{*}$ is a zero of $\mathbf{M}\left(z\right)$, then ${z}^{*}$ is a root of the determinant of every $q\times q$ submatrix of $\mathbf{M}\left(z\right)$, which is impossible if two of these determinants have no common roots. □

For the statement and proof of our last result it is convenient to make explicit the dependence of the matrix $\mathbf{M}\left(z\right)$ and its submatrices on the vector $\mathbf{p}$. Thus we use ${\mathbf{M}}^{\mathbf{p}}\left(z\right)$, etc. The parameters of the matrix $\mathbf{S}\left(L\right)$ play no role here. Hence, with no loss of generality, we assume ${s}_{2}=0$, so that $\lambda =(r-c)(r+q)+rq({s}_{1}+2).$ Lemmas A2 and A3, together with Lemma A4 below, imply that Assumption 3 holds generically in $\Pi $.

**Lemma A4.**

Let ${\mathbf{M}}_{1}^{\mathbf{p}}\left(z\right),\phantom{\rule{4pt}{0ex}}{\mathbf{M}}_{2}^{\mathbf{p}}\left(z\right),\phantom{\rule{4pt}{0ex}}\dots $ be all the $q\times q$ submatrices of ${\mathbf{M}}^{\mathbf{p}}\left(z\right)$, let ${\mathcal{L}}_{i}^{\mathbf{p}}$ be the leading coefficient of $det{\mathbf{M}}_{i}^{\mathbf{p}}\left(z\right)$, and let ${R}_{ij}^{\mathbf{p}}$ be the resultant of $det{\mathbf{M}}_{i}^{\mathbf{p}}\left(z\right)$ and $det{\mathbf{M}}_{j}^{\mathbf{p}}\left(z\right)$. There exist i, j, ${\mathbf{p}}^{*}\in {\mathbb{R}}^{\lambda}$ such that

$${\mathcal{L}}_{i}^{{\mathbf{p}}^{*}}{\mathcal{L}}_{j}^{{\mathbf{p}}^{*}}\ne 0$$

and

$${R}_{ij}^{{\mathbf{p}}^{*}}\ne 0.$$

**Proof.**

Assume that $r=q+1$. To each $\mathbf{p}\in \Pi $ there corresponds the matrix

$${\mathbf{M}}^{\mathbf{p}}\left(z\right)=\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\mathit{\xi}}^{\prime}\mathit{\xi}{\mathit{\eta}}^{\prime}\end{array}\right)+(1-z)\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\mathbf{E}\left(z\right)\\ {\mathit{\xi}}^{\prime}{\mathbf{B}}^{*}\end{array}\right)+{(1-z)}^{2}\left(\begin{array}{c}{\mathbf{0}}_{c\times q}\\ {\mathit{\xi}}^{\prime}\mathbf{E}\left(z\right)\end{array}\right).$$

Of course, the definition of ${\mathbf{M}}^{\mathbf{p}}\left(z\right)$ makes sense for all $\mathbf{p}\in {\mathbb{R}}^{\lambda}$, see Equation (19). Let ${\mathbf{M}}_{1}^{\mathbf{p}}\left(z\right)$ and ${\mathbf{M}}_{2}^{\mathbf{p}}\left(z\right)$ be the matrices obtained from ${\mathbf{M}}^{\mathbf{p}}\left(z\right)$ by removing the first and the last row respectively. We have:

$$\begin{array}{cc}\hfill \mathrm{degree}[det\left({\mathbf{M}}_{1}^{\mathbf{p}}\left(z\right)\right)]& \le (q-d)({s}_{1}+2)+d({s}_{1}+1)={d}_{1},\hfill \\ \hfill \mathrm{degree}[det\left({\mathbf{M}}_{2}^{\mathbf{p}}\left(z\right)\right)]& \le (q-d-1)({s}_{1}+2)+(d+1)({s}_{1}+1)={d}_{2}.\hfill \end{array}$$

We will construct a point ${\mathbf{p}}^{*}\in {\mathbb{R}}^{\lambda}$ such that: (A) the coefficient of ${z}^{{d}_{1}}$ in $det\left({\mathbf{M}}_{1}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ and the coefficient of ${z}^{{d}_{2}}$ in $det\left({\mathbf{M}}_{2}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ (the leading coefficients) do not vanish, (B) the resultant of $det\left({\mathbf{M}}_{1}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ and $det\left({\mathbf{M}}_{2}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ does not vanish.

Let us first define a family of matrices, denoted by $\underline{\mathbf{M}}\left(z\right)$, obtained by specifying $\mathit{\eta}$, $\mathit{\xi}$, ${\mathit{\xi}}_{\perp}^{\prime}$, ${\mathbf{B}}^{*}$ and $\mathbf{E}\left(z\right)$ in the following way:

$$\begin{array}{cc}\hfill {\underline{\mathit{\eta}}}^{\prime}& =\left(\begin{array}{cc}{\mathbf{0}}_{(q-d)\times d}& {\mathbf{I}}_{q-d}\end{array}\right),\phantom{\rule{4pt}{0ex}}\underline{\mathit{\xi}}=\left(\begin{array}{c}{\mathbf{I}}_{q-d}\\ {\mathbf{0}}_{c\times (q-d)}\end{array}\right),\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}{\underline{\mathit{\xi}}}_{\perp}^{\prime}=\left(\begin{array}{c}\mathbf{K}\\ \mathbf{H}\end{array}\right),\hfill \\ \hfill {\underline{\mathbf{B}}}^{*}& =\left(\begin{array}{cc}{\mathbf{H}}^{\prime}& {\mathbf{0}}_{(q+1)\times (q-d)}\end{array}\right),\phantom{\rule{4pt}{0ex}}\underline{\mathbf{E}}\left(z\right)=\left(\begin{array}{c}{\mathbf{E}}_{1}\left(z\right)\\ {\mathbf{E}}_{2}\left(z\right)\\ {\mathbf{E}}_{3}\left(z\right)\end{array}\right),\hfill \end{array}$$

where:

$$\begin{array}{cc}\hfill \mathbf{K}& =\left(\begin{array}{ccc}{\mathbf{0}}_{1\times (q-d)}& 1& {\mathbf{0}}_{1\times d}\end{array}\right),\qquad \mathbf{H}=\left(\begin{array}{cc}{\mathbf{0}}_{d\times (q+1-d)}& {\mathbf{I}}_{d}\end{array}\right),\hfill \\ \hfill {\mathbf{E}}_{1}\left(z\right)& =\left(\begin{array}{cc}{\mathbf{0}}_{(q-d)\times d}& \begin{array}{cccc}{k}_{1}\left(z\right)& {h}_{1}\left(z\right)& & 0\\ & \ddots & \ddots & \\ & & \ddots & {h}_{q-d-1}\left(z\right)\\ 0& & & {k}_{q-d}\left(z\right)\end{array}\end{array}\right),\qquad {\mathbf{E}}_{2}\left(z\right)=\left(\begin{array}{cc}e\left(z\right)& {\mathbf{0}}_{1\times (q-1)}\end{array}\right),\hfill \\ \hfill {\mathbf{E}}_{3}\left(z\right)& =\left(\begin{array}{cc}\begin{array}{cccc}{f}_{1}\left(z\right)& {g}_{1}\left(z\right)& & 0\\ & \ddots & \ddots & \\ 0& & {f}_{d}\left(z\right)& {g}_{d}\left(z\right)\end{array}& {\mathbf{0}}_{d\times (q-d-1)}\end{array}\right),\hfill \end{array}$$

the entries e, ${k}_{i}$, ${h}_{i}$, ${f}_{i}$ and ${g}_{i}$ being scalar polynomials of degree ${s}_{1}$.

We denote by ${\mathbf{q}}_{1}$ the vector including the coefficients of the polynomials ${f}_{i}$, $i=1,\dots ,d$ and ${k}_{i}$, $i=1,\dots ,(q-d)$, a total of $q({s}_{1}+1)$ coefficients, by ${\mathbf{q}}_{2}$ the vector including the coefficients of the polynomials e, ${g}_{i}$, $i=1,\dots ,d$ and ${h}_{i}$, $i=1,\dots ,(q-d-1),$ a total of $q({s}_{1}+1)$ coefficients, by ${\mathbf{q}}_{0}$ the vector including the zeros and the ones in the definition of $\underline{\mathit{\xi}}$, $\underline{\mathit{\eta}}$, ${\underline{\mathbf{B}}}^{*}$, $\underline{\mathbf{E}}$, and define $\mathbf{q}=\left({\mathbf{q}}_{0}\phantom{\rule{4pt}{0ex}}{\mathbf{q}}_{1}\phantom{\rule{4pt}{0ex}}{\mathbf{q}}_{2}\right)$, which is a $\lambda $-dimensional parameter vector. We put no restriction on ${\mathbf{q}}_{1}$ and ${\mathbf{q}}_{2}$, so that both can take any value in ${\mathbb{R}}^{\nu}$, with $\nu =q({s}_{1}+1)$. Note that $\mathbf{q}$ does not necessarily belong to $\Pi $. We have:

$${\underline{\mathbf{M}}}^{\mathbf{q}}\left(z\right)=\left(\begin{array}{cc}{\mathbf{0}}_{1\times d}& {\mathbf{0}}_{1\times (q-d)}\\ {\mathbf{I}}_{d}& {\mathbf{0}}_{d\times (q-d)}\\ {\mathbf{0}}_{(q-d)\times d}& {\mathbf{I}}_{q-d}\end{array}\right)+(1-z)\left(\begin{array}{c}{\mathbf{E}}_{2}\left(z\right)\\ {\mathbf{E}}_{3}\left(z\right)\\ {\mathbf{0}}_{(q-d)\times q}\end{array}\right)+{(1-z)}^{2}\left(\begin{array}{c}{\mathbf{0}}_{1\times q}\\ {\mathbf{0}}_{d\times q}\\ {\mathbf{E}}_{1}\left(z\right)\end{array}\right). \tag{A1}$$

The matrix ${\underline{\mathbf{M}}}^{\mathbf{q}}\left(z\right)$ has zero entries except for the diagonal joining the positions $(1,1)$ and $(q,q)$, and the diagonal joining $(2,1)$ and $(q+1,q)$. The matrices ${\underline{\mathbf{M}}}_{1}^{\mathbf{q}}\left(z\right)$ and ${\underline{\mathbf{M}}}_{2}^{\mathbf{q}}\left(z\right)$ are upper- and lower-triangular, respectively, and

$$\begin{array}{cc}\hfill det\left({\underline{\mathbf{M}}}_{1}^{\mathbf{q}}\left(z\right)\right)& =[1+(1-z){f}_{1}\left(z\right)]\cdots [1+(1-z){f}_{d}\left(z\right)]\hfill \\ & \phantom{=}\times [1+{(1-z)}^{2}{k}_{1}\left(z\right)]\cdots [1+{(1-z)}^{2}{k}_{q-d}\left(z\right)]={\mathcal{L}}_{1,{d}_{1}}^{\mathbf{q}}{z}^{{d}_{1}}+\cdots +{\mathcal{L}}_{1,0}^{\mathbf{q}},\hfill \\ \hfill det\left({\underline{\mathbf{M}}}_{2}^{\mathbf{q}}\left(z\right)\right)& ={(1-z)}^{2q-d-1}e\left(z\right)[{g}_{1}\left(z\right)\cdots {g}_{d}\left(z\right)][{h}_{1}\left(z\right)\cdots {h}_{q-d-1}\left(z\right)]={\mathcal{L}}_{2,{d}_{2}}^{\mathbf{q}}{z}^{{d}_{2}}+\cdots +{\mathcal{L}}_{2,0}^{\mathbf{q}}.\hfill \end{array}$$

Note that $det\left({\underline{\mathbf{M}}}_{1}^{\mathbf{q}}\left(z\right)\right)$ does not depend on ${\mathbf{q}}_{2}$, while $det\left({\underline{\mathbf{M}}}_{2}^{\mathbf{q}}\left(z\right)\right)$ does not depend on ${\mathbf{q}}_{1}$. Thus we use the notation ${\delta}_{1}^{{\mathbf{q}}_{1}}\left(z\right)=det\left({\underline{\mathbf{M}}}_{1}^{\mathbf{q}}\left(z\right)\right)$, ${\delta}_{2}^{{\mathbf{q}}_{2}}\left(z\right)=det\left({\underline{\mathbf{M}}}_{2}^{\mathbf{q}}\left(z\right)\right)$, ${\mathcal{M}}_{1,{d}_{1}}^{{\mathbf{q}}_{1}}={\mathcal{L}}_{1,{d}_{1}}^{\mathbf{q}}$, ${\mathcal{M}}_{2,{d}_{2}}^{{\mathbf{q}}_{2}}={\mathcal{L}}_{2,{d}_{2}}^{\mathbf{q}}$. Now:

- (i) Let ${\mathbf{q}}_{2}^{*}\in {\mathbb{R}}^{\nu}$ be such that none of the leading coefficients of the polynomials e, ${g}_{i}$ and ${h}_{i}$ vanishes. Then ${\mathcal{M}}_{2,{d}_{2}}^{{\mathbf{q}}_{2}^{*}}\ne 0$, because it is, up to sign, the product of these leading coefficients.
- (ii) Let $\check{z}$ be a root of ${\delta}_{2}^{{\mathbf{q}}_{2}^{*}}\left(z\right)$. If $\check{z}=1$ then $\check{z}$ is not a root of ${\delta}_{1}^{{\mathbf{q}}_{1}}\left(z\right)$ for any ${\mathbf{q}}_{1}\in {\mathbb{R}}^{\nu}$, because ${\delta}_{1}^{{\mathbf{q}}_{1}}\left(1\right)=1$. Suppose instead that $\check{z}\ne 1$, for instance that $\check{z}$ is a root of ${g}_{j}\left(z\right)$, for some j. As the parameters of the polynomials ${f}_{i}$ and ${k}_{i}$ are free to vary in ${\mathbb{R}}^{\nu}$, then, generically in ${\mathbb{R}}^{\nu}$, ${\delta}_{1}^{{\mathbf{q}}_{1}}\left(\check{z}\right)\ne 0$. Iterating for all roots of ${\delta}_{2}^{{\mathbf{q}}_{2}^{*}}\left(z\right)$, generically in ${\mathbb{R}}^{\nu}$, ${\delta}_{1}^{{\mathbf{q}}_{1}}\left(z\right)$ and ${\delta}_{2}^{{\mathbf{q}}_{2}^{*}}\left(z\right)$ have no roots in common. Moreover, generically in ${\mathbb{R}}^{\nu}$, ${\mathcal{M}}_{1,{d}_{1}}^{{\mathbf{q}}_{1}}\ne 0$. Thus, there exists ${\mathbf{q}}_{1}^{*}$ such that (a) ${\mathcal{M}}_{1,{d}_{1}}^{{\mathbf{q}}_{1}^{*}}\ne 0$, (b) ${\delta}_{1}^{{\mathbf{q}}_{1}^{*}}\left(z\right)$ and ${\delta}_{2}^{{\mathbf{q}}_{2}^{*}}\left(z\right)$ have no roots in common.
- (iii) Now let ${\mathbf{p}}^{*}=\left({\mathbf{q}}_{0}\phantom{\rule{4pt}{0ex}}{\mathbf{q}}_{1}^{*}\phantom{\rule{4pt}{0ex}}{\mathbf{q}}_{2}^{*}\right)$, so that $$det\left({\mathbf{M}}_{1}^{{\mathbf{p}}^{*}}\left(z\right)\right)={\delta}_{1}^{{\mathbf{q}}_{1}^{*}}\left(z\right),\qquad det\left({\mathbf{M}}_{2}^{{\mathbf{p}}^{*}}\left(z\right)\right)={\delta}_{2}^{{\mathbf{q}}_{2}^{*}}\left(z\right).$$

Using (i) and (ii), (A) the leading coefficients of $det\left({\mathbf{M}}_{1}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ and $det\left({\mathbf{M}}_{2}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ do not vanish, (B) $det\left({\mathbf{M}}_{1}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ and $det\left({\mathbf{M}}_{2}^{{\mathbf{p}}^{*}}\left(z\right)\right)$ have no root in common, so that their resultant does not vanish. This proves the lemma for $r=q+1$.
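
As an illustration of the construction, consider the smallest non-trivial case. The values $q=2$, $d=1$, ${s}_{1}=0$ (so that $r=3$ and there are no polynomials ${h}_{i}$) and the constants $e={g}_{1}={f}_{1}={k}_{1}=1$ are a hypothetical choice made here for the example and are not taken from the paper. With these values,

$${\underline{\mathbf{M}}}^{\mathbf{q}}\left(z\right)=\left(\begin{array}{cc}1-z& 0\\ 2-z& 1-z\\ 0& 1+{(1-z)}^{2}\end{array}\right),$$

$$det\left({\underline{\mathbf{M}}}_{1}^{\mathbf{q}}\left(z\right)\right)=(2-z)\left[1+{(1-z)}^{2}\right]=-{z}^{3}+4{z}^{2}-6z+4,\qquad det\left({\underline{\mathbf{M}}}_{2}^{\mathbf{q}}\left(z\right)\right)={(1-z)}^{2}={z}^{2}-2z+1.$$

The degrees are ${d}_{1}=3$ and ${d}_{2}=2$, as given by the bounds above, and the leading coefficients, $-1$ and $1$, do not vanish. The roots of the first determinant are $2$ and $1\pm i$, the only root of the second is $1$, so that the two determinants have no common root. Evaluating the second determinant at the roots of the first gives the resultant ${(-1)}^{2}\cdot 1\cdot {(-i)}^{2}\cdot {i}^{2}=1\ne 0$.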

Generalizing this result to $r>q+1$ is easy. Let us define the family $\underline{\mathbf{N}}\left(z\right)$ in the following way: (a) specify ${\mathit{\eta}}^{\prime}$, $\mathit{\xi}$, ${\mathbf{E}}_{1}\left(z\right)$ and ${\mathbf{E}}_{3}\left(z\right)$ as in the definition of $\underline{\mathbf{M}}\left(z\right)$, (b) then let

$$\begin{array}{cc}\hfill \mathbf{K}& =\left(\begin{array}{ccc}{\mathbf{0}}_{(r-q)\times (r-d-1)}& {\left(\begin{array}{cc}{\mathbf{0}}_{1\times (r-q-1)}& 1\end{array}\right)}^{\prime}& {\mathbf{0}}_{(r-q)\times d}\end{array}\right),\qquad \mathbf{H}=\left(\begin{array}{cc}{\mathbf{0}}_{d\times (r-d)}& {\mathbf{I}}_{d}\end{array}\right),\hfill \\ \hfill {\underline{\mathit{\xi}}}_{\perp}^{\prime}& =\left(\begin{array}{c}\mathbf{K}\\ \mathbf{H}\end{array}\right),\qquad {\underline{\mathbf{B}}}^{*}=\left(\begin{array}{cc}{\mathbf{H}}^{\prime}& {\mathbf{0}}_{r\times (q-d)}\end{array}\right),\qquad {\mathbf{E}}_{2}\left(z\right)=\left(\begin{array}{c}{\mathbf{0}}_{(r-q-1)\times q}\\ \left(\begin{array}{cc}e\left(z\right)& {\mathbf{0}}_{1\times (q-1)}\end{array}\right)\end{array}\right).\hfill \end{array}$$

We have:

$$\underline{\mathbf{N}}\left(z\right)=\left(\begin{array}{cc}{\mathbf{0}}_{(r-q)\times d}& {\mathbf{0}}_{(r-q)\times (q-d)}\\ {\mathbf{I}}_{d}& {\mathbf{0}}_{d\times (q-d)}\\ {\mathbf{0}}_{(q-d)\times d}& {\mathbf{I}}_{q-d}\end{array}\right)+(1-z)\left(\begin{array}{c}{\mathbf{E}}_{2}\left(z\right)\\ {\mathbf{E}}_{3}\left(z\right)\\ {\mathbf{0}}_{(q-d)\times q}\end{array}\right)+{(1-z)}^{2}\left(\begin{array}{c}{\mathbf{0}}_{(r-q)\times q}\\ {\mathbf{0}}_{d\times q}\\ {\mathbf{E}}_{1}\left(z\right)\end{array}\right).$$

It is easy to see that the $(q+1)\times q$ lower submatrix of $\underline{\mathbf{N}}\left(z\right)$ is identical to the matrix ${\underline{\mathbf{M}}}^{\mathbf{q}}\left(z\right)$ in (A1). □

#### Appendix A.2. If $r>q$ and $c\le q$, Assumptions 5 and 6 Do Not Imply That ${\mathbf{e}}_{t}$ Is a Non-Cointegrated $I\left(0\right)$ Process

Let $r=3$, $q=2$, $\mathbf{S}\left(L\right)={\mathbf{I}}_{3}$,

$$\mathit{\xi}=\left(\begin{array}{c}1\\ 0\\ 0\end{array}\right),\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\mathit{\eta}=\left(\begin{array}{c}0\\ 1\end{array}\right),\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}{\mathit{\xi}}_{\perp}=\left(\begin{array}{cc}0& 0\\ 1& 0\\ 0& 1\end{array}\right),\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}{\mathbf{B}}^{*}=\left(\begin{array}{cc}a& b\\ 1& 0\\ 1& 0\end{array}\right).$$

In this case $c=2$ and $d=1$, so that $c=q$ (see Remark 6). We have

$$\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}\\ {\mathit{\eta}}^{\prime}\end{array}\right)=\left(\begin{array}{cc}1& 0\\ 1& 0\\ 0& 1\end{array}\right).$$

We see that Assumptions 5 and 6 hold. However, $\mathrm{rank}\left({\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}\right)=1$, so that ${\mathbf{e}}_{t}$, though being $I\left(0\right)$, is not a non-cointegrated $I\left(0\right)$ process. On the other hand, if the $(3,2)$ entry of ${\mathbf{B}}^{*}$ is 1 instead of 0, ${\mathbf{e}}_{t}$ is non-cointegrated.
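
The two rank statements in this example can be verified directly. The following sketch is only an illustration: the helper names `matmul` and `rank` are ours, and the values assigned to $a$ and $b$ are arbitrary placeholders, since the product ${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{B}}^{*}$ does not depend on the first row of ${\mathbf{B}}^{*}$.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(mat):
    """Exact rank by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in mat]
    rows, cols = len(a), len(a[0])
    rk = 0
    for c in range(cols):
        piv = next((r for r in range(rk, rows) if a[r][c] != 0), None)
        if piv is None:
            continue              # no pivot in this column
        a[rk], a[piv] = a[piv], a[rk]
        for r in range(rk + 1, rows):
            f = a[r][c] / a[rk][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[rk])]
        rk += 1
        if rk == rows:
            break
    return rk

a_val, b_val = 5, 7                       # arbitrary placeholders for a and b
xi_perp_t = [[0, 1, 0], [0, 0, 1]]        # transpose of xi_perp
b_star = [[a_val, b_val], [1, 0], [1, 0]]
b_star_alt = [[a_val, b_val], [1, 0], [1, 1]]   # (3,2) entry set to 1

rank_original = rank(matmul(xi_perp_t, b_star))
rank_modified = rank(matmul(xi_perp_t, b_star_alt))
print(rank_original, rank_modified)  # 1 2
```

The rank drops to 1 in the original specification and equals 2 once the $(3,2)$ entry of ${\mathbf{B}}^{*}$ is set to 1, in line with the two cases discussed above.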

## Appendix B. Non Uniqueness

In Proposition 3 we prove that a singular $I\left(1\right)$ vector with cointegrating rank c has a finite error correction representation with c error terms. On the other hand, as we have seen in Remark 5, when $c=r-q$ the singular vector ${\mathbf{y}}_{t}$ also has an autoregressive representation in the differences, i.e., a representation with zero error terms. In Appendix B.1 we give an example suggesting that ${\mathbf{y}}_{t}$ has error correction representations with any number of error terms between d and c. However, in Appendix B.2 we show that all such representations produce the same impulse-response functions.

#### Appendix B.1. Alternative Representations with Different Numbers of Error Terms

Let $\mathbf{S}\left(L\right)={\mathbf{I}}_{r}$ and consider the following example, with $r=3$, $q=2$, $c=2$, so that $d=1$:

$$\begin{array}{cc}\hfill {\mathit{\xi}}^{\prime}& =\left(\begin{array}{ccc}1& 1& 1\end{array}\right)\hfill \\ \hfill {\mathit{\eta}}^{\prime}& =\left(\begin{array}{cc}1& 2\end{array}\right)\hfill \\ \hfill {\mathit{\xi}}_{\perp}^{\prime}& =\left(\begin{array}{ccc}1& -1& \phantom{-}0\\ 0& \phantom{-}1& -1\end{array}\right)\hfill \end{array}$$

We have

$$(1-L)\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\\ {\mathit{\xi}}^{\prime}\end{array}\right){\mathbf{y}}_{t}=\left(\begin{array}{ccc}1-L& 0& 0\\ 0& 1-L& 0\\ 0& 0& 1\end{array}\right)\left\{\left(\begin{array}{cc}{b}_{11}^{*}-{b}_{21}^{*}& {b}_{12}^{*}-{b}_{22}^{*}\\ {b}_{21}^{*}-{b}_{31}^{*}& {b}_{22}^{*}-{b}_{32}^{*}\\ 3& 6\end{array}\right)+(1-L)\widehat{\mathbf{E}}\left(L\right)\right\}{\mathbf{u}}_{t},$$

where $(1-L)\widehat{\mathbf{E}}\left(L\right)$ gathers the second and third terms in $\mathbf{M}\left(L\right)$. If the assumptions of Proposition 2 hold, we obtain an error correction representation with error terms

$${\mathit{\xi}}_{\perp}^{\prime}{\mathbf{y}}_{t}=\left(\begin{array}{c}{y}_{1t}-{y}_{2t}\\ {y}_{2t}-{y}_{3t}\end{array}\right).$$

However, we also have

$$\begin{array}{c}(1-L)\left(\begin{array}{c}{\mathit{\xi}}_{\perp}^{\prime}\\ {\mathit{\xi}}^{\prime}\end{array}\right){\mathbf{y}}_{t}=\left(\begin{array}{ccc}1-L& 0& 0\\ 0& 1& 0\\ 0& 0& 1\end{array}\right)\hfill \\ \times \left\{\left(\begin{array}{cc}{b}_{11}^{*}-{b}_{21}^{*}& {b}_{12}^{*}-{b}_{22}^{*}\\ (1-L)({b}_{21}^{*}-{b}_{31}^{*})& (1-L)({b}_{22}^{*}-{b}_{32}^{*})\\ 3& 6\end{array}\right)+(1-L)\check{\mathbf{E}}\left(L\right)\right\}{\mathbf{u}}_{t}=\left(\begin{array}{ccc}1-L& 0& 0\\ 0& 1& 0\\ 0& 0& 1\end{array}\right)\check{\mathbf{M}}\left(L\right){\mathbf{u}}_{t}.\hfill \end{array}$$

Under suitable assumptions on the coefficients ${b}_{ij}^{*}$ and $\check{\mathbf{E}}\left(L\right)$, assuming in particular that the matrix

$$\left(\begin{array}{cc}{b}_{11}^{*}-{b}_{21}^{*}& {b}_{12}^{*}-{b}_{22}^{*}\\ 3& 6\end{array}\right)$$

is nonsingular, the matrix $\check{\mathbf{M}}\left(L\right)$ is zeroless and has therefore a finite-degree left inverse. Proceeding as in Proposition 2, we obtain an alternative error correction representation with just one error term, namely ${y}_{1t}-{y}_{2t}$.

This example should be sufficient to convey the idea that ${\mathbf{y}}_{t}$ admits error correction representations with a number of error terms ranging from a minimum of d to a maximum of $c=r-q+d$.

The problem of error correction representations with different numbers of error terms has been recently addressed in Deistler and Wagner (2017). An implication of their main result (see Theorem 1, p. 41) is that if ${\mathbf{y}}_{t}$ has the error correction representation

$$\tilde{\mathbf{A}}\left(L\right){\mathbf{y}}_{t}={\tilde{\mathbf{A}}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}+\tilde{\mathbf{A}}\left(1\right){\mathbf{y}}_{t-1}=\tilde{\mathbf{B}}{\tilde{\mathbf{u}}}_{t},$$

and $\mathrm{rank}\left(\tilde{\mathbf{A}}\left(1\right)\right)<c$ (the number of error terms is not the maximum), then $\tilde{\mathbf{A}}\left(L\right)$ and $\tilde{\mathbf{B}}$ are not left coprime.

The consequences of Deistler and Wagner’s paper have not yet been developed. In Propositions 2 and 3 we have only considered representations with c error terms. On non-uniqueness of autoregressive representations for singular vectors with rational spectral density see also Chen et al. (2011); Anderson et al. (2012); Forni et al. (2015).

#### Appendix B.2. Uniqueness of Impulse-Response Functions

Suppose that the assumptions of Proposition 2, weak form, hold. Let ${\mathbf{y}}_{t}$ be a solution of Equation (10), so that

$$(1-L){\mathbf{y}}_{t}=\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}, \tag{A2}$$

and suppose that ${\mathbf{y}}_{t}$ has the autoregressive representation

$$\tilde{\mathbf{A}}\left(L\right){\mathbf{y}}_{t}=\tilde{\mathbf{B}}{\tilde{\mathbf{u}}}_{t}, \tag{A3}$$

where $\tilde{\mathbf{A}}\left(L\right)$ is a rational matrix with poles outside the unit circle, $\tilde{\mathbf{A}}\left(0\right)={\mathbf{I}}_{r}$, ${\tilde{\mathbf{u}}}_{t}$ is a nonsingular q-dimensional white noise, $\tilde{\mathbf{B}}$ is a full rank $r\times q$ matrix.^{5} We have

$$\tilde{\mathbf{A}}\left(L\right)\left[(1-L){\mathbf{y}}_{t}\right]=(1-L)\tilde{\mathbf{B}}{\tilde{\mathbf{u}}}_{t}. \tag{A4}$$

The assumption that $\tilde{\mathbf{B}}$ is full rank and the argument used e.g., in Brockwell and Davis (1991), p. 111, Problem 3.8, imply that ${\tilde{\mathbf{u}}}_{t}$ is fundamental for $(1-L){\mathbf{y}}_{t}$. Thus ${\tilde{\mathbf{u}}}_{t}=\mathbf{Q}{\mathbf{u}}_{t}$, where $\mathbf{Q}$ is a nonsingular $q\times q$ matrix (see Rozanov (1967), p. 57), and $\tilde{\mathbf{B}}{\tilde{\mathbf{u}}}_{t}=\left[\tilde{\mathbf{B}}\mathbf{Q}\right]{\mathbf{u}}_{t}$.

On the other hand, from (A2) and (A4):

$$\tilde{\mathbf{A}}\left(L\right)\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}=(1-L)\left[\tilde{\mathbf{B}}\mathbf{Q}\right]{\mathbf{u}}_{t}. \tag{A5}$$

As ${\mathbf{u}}_{t}$ is nonsingular, $\tilde{\mathbf{A}}\left(L\right)\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right)=(1-L)\left[\tilde{\mathbf{B}}\mathbf{Q}\right].$ Setting $L=0$ we have $\tilde{\mathbf{B}}\mathbf{Q}=\mathbf{B}\left(0\right)$, so that (A3) becomes

$$\tilde{\mathbf{A}}\left(L\right){\mathbf{y}}_{t}=\mathbf{B}\left(0\right){\mathbf{u}}_{t}, \tag{A6}$$

while (A5) becomes

$$\tilde{\mathbf{A}}\left(L\right)\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}=(1-L)\mathbf{B}\left(0\right){\mathbf{u}}_{t}. \tag{A7}$$

The impulse-response function of ${\mathbf{y}}_{t}$ to ${\mathbf{u}}_{t}$ resulting from (A6) is $\mathbf{H}\left(L\right)\mathbf{B}\left(0\right)$, where $\mathbf{H}\left(L\right)\tilde{\mathbf{A}}\left(L\right)={\mathbf{I}}_{r}$. Multiplying both sides of (A7) by $\mathbf{H}\left(L\right)$ we obtain

$$\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right)=(1-L)\mathbf{H}\left(L\right)\mathbf{B}\left(0\right),$$

so that $\mathbf{H}\left(L\right)\mathbf{B}\left(0\right)$ is obtained by cumulating $\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right)$ and is therefore independent of $\tilde{\mathbf{A}}\left(L\right)$.

## Appendix C. Data Generating Process for the Simulations

The simulation results of Section 3.4 are obtained using the following specification of (14):

$$\mathbf{A}\left(L\right){\mathbf{y}}_{t}={\mathbf{A}}^{*}\left(L\right)(1-L){\mathbf{y}}_{t}+\mathit{\alpha}{\mathit{\beta}}^{\prime}{\mathbf{y}}_{t-1}=\mathbf{C}\left(0\right){\mathbf{u}}_{t}=\mathbf{G}\mathbf{H}{\mathbf{u}}_{t},$$

where $r=4$, $q=3$, $c=3$, the degree of $\mathbf{A}\left(L\right)$ is 2, so that the degree of ${\mathbf{A}}^{*}\left(L\right)$ is 1. $\mathbf{A}\left(L\right)$ is generated using the factorization

$$\mathbf{A}\left(L\right)=\mathcal{U}\left(L\right)\mathcal{M}\left(L\right)\mathcal{V}\left(L\right),$$

where $\mathcal{U}\left(L\right)$ and $\mathcal{V}\left(L\right)$ are $r\times r$ matrix polynomials with all their roots outside the unit circle, and

$$\mathcal{M}\left(L\right)=\left(\begin{array}{cc}(1-L){\mathbf{I}}_{r-c}& \mathbf{0}\\ \mathbf{0}& {\mathbf{I}}_{c}\end{array}\right)$$

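(see Watson 1994). To get a VAR(2) we set $\mathcal{U}\left(L\right)={\mathbf{I}}_{r}-{\mathcal{U}}_{1}L$ and $\mathcal{V}\left(L\right)={\mathbf{I}}_{r}$. Writing $\mathcal{M}\left(L\right)={\mathbf{I}}_{r}-{\mathcal{M}}_{1}L$, we have $\mathbf{A}\left(L\right)=({\mathbf{I}}_{r}-{\mathcal{U}}_{1}L)({\mathbf{I}}_{r}-{\mathcal{M}}_{1}L)={\mathbf{I}}_{r}-({\mathcal{U}}_{1}+{\mathcal{M}}_{1})L+{\mathcal{U}}_{1}{\mathcal{M}}_{1}{L}^{2}$, so that, with $\mathbf{A}\left(L\right)={\mathbf{I}}_{r}-{\mathbf{A}}_{1}L-{\mathbf{A}}_{2}{L}^{2}$, we get ${\mathbf{A}}_{1}={\mathcal{M}}_{1}+{\mathcal{U}}_{1}$ and ${\mathbf{A}}_{2}=-{\mathcal{U}}_{1}{\mathcal{M}}_{1}$.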

Regarding the generation of the data, the diagonal entries of the matrix ${\mathcal{U}}_{1}$ are drawn from a uniform distribution between $0.5$ and $0.8$, while the off-diagonal entries are drawn from a uniform distribution between 0 and $0.3$. ${\mathcal{U}}_{1}$ is then multiplied by a scalar so that its largest eigenvalue is $0.6$. The matrix $\mathbf{G}$ is generated as in Bai and Ng (2007): (1) $\tilde{\mathbf{G}}$ is an $r\times r$ diagonal matrix of rank q where ${\tilde{g}}_{ii}$ is drawn from the uniform distribution between $0.8$ and $1.2$, (2) $\check{\mathbf{G}}$ is obtained by orthogonalizing an $r\times r$ uniform random matrix, (3) $\mathbf{G}$ is equal to the first q columns of the matrix $\check{\mathbf{G}}{\tilde{\mathbf{G}}}^{1/2}$. Lastly, the orthogonal matrix $\mathbf{H}$ is such that the upper $3\times 3$ submatrix of $\mathbf{G}\mathbf{H}$ is lower triangular. The results are based on 1000 replications. The matrices ${\mathcal{U}}_{1}$, $\mathbf{G}$ and $\mathbf{H}$ are generated only once (the numerical values are available on request) so that the set of impulse responses to be estimated is the same for all replications, whereas the vector ${\mathbf{u}}_{t}$ is redrawn from $\mathcal{N}(\mathbf{0},{\mathbf{I}}_{q})$ at each replication.
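
The first steps of this data generating process can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code used for the paper: the function names are ours, the largest eigenvalue of ${\mathcal{U}}_{1}$ is approximated by power iteration (adequate here because all entries of the matrix are positive), and the matrix `C0` is a hypothetical placeholder for $\mathbf{G}\mathbf{H}$ rather than the matrix built as in Bai and Ng (2007). The data are generated by filtering, that is, by solving $\mathcal{U}\left(L\right){\mathbf{w}}_{t}=\mathbf{C}\left(0\right){\mathbf{u}}_{t}$ and then $\mathcal{M}\left(L\right){\mathbf{y}}_{t}={\mathbf{w}}_{t}$.

```python
import random

def spectral_radius(m, iters=500):
    """Largest eigenvalue of an entrywise positive matrix, by power iteration."""
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)              # entries stay positive, so max(w) is the sup norm
        v = [x / lam for x in w]
    return lam

def draw_u1(r, rng, target=0.6):
    """Draw U_1 as described in the text and rescale its largest eigenvalue to `target`."""
    u = [[rng.uniform(0.5, 0.8) if i == j else rng.uniform(0.0, 0.3)
          for j in range(r)] for i in range(r)]
    scale = target / spectral_radius(u)
    return [[scale * x for x in row] for row in u]

def simulate(T, U1, C0, c, rng):
    """Simulate (I - U_1 L) M(L) y_t = C0 u_t with M(L) = diag((1-L) I_{r-c}, I_c)."""
    r, q = len(C0), len(C0[0])
    w = [0.0] * r
    y = [0.0] * r
    path = []
    for _ in range(T):
        u = [rng.gauss(0.0, 1.0) for _ in range(q)]            # q-dimensional shock
        shock = [sum(C0[i][k] * u[k] for k in range(q)) for i in range(r)]
        # w_t = U_1 w_{t-1} + C0 u_t
        w = [sum(U1[i][j] * w[j] for j in range(r)) + shock[i] for i in range(r)]
        # first r-c components are cumulated, the remaining c are stationary
        y = [y[i] + w[i] if i < r - c else w[i] for i in range(r)]
        path.append(y)
    return path

rng = random.Random(0)            # fixed seed, chosen only for reproducibility
r, c = 4, 3
U1 = draw_u1(r, rng)
C0 = [[1.0, 0.0, 0.0],            # hypothetical stand-in for G H (r x q)
      [0.5, 1.0, 0.0],
      [0.3, 0.2, 1.0],
      [0.4, 0.1, 0.6]]
path = simulate(200, U1, C0, c, rng)
```

In a Monte Carlo exercise of the kind described above, `U1` and `C0` would be held fixed and only the call to `simulate` repeated across replications.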

## References

- Amengual, Dante, and Mark W. Watson. 2007. Consistent estimation of the number of dynamic factors in a large N and T panel. Journal of Business and Economic Statistics 25: 91–96.
- Anderson, Brian DO, and Manfred Deistler. 2008a. Generalized linear dynamic factor models–A structure theory. Paper presented at IEEE Conference on Decision and Control, Cancun, Mexico, December 9–11.
- Anderson, Brian DO, and Manfred Deistler. 2008b. Properties of zero-free transfer function matrices. SICE Journal of Control, Measurement and System Integration 1: 284–92.
- Anderson, Brian DO, Manfred Deistler, Weitian Chen, and Alexander Filler. 2012. Autoregressive models of singular spectral matrices. Automatica 48: 2843–49.
- Bai, Jushan, and Serena Ng. 2007. Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25: 52–60.
- Banerjee, Anindya, Massimiliano Marcellino, and Igor Masten. 2014. Forecasting with factor-augmented error correction models. International Journal of Forecasting 30: 589–612.
- Banerjee, Anindya, Massimiliano Marcellino, and Igor Masten. 2017. Structural FECM: Cointegration in large-scale structural FAVAR models. Journal of Applied Econometrics 32: 1069–86.
- Barigozzi, Matteo, Antonio M. Conti, and Matteo Luciani. 2014. Do euro area countries respond asymmetrically to the common monetary policy? Oxford Bulletin of Economics and Statistics 76: 693–714.
- Barigozzi, Matteo, Marco Lippi, and Matteo Luciani. 2019. Large-dimensional dynamic factor models: Estimation of impulse-response functions with I(1) cointegrated factors. arXiv.
- Bauer, Dietmar, and Martin Wagner. 2012. A State Space Canonical Form For Unit Root Processes. Econometric Theory 28: 1313–49.
- Brockwell, Peter J., and Richard A. Davis. 1991. Time Series: Theory and Methods, 2nd ed. New York: Springer.
- Canova, Fabio. 2007. Methods for Applied Macroeconomics. Princeton: Princeton University Press.
- Chen, Weitian, Brian DO Anderson, Manfred Deistler, and Alexander Filler. 2011. Solutions of Yule-Walker equations for singular AR processes. Journal of Time Series Analysis 32: 531–38.
- Deistler, Manfred, Brian DO Anderson, A. Filler, Ch Zinner, and W. Chen. 2010. Generalized linear dynamic factor models: An approach via singular autoregressions. European Journal of Control 16: 211–24.
- Deistler, Manfred, and Martin Wagner. 2017. Cointegration in singular ARMA models. Economics Letters 155: 39–42.
- Forni, Mario, and Luca Gambetti. 2010. The dynamic effects of monetary policy: A structural factor model approach. Journal of Monetary Economics 57: 203–16.
- Forni, Mario, Domenico Giannone, Marco Lippi, and Lucrezia Reichlin. 2009. Opening the Black Box: Structural Factor Models versus Structural VARs. Econometric Theory 25: 1319–47.
- Forni, Mario, Marc Hallin, Marco Lippi, and Lucrezia Reichlin. 2000. The Generalized Dynamic Factor Model: Identification and Estimation. The Review of Economics and Statistics 82: 540–54.
- Forni, Mario, Marc Hallin, Marco Lippi, and Paolo Zaffaroni. 2015. Dynamic factor models with infinite-dimensional factor spaces: One-sided representations. Journal of Econometrics 185: 359–71.
- Forni, Mario, and Marco Lippi. 2001. The Generalized Dynamic Factor Model: Representation Theory. Econometric Theory 17: 1113–41.
- Franchi, Massimo, and Paolo Paruolo. 2019. A general inversion theorem for cointegration. Econometric Reviews 38: 1176–201.
- Franklin, J. N. 2000. Matrix Theory, 2nd ed. New York: Dover Publications.
- Giannone, Domenico, Lucrezia Reichlin, and Luca Sala. 2005. Monetary policy in real time. In NBER Macroeconomics Annual 2004. Edited by Mark Gertler and Kenneth Rogoff. Cambridge: MIT Press, chp. 3. pp. 161–224.
- Gregoir, Stéphane. 1999. Multivariate Time Series With Various Hidden Unit Roots, Part I. Econometric Theory 15: 435–68.
- Johansen, Søren. 1988. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12: 231–54.
- Johansen, Søren. 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59: 1551–80.
- Johansen, Søren. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, 1st ed. Oxford: Oxford University Press.
- Lancaster, Peter, and Miron Tismenetsky. 1985. The Theory of Matrices, 2nd ed. New York: Academic Press.
- Luciani, Matteo. 2015. Monetary policy and the housing market: A structural factor analysis. Journal of Applied Econometrics 30: 199–218.
- Phillips, Peter C.B. 1998. Impulse response and forecast error variance asymptotics in nonstationary VARs. Journal of Econometrics 83: 21–56.
- Rozanov, Yu. A. 1967. Stationary Random Processes. San Francisco: Holden-Day.
- Sargent, Thomas J. 1989. Two Models of Measurements and the Investment Accelerator. Journal of Political Economy 97: 251–87.
- Sims, Christopher, James H. Stock, and Mark W. Watson. 1990. Inference in linear time series models with some unit roots. Econometrica 58: 113–44.
- Stock, James H., and Mark W. Watson. 1988. Testing for common trends. Journal of the American Statistical Association 83: 1097–107.
- Stock, James H., and Mark W. Watson. 2002a. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97: 1167–79.
- Stock, James H., and Mark W. Watson. 2002b. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20: 147–62.
- Stock, James H., and Mark W. Watson. 2016. Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics. In Handbook of Macroeconomics. Edited by John B. Taylor and Harald Uhlig. Amsterdam: North Holland, Elsevier, vol. 2A, chp. 8. pp. 415–525.
- Van der Waerden, Bartel Leendert. 1953. Modern Algebra, 2nd ed. New York: Frederick Ungar, vol. I.
- Watson, Mark W. 1994. Vector autoregressions and cointegration. In Handbook of Econometrics. Edited by Robert F. Engle and Daniel L. McFadden. Amsterdam: North Holland, Elsevier, vol. 4, chp. 47. pp. 2843–915.

1 | Usually orthonormality is assumed. This is convenient but not necessary in the present paper. |

2 | To our knowledge, the present paper is the first to study cointegration and error correction representations for $I\left(1\right)$ singular vectors, and in particular for the factors of $I\left(1\right)$ dynamic factor models. An error correction model in the DFM framework is studied in Banerjee et al. (2014, 2017). However, their focus is on the relationship between the observable variables and the factors. Their error correction term is a linear combination of the variables ${x}_{it}$ and the factors ${\mathbf{F}}_{t}$, which is stationary if the idiosyncratic components are stationary (so that the x’s and the factors are cointegrated). Because of this and other differences, their results are not directly comparable to those in the present paper. |

3 | In the square case, $r=q$, Assumption 3 holds if and only if $\mathbf{M}\left(z\right)$ is unimodular. |

4 | If ${z}^{*}$ is a zero of $\mathbf{M}\left(z\right)$, multiply $\mathbf{M}\left(z\right)$ by an invertible $r\times r$ matrix ${\mathbf{Q}}_{{z}^{*}}$ such that ${z}^{*}$ is a zero of, say, the first row of ${\mathbf{Q}}_{{z}^{*}}\mathbf{M}\left(z\right)$. Then multiply by the $r\times r$ diagonal matrix with ${(z-{z}^{*})}^{-1}$ in position $(1,1)$ and unity elsewhere on the main diagonal. Iterating, all the zeros of $\mathbf{M}\left(z\right)$ are removed. |

5 | Multiplying both sides of (A3) by $(1-L)$ and using (A2), we obtain $\tilde{\mathbf{A}}\left(L\right)\mathbf{S}{\left(L\right)}^{-1}\mathbf{B}\left(L\right){\mathbf{u}}_{t}=(1-L)\tilde{\mathbf{B}}{\tilde{\mathbf{u}}}_{t}$. Comparing the spectral densities of right- and left-hand terms, it is easy to prove that ${\tilde{\mathbf{u}}}_{t}$ must be a q-dimensional, nonsingular white noise and the rank of $\tilde{\mathbf{B}}$ must be q. |

| Sample size | Lags | DVAR | LVAR | VECM | Sample size | Lags | DVAR | LVAR | VECM |
|---|---|---|---|---|---|---|---|---|---|
| $T=100$ | 0 | 0.06 | 0.05 | 0.05 | $T=500$ | 0 | 0.02 | 0.02 | 0.02 |
| | 4 | 0.26 | 0.18 | 0.17 | | 4 | 0.23 | 0.07 | 0.07 |
| | 20 | 0.30 | 0.37 | 0.22 | | 20 | 0.25 | 0.14 | 0.09 |
| | 40 | 0.30 | 0.45 | 0.22 | | 40 | 0.25 | 0.21 | 0.09 |
| | 80 | 0.30 | 0.57 | 0.22 | | 80 | 0.25 | 0.32 | 0.09 |
| $T=1000$ | 0 | 0.02 | 0.02 | 0.02 | $T=5000$ | 0 | 0.01 | 0.01 | 0.01 |
| | 4 | 0.23 | 0.05 | 0.05 | | 4 | 0.22 | 0.02 | 0.02 |
| | 20 | 0.25 | 0.09 | 0.07 | | 20 | 0.25 | 0.03 | 0.03 |
| | 40 | 0.25 | 0.13 | 0.07 | | 40 | 0.25 | 0.04 | 0.03 |
| | 80 | 0.25 | 0.22 | 0.07 | | 80 | 0.25 | 0.06 | 0.03 |

Root mean squared errors (RMSEs) at different lags for the estimated impulse-response functions of the simulated variables ${\mathbf{y}}_{t}$ to the shocks ${\mathbf{u}}_{t}$. Estimation is carried out using three different autoregressive representations: a VAR for $(1-L){\mathbf{y}}_{t}$ (DVAR), a VAR for ${\mathbf{y}}_{t}$ (LVAR), and a VECM with $c=r-q+d$ error correction terms (VECM). The results are based on 1000 replications. For the data generating process see Appendix C. The RMSEs are obtained by averaging over all replications and all $4\times 3$ responses.
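The averaging behind the reported figures can be read in more than one way. The sketch below shows one plausible reading, in which squared errors are pooled over replications and over all responses before taking the square root at each lag. The function name `rmse_by_lag`, the array layout and the toy inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def rmse_by_lag(irf_hat, irf_true):
    """Pooled RMSE at each lag.

    irf_hat  : array (n_rep, n_lags, r, q), estimated impulse responses
    irf_true : array (n_lags, r, q), true impulse responses
    Returns an array of length n_lags.
    """
    sq_err = (irf_hat - irf_true[None, ...]) ** 2
    # Average over replications (axis 0) and over the r x q responses.
    return np.sqrt(sq_err.mean(axis=(0, 2, 3)))

# Toy check with made-up numbers: every estimate is off by exactly 0.1.
rng = np.random.default_rng(0)
irf_true = rng.standard_normal((5, 4, 3))        # 5 lags, 4 x 3 responses
irf_hat = np.repeat(irf_true[None, ...] + 0.1, 10, axis=0)  # 10 replications
rmse = rmse_by_lag(irf_hat, irf_true)
```

An alternative reading, averaging the per-response RMSEs instead of pooling squared errors, would generally give slightly different numbers.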

© 2020 by Matteo Barigozzi and Marco Lippi. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).