A Parameterization of Models for Unit Root Processes: Structure Theory and Hypothesis Testing

We develop and discuss a parameterization of vector autoregressive moving average processes with arbitrary unit roots and (co)integration orders. The detailed analysis of the topological properties of the parameterization—based on the state space canonical form of Bauer and Wagner (2012)—is an essential input for establishing statistical and numerical properties of pseudo maximum likelihood estimators as well as, e.g., pseudo likelihood ratio tests based on them. The general results are exemplified in detail for the empirically most relevant cases, the (multiple frequency or seasonal) I(1) and the I(2) case. For these two cases we also discuss the modeling of deterministic components in detail.


Introduction
Since the seminal contribution of Clive W.J. Granger (1981) that introduced the concept of cointegration, the modeling of multivariate (economic) time series with models and methods that allow for unit roots and cointegration has become standard econometric practice with applications ranging from macroeconomics to finance to climate science.
The most prominent (parametric) model class for cointegration analysis are vector autoregressive (VAR) models, popularized by the important contributions of Søren Johansen and Katarina Juselius and their co-authors, see, e.g., the monographs Johansen (1995) and Juselius (2006). The popularity of VAR cointegration analysis stems not only from the (relative) simplicity of the model class, but also from the fact that the VAR cointegration literature is very well-developed and provides a large battery of tools for diagnostic testing, impulse response analysis, forecast error variance decompositions and the like. All this makes VAR cointegration analysis to a certain extent the benchmark in the literature. 1 The imposition of specific cointegration properties on an estimated VAR model becomes increasingly complicated as one moves away from the I(1) case. As discussed in Section 2, e.g., in the I(2) case a triple of indices needs to be chosen (fixed or determined via testing) to describe the cointegration properties. The imposition of cointegration properties in the estimation algorithm then leads to "switching" type algorithms that come together with non-trivial parameterization restrictions involving non-linear inter-relations, compare Paruolo (1996) or Paruolo (2000). 2 Mathematically, these complications arise from the fact that the unit root and cointegration properties are in the VAR setting related to rank restrictions on the autoregressive polynomial matrix and its derivatives.
1 Please note that the original contribution to the estimation of cointegrating relationships has been least squares estimation in a non- or semi-parametric regression setting, see, e.g., Engle and Granger (1987). A recent survey of regression-based cointegration analysis is provided by Wagner (2018).
Restricting cointegration analysis to VAR processes may be too restrictive. First, it is well-known since Zellner and Palm (1974) that VAR processes are not invariant with respect to marginalization, i.e., subsets of the variables of a VAR process are in general vector autoregressive moving average (VARMA) processes. Second, similar to the first argument, aggregation of VAR processes also leads to VARMA processes, an issue relevant, e.g., in the context of temporal aggregation and in mixed-frequency settings. Third, the linearized solutions to dynamic stochastic general equilibrium (DSGE) models are typically VARMA rather than VAR processes, see, e.g., Campbell (1994). Fourth, a VARMA model may be a more parsimonious description of the data generating process (DGP) than a VAR model, with parsimony becoming more important with increasing dimension of the process. 3 If one accepts the above arguments as a motivation for considering VARMA processes in cointegration analysis, it is convenient to move to the essentially equivalent (see Hannan and Deistler 1988, chps. 1 and 2) state space framework. A key challenge when moving from VAR to VARMA (or state space) models is that identification becomes an important issue for the latter model class, whereas unrestricted VAR models are (reduced-form) identified. In other words, there are so-called equivalence classes of VARMA models that lead to the same dynamic behavior of the observed process. As is well-known, to achieve identification, restrictions have to be placed on the coefficient matrices in the VARMA case, e.g., zero or exclusion restrictions. A mapping attaching to every transfer function, i.e., the function relating the error sequence to the observed process, a unique VARMA (or state space) system from the corresponding class of observationally equivalent systems is called canonical form.
Since not all entries of the coefficient matrices in canonical form are free parameters, for statistical analysis a so-called parameterization is required that maps the free parameters from coefficient matrices in canonical form into a parameter vector. These issues, including the importance of properties such as continuity and differentiability of parameterizations, are discussed in detail in Hannan and Deistler (1988, chp. 2) and, of course, are also relevant for our setting in this paper.
The convenience of the state space framework for unit root and cointegration analysis stems from the fact that (static and dynamic) cointegration can be characterized by orthogonality constraints, see Bauer and Wagner (2012), once an appropriate basis for the state vector, which is a (potentially singular) VAR process of order one, is chosen. The integration properties are governed by the eigenvalue structure of unit modulus eigenvalues of the system matrix in the state equation. Eigenvalues of unit modulus and orthogonality constraints arguably are easier restrictions to deal with or to implement than the interrelated rank restrictions considered in the VAR or VARMA setting. The canonical form of Bauer and Wagner (2012) is designed for cointegration analysis by using a basis of the state vector that puts the unit root and cointegration properties to the center and forefront. Consequently, these results are key input for the present paper and are thus briefly reviewed in Section 3.
2 The complexity of these inter-relations is probably well illustrated by the fact that only Jensen (2013) notes that "even though the I(2) models are formulated as submodels of I(1) models, some I(1) models are in fact submodels of I(2) models".
3 The literature often uses VAR models as approximations, based on the fact that VARMA processes often can be approximated by VAR models with the order tending to infinity with the sample size at certain rates. This line of work goes back to Lewis and Reinsel (1985) for stationary processes and was extended to (co)integrated processes by Saikkonen (1992), Saikkonen and Luukkonen (1997) and Bauer and Wagner (2005). In addition to the issue of the existence and properties of a sequence of VAR approximations, the question whether a VAR approximation is parsimonious remains.
An important problem with respect to appropriately defining the "free parameters" in VARMA models is the fact that no continuous parameterization of all VARMA or state space models of a certain order n exists in the multivariate case (see Hazewinkel and Kalman 1976). This implies that the model set, M_n say, has to be partitioned into subsets on which continuous parameterizations exist, i.e., M_n = ⋃_{Γ∈G} M_Γ for some multi-index Γ varying in an index set G. Based on the canonical form of Bauer and Wagner (2012), the partitioning is according to systems with fixed unit root properties (in addition to other restrictions such as fixed order n), to be precise over systems with given state space unit root structure. This has the advantage that, e.g., pseudo maximum likelihood (PML) estimation can straightforwardly be performed over systems with fixed unit root properties without any further ado, i.e., without having to consider (or ignore) rank restrictions on polynomial matrices. The definition and detailed discussion of the properties of this parameterization is the first main result of the paper.
The second main set of results, provided in Section 4, is a detailed discussion of the relationships between the different subsets of models M_Γ for different indices Γ and the parameterization of the respective model sets. Knowledge concerning these relations is important to understand the asymptotic behavior of PML estimators and pseudo likelihood ratio tests based on them. In particular, the structure of the closure of the considered model set M, M̄ say, has to be understood, since the difference M̄ \ M cannot be avoided when maximizing the pseudo likelihood function 4 . Additionally, the inclusion properties between different sets M_Γ need to be understood, as this knowledge is important for developing hypothesis tests, in particular for developing hypothesis tests for the dimensions of cointegrating spaces. Hypothesis testing, with a focus on the MFI(1) and I(2) cases, is discussed in Section 5, which shows how the parameterization results of the paper can be used to formulate a large number of hypotheses on (static and polynomial) cointegrating relationships as considered in the VAR cointegration literature. This discussion also includes commonly used deterministic components such as intercept, seasonal dummies, and linear trend, as well as restrictions on these components.
The paper is organized as follows: Section 2 briefly reviews VAR and VARMA models with unit roots and cointegration and discusses some of the complications arising in the VARMA case in addition to the complications arising due to the presence of unit roots and cointegration already in the VAR case. Section 3 presents the canonical form and the parameterization based on it, with the discussion starting with the multiple frequency I(1) (MFI(1)) and I(2) cases prior to a discussion of the general case. This section also provides several important definitions, e.g., of the state space unit root structure. Section 4 contains a detailed discussion concerning the topological structure of the model sets and Section 5 discusses testing of a large number of hypotheses on the cointegrating spaces commonly tested in the cointegration literature. The discussion in Section 5 focuses on the empirically most relevant MFI(1) and I(2) cases and includes the usual deterministic components considered in the literature. Section 6 briefly summarizes and concludes the paper. All proofs are relegated to the Appendices A and B.
Throughout we use the following notation: L denotes the lag operator, i.e., L({x_t}_{t∈Z}) := {x_{t−1}}_{t∈Z}, for brevity written as Lx_t = x_{t−1}. For a matrix γ ∈ C^{s×r}, γ* ∈ C^{r×s} denotes its conjugate transpose. For γ ∈ C^{s×r} with full column rank r < s, we define γ_⊥ ∈ C^{s×(s−r)} of full column rank such that γ*γ_⊥ = 0. I_p denotes the p-dimensional identity matrix and 0_{m×n} the m × n zero matrix. For two matrices A ∈ C^{m×n}, B ∈ C^{k×l}, A ⊗ B ∈ C^{mk×nl} denotes the Kronecker product of A and B. For a complex valued quantity x, R(x) denotes its real part, I(x) its imaginary part and x̄ its complex conjugate. For a set V, V̄ denotes its closure. 5 For two sets V and W, V \ W denotes the difference of V and W, i.e., {v ∈ V : v ∉ W}. For a square matrix A we denote the spectral radius (i.e., the maximum of the moduli of its eigenvalues) by λ_|max|(A) and by det(A) its determinant.
4 Below we often use the term "likelihood" as short form of "likelihood function".
5 We are confident that this dual usage of notation does not lead to confusion.
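The orthogonal complement γ_⊥ used throughout can be computed numerically, e.g., from the full singular value decomposition. The following sketch (the helper name `orth_complement` is ours, not from the paper) illustrates the defining property γ*γ_⊥ = 0:

```python
import numpy as np

def orth_complement(gamma):
    """Return gamma_perp of full column rank s - r with gamma* gamma_perp = 0.

    gamma is an s x r (possibly complex) matrix with full column rank r < s;
    the last s - r left singular vectors of the full SVD span the orthogonal
    complement of the column space of gamma.
    """
    s, r = gamma.shape
    u, _, _ = np.linalg.svd(gamma, full_matrices=True)
    return u[:, r:]

rng = np.random.default_rng(0)
gamma = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
gamma_perp = orth_complement(gamma)
assert gamma_perp.shape == (5, 3)
assert np.allclose(gamma.conj().T @ gamma_perp, 0)
```

Any basis of the orthogonal complement works equally well; the SVD merely delivers an orthonormal one.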

Vector Autoregressive, Vector Autoregressive Moving Average Processes and Parameterizations
In this paper, we define VAR processes {y_t}_{t∈Z}, y_t ∈ R^s, as solutions of

a(L)y_t = y_t + ∑_{j=1}^p a_j y_{t−j} = ε_t + Φd_t, (1)

with a(L) := I_s + ∑_{j=1}^p a_j L^j, where a_j ∈ R^{s×s} for j = 1, ..., p, Φ ∈ R^{s×m}, a_p ≠ 0, a white noise process {ε_t}_{t∈Z}, ε_t ∈ R^s, with Σ := E(ε_t ε_t′) > 0 and a vector sequence {d_t}_{t∈Z}, d_t ∈ R^m, comprising deterministic components like, e.g., the intercept, seasonal dummies or a linear trend. Furthermore, we impose the non-explosiveness condition det a(z) ≠ 0 for all |z| < 1, with a(z) := I_s + ∑_{j=1}^p a_j z^j and z denoting a complex variable. 6 Thus, for given autoregressive order p, with a_p ≠ 0 as defining characteristic of the order, the considered class of VAR models with specified deterministic components {d_t}_{t∈Z} is given by the set of all polynomial matrices a(z) such that (i) the non-explosiveness condition holds, (ii) a(0) = I_s and (iii) a_p ≠ 0; together with the set of all matrices Φ ∈ R^{s×m}. Equivalently, the model class can be characterized by a set of rational matrix functions k(z) := a(z)^{−1}, referred to as transfer functions, and the input-output description for the deterministic variables, i.e., V_{p,Φ} := V_p × R^{s×m}, where

V_p := { k(z) = a(z)^{−1} : a(z) = I_s + ∑_{j=1}^p a_j z^j, det a(z) ≠ 0 for |z| < 1, a_p ≠ 0 }.
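The non-explosiveness condition det a(z) ≠ 0 for |z| < 1 can be checked numerically via the companion matrix of the VAR, whose eigenvalues are the reciprocals of the determinantal roots of a(z) (plus zeros), so the condition is equivalent to a spectral radius of at most one. A minimal sketch (helper names are ours):

```python
import numpy as np

def companion(a_list):
    """Companion matrix of a(L) = I_s + a_1 L + ... + a_p L^p.

    The VAR reads y_t = -a_1 y_{t-1} - ... - a_p y_{t-p} + e_t, so the first
    block row is [-a_1, ..., -a_p] with shifted identity blocks below.
    """
    s, p = a_list[0].shape[0], len(a_list)
    top = np.hstack([-a for a in a_list])
    bottom = np.eye(s * (p - 1), s * p)
    return np.vstack([top, bottom]) if p > 1 else top

def non_explosive(a_list, tol=1e-8):
    """det a(z) != 0 for |z| < 1  <=>  spectral radius of companion <= 1."""
    lam = np.linalg.eigvals(companion(a_list))
    return np.max(np.abs(lam)) <= 1 + tol

# Bivariate VAR(1) examples: stable, explosive, and a unit root system.
assert non_explosive([-0.5 * np.eye(2)])        # roots of det a(z) at z = 2
assert not non_explosive([-1.2 * np.eye(2)])    # roots inside the unit circle
assert non_explosive([-np.eye(2)])              # y_t = y_{t-1} + e_t: unit roots allowed
```

Note that unit modulus roots are admitted, in line with the focus on integrated processes; only roots strictly inside the unit circle are ruled out.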
Remark 1. In the above discussion the parameters, θ_Σ say, describing the variance covariance matrix Σ of ε_t are not considered. These can be easily included, similarly to Φ, by, e.g., parameterizing positive definite symmetric s × s matrices via their lower triangular Cholesky factor. This leads to a parameter space Θ_{p,Φ,Σ} ⊂ R^{s²p+sm+s(s+1)/2}. We omit θ_Σ for brevity, since typically no cross-parameter restrictions involving parameters corresponding to Σ are considered, whereas, as discussed in Section 5, parameter restrictions involving both elements of Θ_p and Φ (in this paper in the state space rather than the VAR setting), to, e.g., impose the absence of a linear trend in the cointegrating space, are commonly considered in the cointegration literature. 7 The estimator of the variance covariance matrix Σ often equals the sample variance of suitable residuals ε̂_t(θ) from (1), if there are no cross-restrictions between θ and θ_Σ. This holds, e.g., for the Gaussian pseudo maximum likelihood estimator. Thus, explicitly including θ_Σ and Θ_Σ in the discussion would only overload notation without adding any additional insights, given the simple nature of the parameterization of Σ.
6 Our definition of VAR processes differs to a certain extent from some widely used definitions in the literature. Given our focus on unit root and cointegration analysis we, unlike Hannan and Deistler (1988), allow for determinantal roots at the unit circle that, as is well known, lead to integrated processes. We also include deterministic components in our definition, i.e., we allow for a special case of exogenous variables, compare also Remark 2 below. There is, however, also a large part of the literature that refers to this setting simply as (cointegrated) vector autoregressive models, see, e.g., Johansen (1995) and Juselius (2006).
7 Of course, the statistical properties of the parameter estimators depend in many ways on the deterministic components.
Remark 2. Our consideration of deterministic components is a special case of including exogenous variables. We include exogenous deterministic variables with a static input-output behavior governed solely by the matrix Φ. More general exogenous variables that are dynamically related to the output {y t } t∈Z could be considered, thereby considering so-called VARX models rather than VAR models, which would necessitate considering in addition to the transfer function k(z) also a transfer function l(z), say, linking the exogenous variables dynamically to the output.
For the VAR case, the fact that the mapping assigning to a given transfer function k(z) ∈ V_p a parameter vector θ_a ∈ Θ_p (the parameterization) is continuous with continuously differentiable inverse is immediate. 8 Homeomorphicity of a parameterization is important for the properties of parameter estimators, e.g., the ordinary least squares (OLS) or Gaussian PML estimator, compare the discussion in Hannan and Deistler (1988, Theorem 2.5.3 and Remark 1, p. 65).
For OLS estimation one typically considers the larger set V_p^{OLS} without the non-explosiveness condition and without the assumption a_p ≠ 0: Considering V_p^{OLS} allows for unconstrained optimization. It is well-known that for {ε_t}_{t∈Z} as given above, the OLS estimator is consistent over the larger set V_p^{OLS}, i.e., without imposing non-explosiveness and also when specifying p too high. Alternatively, and closely related to OLS in the VAR case, the pseudo likelihood can be maximized over Θ_{p,Φ}. With this approach, maxima respectively suprema can occur at the boundary of the parameter space, i.e., maximization effectively has to consider Θ̄_{p,Φ}. It is well-known that the PML estimator is consistent in the stable case (cf. Hannan and Deistler 1988, Theorem 4.2.1), but the maximization problem is complicated by the restrictions on the parameter space stemming from the non-explosiveness condition. Avoiding these complications and the asymptotic equivalence of OLS and PML in the stable VAR case explain why VAR models are usually estimated by OLS. 9 To be more explicit, ignore deterministic components for a moment and consider the case where the DGP is a stationary VAR process, i.e., a solution of (1) with a(z) satisfying the stability condition det a(z) ≠ 0 for |z| ≤ 1. Define the corresponding set of stable transfer functions as V_{p,•} := {a(z)^{−1} ∈ V_p : det a(z) ≠ 0 for |z| ≤ 1}. Clearly, V_{p,•} is an open subset of V_p. If the DGP is a stationary VAR process, the above-mentioned consistency result of the OLS estimator over V_p^{OLS} implies that the probability that the estimated transfer function, k̂(z) = â(z)^{−1} say, is contained in V_{p,•} converges to one as the sample size tends to infinity. Moreover, the asymptotic distribution of the estimated parameters is normal, under appropriate assumptions on {ε_t}_{t∈Z}.
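Unconstrained OLS estimation over V_p^{OLS} amounts to a multivariate least squares regression of y_t on its lags. A minimal sketch (our own simulation, not from the paper) that also verifies the exact orthogonality of OLS residuals to the regressors:

```python
import numpy as np

rng = np.random.default_rng(1)
s, p, T = 3, 2, 400

# Simulate a stable VAR(2): a(L)y_t = e_t, i.e. y_t = -a_1 y_{t-1} - a_2 y_{t-2} + e_t.
a1, a2 = -0.4 * np.eye(s), 0.1 * np.eye(s)
y = np.zeros((T, s))
for t in range(p, T):
    y[t] = -a1 @ y[t - 1] - a2 @ y[t - 2] + rng.standard_normal(s)

# Unrestricted OLS: regress y_t on (y_{t-1}, y_{t-2}) with no constraints imposed.
X = np.hstack([y[p - j - 1:T - j - 1] for j in range(p)])  # (T-p) x sp regressor matrix
Y = y[p:]
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)               # sp x s coefficient matrix
a_hat = [-coef[j * s:(j + 1) * s].T for j in range(p)]     # implied estimates of a_1, a_2

# Exact finite-sample OLS property: residuals are orthogonal to the regressors.
resid = Y - X @ coef
assert np.allclose(X.T @ resid, 0, atol=1e-6)
```

No stability or rank restriction enters the computation, which is precisely what makes OLS over V_p^{OLS} an unconstrained problem.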
The situation is a bit more involved if the transfer function of the DGP corresponds to a point in the set V̄_{p,•} \ V_{p,•}, which contains systems with unit roots, i.e., determinantal roots of a(z) on the unit circle, as well as lower order autoregressive systems, with these two cases non-disjoint. The stable lower order case is relatively unproblematic from a statistical perspective. If, e.g., OLS estimation is performed over V_p^{OLS}, while the true model corresponds to an element in V_{p*,•}, with p* < p, the OLS estimator is still consistent, since V_{p*,•} ⊂ V_p^{OLS}. Furthermore, standard chi-squared pseudo likelihood ratio test based inference still applies. The integrated case, for a precise definition see the discussion below Definition 1, is a bit more difficult to deal with, as in this case not all parameters are asymptotically normally distributed and nuisance parameters may be present. Consequently, parameterizations that do not take the specific nature of unit root processes into account are not very useful for inference in the unit root case, see, e.g., Sims et al. (1990, Theorem 1). Studying the unit root and cointegration properties is facilitated by resorting to suitable parameterizations that "zoom in on the relevant characteristics".
8 The set V_p is endowed with the pointwise topology T_pt, defined in Section 3. For now, in the context of VAR models, it suffices to know that convergence in pointwise topology is equivalent to convergence of the VAR coefficient matrices a_1, ..., a_p in the Frobenius norm.
9 Please note that in case of restricted estimation, i.e., zero restrictions or cross-equation restrictions, OLS is not asymptotically equivalent to PML in general.
In case the only determinantal root of a(z) on the unit circle is at z = 1, the system corresponds to a so-called I(d) process, with the integration order d > 0 made precise in Definition 1 below. Consider first the I(1) case: As is well-known, the rank of the matrix a(1) equals the dimension of the cointegrating space given in Definition 3 below, also referred to as the cointegrating rank. Therefore, determination of the rank of this matrix is of key importance. With the parameterization used so far, imposing a certain (maximal) rank on a(1) implies complicated restrictions on the matrices a_j, j = 1, ..., p. This in turn renders the correspondingly restricted optimization unnecessarily complicated and not conducive to developing tests for the cointegrating rank. It is more convenient to consider the so-called vector error correction model (VECM) representation of autoregressive processes, discussed in full detail in the monograph Johansen (1995). To this end let us first introduce the differencing operator at frequency 0 ≤ ω ≤ π,

∆_0(L) := 1 − L, ∆_π(L) := 1 + L, ∆_ω(L) := 1 − 2cos(ω)L + L² for 0 < ω < π. (3)

For notational brevity, we omit the dependence on L in ∆_ω(L), henceforth denoted as ∆_ω. Using this notation, the I(1) error correction representation is given by

∆_0 y_t = αβ′y_{t−1} + ∑_{j=1}^{p−1} Γ_j ∆_0 y_{t−j} + ε_t + Φd_t, (4)

with the matrix Π := −a(1) = −(I_s + ∑_{j=1}^p a_j) of rank 0 ≤ r ≤ s factorized into the product Π = αβ′ of two full rank matrices α, β ∈ R^{s×r} and Γ_j := ∑_{m=j+1}^p a_m, j = 1, ..., p − 1. This constitutes a reparameterization, where k(z) ∈ V_p is now represented by the matrices (α, β, Γ_1, ..., Γ_{p−1}) and a corresponding parameter vector θ_a^{VECM} ∈ Θ_{p,r}^{VECM}. Please note that stacking the entries of the matrices does not lead to a homeomorphic mapping from V_p to Θ_{p,s}^{VECM}, since for 0 < r ≤ s the matrices α and β are not identifiable from the product αβ′, as αβ′ = αMM^{−1}β′ = α̃β̃′, with α̃ := αM and β̃′ := M^{−1}β′, for all regular matrices M ∈ R^{r×r}. One way to obtain identifiability is to introduce the restriction β = [I_r, β_*′]′, with β_* ∈ R^{(s−r)×r} and α ∈ R^{s×r}.
With this additional restriction the parameter vector θ_a^{VECM} is given by stacking the vectorized matrices α, β_*, Γ_1, ..., Γ_{p−1}, similarly to (2). Then Θ_{p,r,Φ}^{VECM} = Θ_{p,r}^{VECM} × R^{sm} ⊂ R^{ps²−(s−r)²+sm}. Note for completeness that the normalization β = [I_r, β_*′]′ may necessitate a re-ordering of the variables in {y_t}_{t∈Z} since, without potential reordering, this parameterization implies a restriction of generality as, e.g., processes where the first variable is integrated but does not cointegrate with the other variables cannot be represented.
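The VAR-to-VECM reparameterization can be checked numerically: with Π := −a(1) and Γ_j := ∑_{m=j+1}^p a_m, the residual of a(L)y_t coincides with the VECM residual ∆_0 y_t − Πy_{t−1} − ∑_j Γ_j ∆_0 y_{t−j} for arbitrary coefficients and data. A minimal sketch (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
s, p, T = 3, 3, 10

# Arbitrary VAR(p) coefficients a_1, ..., a_p and arbitrary data y_1, ..., y_T.
a = [rng.standard_normal((s, s)) for _ in range(p)]
y = rng.standard_normal((T, s))

# VECM quantities: Pi = -a(1) = -(I + a_1 + ... + a_p), Gamma_j = sum_{m=j+1}^p a_m.
Pi = -(np.eye(s) + sum(a))
Gamma = [sum(a[j + 1:], start=np.zeros((s, s))) for j in range(p - 1)]

# Algebraic identity, valid pathwise: the error implied by a(L)y_t equals the
# error implied by the VECM representation, for every t.
for t in range(p, T):
    e_var = y[t] + sum(a[j] @ y[t - j - 1] for j in range(p))          # a(L) y_t
    e_vecm = (y[t] - y[t - 1]) - Pi @ y[t - 1] - sum(
        Gamma[j] @ (y[t - j - 1] - y[t - j - 2]) for j in range(p - 1))
    assert np.allclose(e_var, e_vecm)
```

Since the identity holds coefficient by coefficient, no simulation noise or tolerance issues are involved; the check is exact up to floating point error.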
Define the following set of transfer functions:

V_{p,r} := { a(z)^{−1} ∈ V_p : det a(z) ≠ 0 for {z : |z| = 1, z ≠ 1}, rank(a(1)) ≤ r }.

The dimension of the parameter vector θ_a^{VECM} depends on the dimension of the cointegrating space, thus the parameterization of k(z) ∈ V_{p,r} depends on r. The so-called reduced rank regression (RRR) estimator, given by the maximizer of the pseudo likelihood over V_{p,r}^{RRR}, is consistent, see, e.g., Johansen (1995, chp. 6). The RRR estimator uses an "implicit" normalization of β and thereby implicitly addresses the mentioned identification problem. However, for testing hypotheses involving the free parameters in α or β, typically the identifying restriction given above is used, as discussed in Johansen (1995, chp. 7).
Furthermore, since V_{p,r} ⊂ V_{p,r*} for r < r* ≤ s, with Θ_{p,r}^{VECM} a lower dimensional subset of Θ_{p,r*}^{VECM}, pseudo likelihood ratio testing can be used to sequentially test for the rank r, starting with the hypothesis of rank r = 0 against the alternative of rank 0 < r ≤ s, and increasing the assumed rank consecutively until the null hypothesis is not rejected.
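The mechanics of reduced rank regression can be sketched for the simplest case p = 1 without deterministic components. The following is a minimal Johansen-type RRR sketch (the function `rrr_vecm` and the toy DGP are our own illustrations, not the estimators developed in this paper): β is obtained from the eigenvectors of a generalized eigenvalue problem in the sample moment matrices, and α by regressing ∆y_t on β′y_{t−1}.

```python
import numpy as np

def rrr_vecm(y, r):
    """Reduced rank regression for Delta y_t = alpha beta' y_{t-1} + e_t.

    Minimal sketch for p = 1, no deterministics; beta solves the generalized
    eigenproblem S10 S00^{-1} S01 v = lambda S11 v for the r largest lambda.
    """
    dy, ylag = np.diff(y, axis=0), y[:-1]
    T = dy.shape[0]
    S00, S11, S01 = dy.T @ dy / T, ylag.T @ ylag / T, dy.T @ ylag / T
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    eigval, eigvec = np.linalg.eig(M)
    beta = eigvec[:, np.argsort(-eigval.real)[:r]].real
    alpha = S01 @ beta @ np.linalg.inv(beta.T @ S11 @ beta)
    return alpha, beta

# Toy cointegrated VAR(1): y_t = (I + alpha beta') y_{t-1} + e_t with r = 1.
rng = np.random.default_rng(3)
alpha_true, beta_true = np.array([[0.0], [0.5]]), np.array([[0.5], [-1.0]])
A = np.eye(2) + alpha_true @ beta_true.T          # eigenvalues 1 and 0.5: I(1), stable VECM
y = np.zeros((1000, 2))
for t in range(1, 1000):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

alpha, beta = rrr_vecm(y, r=1)
# beta is identified only up to scale; the ratio of its entries is invariant.
assert abs(beta[0, 0] / beta[1, 0] + 0.5) < 0.1
```

The eigenvalues are squared sample canonical correlations between ∆y_t and y_{t−1}; the normalization β′S11β = I_r is exactly the "implicit" normalization mentioned above.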
Ensuring that {y_t}_{t∈Z} generated from (4) is indeed an I(1) process requires on the one hand that Π is of reduced rank, i.e., r < s, and on the other hand that the matrix

α′_⊥ Γ β_⊥, with Γ := I_s − ∑_{j=1}^{p−1} Γ_j, (5)

has full rank. It is well-known that condition (5) is fulfilled on the complement of a "thin" algebraic subset of V_{p,r}^{RRR} and is therefore ignored in estimation, as it is "generically" fulfilled. 10 The I(2) case is similar in structure to the I(1) case, but with two rank restrictions and one full rank condition to exclude even higher integration orders. The corresponding VECM is given by

∆²_0 y_t = αβ′y_{t−1} − Γ∆_0 y_{t−1} + ∑_{j=1}^{p−2} Ψ_j ∆²_0 y_{t−j} + ε_t + Φd_t, (6)

with α, β as defined in (4), Γ as defined in (5) and Ψ_j, j = 1, ..., p − 2, again functions of the autoregressive coefficients. From (5) we already know that reduced rank of α′_⊥ Γ β_⊥, i.e.,

α′_⊥ Γ β_⊥ = ξη′ (7)

with ξ, η ∈ R^{(s−r)×m}, m < s − r, is required for higher integration orders. The condition for the corresponding solution process {y_t}_{t∈Z} to be an I(2) process is given by full rank of a further matrix, a function of α, β, Γ, ξ, η and the Ψ_j, which again is typically ignored in estimation, just like condition (5) in the I(1) case. Thus, I(2) processes correspond to a "thin subset" of V_{p,r}^{RRR}, which in turn constitutes a "thin subset" of V_p^{OLS}. The fact that integrated processes correspond to "thin sets" in V_p^{OLS} implies that obtaining estimated systems with specific integration and cointegration properties requires restricted estimation based on parameterizations tailor-made to highlight these properties.
Already for the I(2) case, formulating parameterizations that allow conveniently studying the integration and cointegration properties is a quite challenging task. Johansen (1997) contains several different (re-)parameterizations for the I(2) case and Paruolo (1996) defines "integration indices", r_0, r_1, r_2 say, as the numbers of columns of the matrices β ∈ R^{s×r_0}, β_1 := β_⊥η ∈ R^{s×r_1} and β_2 := β_⊥η_⊥ ∈ R^{s×r_2}. Clearly, the indices r_0, r_1, r_2 are linked to the ranks of the above matrices Π and α′_⊥Γβ_⊥, as r_0 = r and r_1 = m, and the columns of [β, β_1, β_2] form a basis of R^s, such that s = r_0 + r_1 + r_2.
10 A similar property holds for V_{p,r}^{RRR} being a "thin" subset of V_p^{OLS}. This implies that the probability that the OLS estimator calculated over V_p^{OLS} corresponds to an element of V_{p,r}^{RRR} ⊂ V_p^{OLS} is equal to zero in general.
It holds that {β′_2 y_t}_{t∈Z} is an I(2) process without cointegration and {β′_1 y_t}_{t∈Z} is an I(1) process without cointegration. The process {β′y_t}_{t∈Z} is typically I(1) and in this case cointegrates with {β′_2 ∆_0 y_t}_{t∈Z} to stationarity. Thus, there is a direct correspondence of these indices to the dimensions of the different cointegrating spaces, both static and dynamic (with precise definitions given below in Definition 3). 11 Please note that again, as already in the I(1) case, different values of the integration indices r_0, r_1, r_2 lead to parameter spaces of different dimensions. Furthermore, in these parameterizations the matrices describing the different cointegrating spaces are (i) not identified and (ii) linked by restrictions, compare the discussion in Paruolo (2000, sct. 2.2) and (7). These facts render the analysis of the cointegration properties in I(2) VAR systems complicated. Also, in the I(2) VAR case usually some form of RRR estimator is considered over suitable subsets V_{p,r,m}^{RRR} of V_{p,r}^{RRR}, again based on implicit normalizations. Inference, however, again requires one to consider parameterizations explicitly.
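The construction of the directions β, β_1 = β_⊥η and β_2 = β_⊥η_⊥, and the counting identity s = r_0 + r_1 + r_2, can be illustrated numerically (a sketch with arbitrary β and η; helper name `orth` is ours):

```python
import numpy as np

def orth(m):
    """Orthogonal complement via the full SVD."""
    u, _, _ = np.linalg.svd(m, full_matrices=True)
    return u[:, m.shape[1]:]

# Example with s = 4, r0 = 1, r1 = 2, hence r2 = s - r0 - r1 = 1.
rng = np.random.default_rng(5)
s, r0 = 4, 1
beta = rng.standard_normal((s, r0))
eta = rng.standard_normal((s - r0, 2))    # r1 = 2 columns
beta1 = orth(beta) @ eta                  # I(1) directions without cointegration
beta2 = orth(beta) @ orth(eta)            # directions that remain I(2)

# [beta, beta1, beta2] forms a basis of R^s, so s = r0 + r1 + r2.
basis = np.hstack([beta, beta1, beta2])
assert basis.shape == (s, s)
assert np.linalg.matrix_rank(basis) == s
```

Note that β_1 and β_2 depend on the chosen versions of β_⊥ and η_⊥, i.e., only the spanned spaces, not the matrices themselves, are determined; this is exactly the identification issue referred to in the text.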
Estimation and inference issues are fundamentally more complex in the VARMA case than in the VAR case. This stems from the fact that unrestricted estimation-unlike in the VAR case-is not possible due to a lack of identification, as discussed below. This means that in the VARMA case identification and parameterization issues need to be tackled as the first step, compare the discussion in Hannan and Deistler (1988, chp. 2).
In this paper, we consider VARMA processes as solutions of the vector difference equation

a(L)y_t = b(L)ε_t + Φd_t, (8)

with a(L) := I_s + ∑_{j=1}^p a_j L^j, where a_j ∈ R^{s×s} for j = 1, ..., p, a_p ≠ 0, and the non-explosiveness condition det a(z) ≠ 0 for |z| < 1. Similarly, b(L) := I_s + ∑_{j=1}^q b_j L^j, where b_j ∈ R^{s×s} for j = 1, ..., q, b_q ≠ 0, and Φ ∈ R^{s×m}. The transfer function corresponding to a VARMA process is k(z) := a(z)^{−1}b(z).
It is well-known that without further restrictions the VARMA realization (a(z), b(z)) of the transfer function k(z) = a(z)^{−1}b(z) is not identified, i.e., different pairs of polynomial matrices (a(z), b(z)) can realize the same transfer function k(z). It is clear that k(z) = (m(z)a(z))^{−1}(m(z)b(z)) = a(z)^{−1}b(z) for all non-singular polynomial matrices m(z). Thus, the mapping π attaching the transfer function k(z) = a(z)^{−1}b(z) to the pair of polynomial matrices (a(z), b(z)) is not injective. 12 Consequently, we refer for given rational transfer function k(z) to the class {(a(z), b(z)) : k(z) = a(z)^{−1}b(z)} as the class of observationally equivalent VARMA realizations of k(z). Achieving identification requires defining a canonical form, selecting one member of each class of observationally equivalent VARMA realizations for a set of considered transfer functions. A first step towards a canonical form is to only consider left coprime pairs (a(z), b(z)). 13 However, left coprimeness is not sufficient for identification and thus further restrictions are required, leading to parameter vectors of smaller dimension than s²(p+q). A widely used canonical form is the (reverse) echelon canonical form, see Hannan and Deistler (1988, Theorem 2.5.1, p. 59), based on (monic) normalizations of the diagonal elements of a(z) and degree relationships between diagonal and off-diagonal elements as well as the entries in b(z), which lead to zero restrictions. The (reverse) echelon canonical form in conjunction with a transformation to an error correction model was used in VARMA cointegration analysis in the I(1) case, e.g., in Poskitt (2006, Theorem 4.1), but, as for the VAR case, understanding the interdependencies of rank conditions already becomes complicated once one moves to the I(2) case.
11 Below Example 3 we clarify how these indices are related to the state space unit root structure defined in Bauer and Wagner (2012, Definition 2) and link these to the dimensions of the cointegrating spaces in Section 5.2. 12 Uniqueness of realizations in the VAR case stems from the normalization b(z) = I_s, which reduces the class of observationally equivalent VAR realizations of the same transfer function k(z) = a(z)^{−1}b(z), with b(z) = I_s, to a singleton. 13 The pair (a(z), b(z)) is left coprime if all its left divisors are unimodular matrices. Unimodular matrices are polynomial matrices with constant non-zero determinant. Thus, pre-multiplication of, e.g., a(z) with a unimodular matrix u(z) does not affect the determinantal roots that shape the dynamic behavior of the solutions of VAR models.
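The non-injectivity of the map from VARMA realizations to transfer functions can be demonstrated directly: pre-multiplying both a(z) and b(z) by a unimodular m(z) leaves k(z) = a(z)^{−1}b(z) unchanged at every point of evaluation. A minimal sketch (helper names are ours):

```python
import numpy as np

def poly_eval(coeffs, z):
    """Evaluate a matrix polynomial given as [c_0, c_1, ...] at the scalar z."""
    return sum(c * z**j for j, c in enumerate(coeffs))

def transfer(a_coeffs, b_coeffs, z):
    """k(z) = a(z)^{-1} b(z) for polynomial matrices a, b."""
    return np.linalg.solve(poly_eval(a_coeffs, z), poly_eval(b_coeffs, z))

def poly_mult(m_coeffs, p_coeffs):
    """Product of two matrix polynomials via coefficient convolution."""
    s = m_coeffs[0].shape[0]
    out = [np.zeros((s, s)) for _ in range(len(m_coeffs) + len(p_coeffs) - 1)]
    for i, mi in enumerate(m_coeffs):
        for j, pj in enumerate(p_coeffs):
            out[i + j] = out[i + j] + mi @ pj
    return out

rng = np.random.default_rng(6)
a = [np.eye(2), 0.3 * rng.standard_normal((2, 2))]   # a(z) = I + a_1 z
b = [np.eye(2), 0.2 * rng.standard_normal((2, 2))]   # b(z) = I + b_1 z
m = [np.eye(2), np.array([[0.0, 1.0], [0.0, 0.0]])]  # unimodular: det m(z) = 1 for all z
am, bm = poly_mult(m, a), poly_mult(m, b)            # observationally equivalent pair

for z in [0.2, -0.5, 0.3 + 0.4j]:
    assert np.allclose(transfer(a, b, z), transfer(am, bm, z))
```

Here m(z) = I + Nz with nilpotent N, so det m(z) ≡ 1: the second realization has higher polynomial degrees yet produces the identical transfer function, which is precisely why degree alone does not identify a VARMA system.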
In the VARMA case matters are further complicated by another well-known problem that makes statistical analysis considerably more involved compared to the VAR case. Although there exists a generalization of the autoregressive order to the VARMA case, such that any transfer function corresponding to a VARMA system has an order n ∈ N (with the precise definition given in the next section), it has been known since Hazewinkel and Kalman (1976) that no continuous parameterization of all rational transfer functions of order n exists if s > 1. Therefore, if one wants to keep the above-discussed advantages that continuity of a parameterization provides, the set of transfer functions of order n, henceforth referred to as M_n, has to be partitioned into sets on which continuous parameterizations exist, i.e., M_n = ⋃_{Γ∈G} M_Γ, for some index set G, as already mentioned in the introduction. 14 For any given partitioning of the set M_n it is important to understand the relationships between the different subsets M_Γ, as well as the closures M̄_Γ of the pieces M_Γ, since in case of misspecification of M_Γ points in M̄_Γ \ M_Γ cannot be avoided even asymptotically in, e.g., pseudo maximum likelihood estimation. These are more complicated issues in the VARMA case than in the VAR case, see the discussion in Hannan and Deistler (1988, Remark 1 after Theorem 2.5.3).
Based on these considerations, the following section provides and discusses a parameterization that focuses on unit root and cointegration properties, resorting to the state space framework that, as mentioned in the introduction, provides advantages for cointegration analysis. In particular, we derive an almost everywhere homeomorphic parameterization, based on partitioning the set of all considered transfer functions according to a multi-index Γ that contains, among other elements, the state space unit root structure. This implies that certain cointegration properties are invariant over all systems corresponding to a subset M_Γ, i.e., the parameterization allows one to directly impose cointegration properties such as the "integration indices" of Paruolo (1996) mentioned before.
• The process {y_t}_{t∈Z} is called a unit root process with unit roots z_k := e^{iω_k} for k = 1, . . . , l; the set F(Ω) := {ω_1, . . . , ω_l} is the set of unit root frequencies and the integers h_k, k = 1, . . . , l, are the integration orders.

• A unit root process with unit root structure ((0, d)), d ∈ N, is an I(d) process.
14 When using the echelon canonical form, the partitioning is according to the so-called Kronecker indices related to a basis selection for the row-space of the Hankel matrix corresponding to the transfer function k(z), see, e.g., Hannan and Deistler (1988, chp. 2.4) for a precise definition.
As discussed in Bauer and Wagner (2012) the state space framework is convenient for the analysis of VARMA unit root processes. Detailed treatments of the state space framework are given in Hannan and Deistler (1988) and-in the context of unit root processes- Bauer and Wagner (2012).
A state space representation of a unit root VARMA process is 15

y_t = C x_t + Φ d_t + ε_t,   x_{t+1} = A x_t + B ε_t,   (9)

for a white noise process {ε_t}_{t∈Z}, ε_t ∈ R^s, a deterministic process {d_t}_{t∈Z}, d_t ∈ R^m, and the unobserved state process {x_t}_{t∈Z}, x_t ∈ C^n, with A ∈ C^{n×n}, B ∈ C^{n×s}, C ∈ C^{s×n} and Φ ∈ R^{s×m}.
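As an illustration, the state space recursion above can be simulated directly. The following is a minimal sketch in NumPy with made-up real valued system matrices (one unit eigenvalue in A, no deterministic component, i.e., Φ = 0); it is not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
s, n = 2, 3                      # output and state dimensions (made up)
A = np.array([[1.0, 0.0, 0.0],   # one eigenvalue equal to 1: unit root at z = 1
              [0.0, 0.5, 0.0],
              [0.0, 0.0, -0.3]])
B = rng.standard_normal((n, s))
C = rng.standard_normal((s, n))

T = 500
eps = rng.standard_normal((T, s))     # white noise
x = np.zeros(n)                       # starting value x_1 = 0
y = np.empty((T, s))
for t in range(T):
    y[t] = C @ x + eps[t]             # y_t = C x_t + eps_t (Phi = 0)
    x = A @ x + B @ eps[t]            # x_{t+1} = A x_t + B eps_t
print(y.shape)  # -> (500, 2)
```

The first state component is a random walk, so the simulated output inherits a common stochastic trend.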
Remark 3. Bauer and Wagner (2012, Theorem 2) show that every real valued unit root VARMA process {y_t}_{t∈Z} as given in (8) has a real valued state space representation with {x_t}_{t∈Z} real valued and real valued system matrices (A, B, C). Considering complex valued state space representations in (9) is merely for algebraic convenience, as in general some eigenvalues of A are complex valued. Note for completeness that Bauer and Wagner (2012) contains a detailed discussion of why considering the A-matrix in the canonical form in (up to reordering) the Jordan normal form is useful for cointegration analysis. For the sake of brevity we abstain from including this discussion again in the present paper. The key aspect of this construction is its usefulness for cointegration analysis, which becomes visible in Remark 4, where the "simple" unit root properties of blocks of the state vector are discussed.
The transfer function k(z) with real valued power series coefficients corresponding to a real valued unit root process {y_t}_{t∈Z} as given in Definition 1 is given by the rational matrix function k(z) = ∆_Ω(z)^{-1} a(z)^{-1} b(z). The (possibly complex valued) matrix triple (A, B, C) realizes the transfer function k(z) if and only if π(A, B, C) := I_s + zC(I_n − zA)^{-1}B = k(z). Please note that, as for VARMA representations, for a transfer function k(z) there exist multiple state space realizations (A, B, C), with possibly different state dimensions n. A state space system (A, B, C) is minimal if there exists no state space system of lower state dimension realizing the same transfer function k(z). The order of the transfer function k(z) is the state dimension of a minimal system (A, B, C) realizing k(z).
All minimal state space realizations of a transfer function k(z) only differ in the basis of the state (cf. Hannan and Deistler 1988, Theorem 2.3.4), i.e., π(A, B, C) = π(Ã, B̃, C̃) for two minimal state space systems (A, B, C) and (Ã, B̃, C̃) is equivalent to the existence of a regular matrix T ∈ C^{n×n} such that A = TÃT^{-1}, B = TB̃, C = C̃T^{-1}. Thus, the matrices A and Ã are similar for all minimal realizations of a transfer function k(z).
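The invariance of the transfer function under a change of state basis can be checked numerically via the power series coefficients k_0 = I_s and k_j = CA^{j−1}B, j ≥ 1 (the Markov parameters of π(A, B, C)). A small sketch with randomly generated matrices (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 3, 2
A = 0.4 * rng.standard_normal((n, n))
B = rng.standard_normal((n, s))
C = rng.standard_normal((s, n))

def markov(A, B, C, j):
    """j-th power series coefficient of pi(A,B,C): C A^{j-1} B for j >= 1."""
    return C @ np.linalg.matrix_power(A, j - 1) @ B

T = rng.standard_normal((n, n)) + 2 * np.eye(n)   # generically regular
At, Bt, Ct = T @ A @ np.linalg.inv(T), T @ B, C @ np.linalg.inv(T)

# both minimal realizations yield the same transfer function coefficients
for j in range(1, 6):
    assert np.allclose(markov(A, B, C, j), markov(At, Bt, Ct, j))
```

Since C A^{j−1} B = C̃ Ã^{j−1} B̃ for all j, the two realizations are observationally equivalent.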
By imposing restrictions on the matrices of a minimal state space system (A, B, C) realizing k(z), Bauer and Wagner (2012, Theorem 2) provide a canonical form, i.e., a mapping of the set M_n of transfer functions of order n with real valued power series coefficients onto unique state space realizations (A, B, C). To describe the necessary restrictions of the canonical form the following definition is useful: A matrix B = [b_{i,j}]_{i=1,...,c, j=1,...,s} ∈ C^{c×s} is positive upper triangular (p.u.t.) if there exist integers 1 ≤ j_1 < j_2 < · · · < j_c such that, for j_i ≤ s, we have b_{i,j} = 0 for j < j_i and b_{i,j_i} ∈ R_+; i.e., the first non-zero entry of each row is real and positive, and these entries move strictly to the right from row to row, while the remaining entries (indicated by the symbol * in displays of such matrices) are unrestricted complex valued entries.
A unique state space realization of k(z) ∈ M_n is given as follows (cf. Bauer and Wagner 2012, Theorem 2): Theorem 1. For every transfer function k(z) ∈ M_n there exists a unique minimal (complex) state space realization (A, B, C) such that y_t = C x_{t,C} + Φ d_t + ε_t, x_{t+1,C} = A x_{t,C} + B ε_t, with, for 0 < ω_k < π: B_{k,C} := [B_k', B̄_k']' ∈ C^{2d_k×s} and C_{k,C} := [C_k, C̄_k] ∈ C^{s×2d_k}, where B̄_k and C̄_k denote the complex conjugates of B_k and C_k;
and, for ω_k ∈ {0, π}: B_{k,C} := B_k ∈ R^{d_k×s} and C_{k,C} := C_k ∈ R^{s×d_k}. Remark 4. As indicated in Remark 3 and discussed in detail in Bauer and Wagner (2012), considering complex valued quantities is merely for algebraic convenience. For econometric analysis, interest is, of course, on real valued quantities. These can be straightforwardly obtained from the representation given in Theorem 1 as follows.
First define a transformation matrix (and its inverse) that collects real and imaginary parts of the complex valued quantities. Starting from the complex valued canonical representation (A, B, C), a real valued canonical representation with real valued matrices (A_R, B_R, C_R) follows from applying this transformation matrix. Before we turn to the real valued state process corresponding to the real valued canonical representation, we first consider the complex valued state process {x_{t,C}}_{t∈Z} in more detail. This process is partitioned, according to the partitioning of the matrices C_{k,C}, into sub-vectors x_{t,k} for k = 1, . . . , l and the stable part x_{t,•}.
For k = 1, . . . , l the sub-vectors x_{t,k} are further decomposed into x_{t,k} := [(x^1_{t,k})', . . . , (x^{h_k}_{t,k})']', with x^j_{t,k} ∈ C^{d_k^j} for j = 1, . . . , h_k, according to the partitioning C_k = [C_{k,1}, . . . , C_{k,h_k}]. The partitioning of the complex valued process {x_{t,C}}_{t∈Z} leads to an analogous partitioning of the real valued state process {x_{t,R}}_{t∈Z}, x_{t,R} := [x_{t,u,R}', x_{t,•}']' := [x_{t,1,R}', . . . , x_{t,l,R}', x_{t,•}']'. For k = 1, . . . , l the sub-vectors x_{t,k,R} are further decomposed into x_{t,k,R} := [(x^1_{t,k,R})', . . . , (x^{h_k}_{t,k,R})']', with x^j_{t,k,R} ∈ R^{2d_k^j} for 0 < ω_k < π and x^j_{t,k,R} ∈ R^{d_k^j} for ω_k ∈ {0, π}, j = 1, . . . , h_k, and with C_{k,R} := [C_{k,1,R}, . . . , C_{k,h_k,R}] decomposed accordingly. Bauer and Wagner (2012, Theorem 3, p. 1328) show that the processes {x^j_{t,k,R}}_{t∈Z} have unit root structure ((ω_k, h_k − j + 1)) for j = 1, . . . , h_k and k = 1, . . . , l. Furthermore, for j = 1, . . . , h_k and k = 1, . . . , l the processes {x^j_{t,k,R}}_{t∈Z} are not cointegrated, as defined in Definition 3 below. For ω_k = 0, the process {x^j_{t,k,R}}_{t∈Z} is the d_k^j-dimensional process of stochastic trends of order h_k − j + 1, while the 2d_k^j components of {x^j_{t,k,R}}_{t∈Z}, for 0 < ω_k < π, and the d_k^j components of {x^j_{t,k,R}}_{t∈Z}, for ω_k = π, are referred to as stochastic cycles of order h_k − j + 1 at their corresponding frequencies ω_k.
Remark 5. Parameterizing the stable part of the transfer function using the echelon canonical form is merely one possible choice. Any other canonical form of the stable subsystem and suitable parameterization based on it can be used instead for the stable subsystem.
Remark 6. Starting from a state space system (9) with matrices (A, B, C) in canonical form, a solution for y_t, t > 0 (with the solution for t < 0 obtained completely analogously), for some x_1 = [x_{1,u}', x_{1,•}']', is given by y_t = C_u A_u^{t−1} x_{1,u} + C_• A_•^{t−1} x_{1,•} + ∑_{j=1}^{t−1} C A^{j−1} B ε_{t−j} + Φ d_t + ε_t. Clearly, the term C_u A_u^{t−1} x_{1,u} is stochastically singular and is effectively like a deterministic component, which may lead to an identification problem with Φd_t. If the deterministic component Φd_t is rich enough to "absorb" C_u A_u^{t−1} x_{1,u}, then one solution of the identification problem is to set x_{1,u} = 0. Rich enough here means, e.g., in the I(1) case with A_u = I that d_t contains an intercept. Analogously, in the MFI(1) case d_t has to contain seasonal dummy variables corresponding to all unit root frequencies. The term C_• A_•^{t−1} x_{1,•} decays exponentially and, therefore, does not impact the asymptotic properties of any statistical procedure. It is, therefore, inconsequential for statistical analysis but convenient (with respect to our definition of unit root processes) to set x_{1,•} = ∑_{j≥0} A_•^j B_• ε_{−j}. This corresponds to the steady state or stationary solution of the stable block of the state equation, and renders {x_{t,•}}_{t∈N} or, when the solution on Z is considered, {x_{t,•}}_{t∈Z} stationary. Please note that these issues with respect to starting values, potential identification problems and their impact or non-impact on statistical procedures also occur in the VAR setting. Bauer and Wagner (2012, Theorem 2) show that minimality of the canonical state space realization (A, B, C) implies full row rank of the p.u.t. blocks B_{k,h_k,j} of B_{k,h_k}.
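The steady state of the stable block can also be seen through its variance: with x_{1,•} set to the stationary solution, Var(x_{t,•}) = P solves the discrete Lyapunov equation P = A_• P A_•' + B_• B_•' (assuming unit variance white noise). A sketch with illustrative matrices, not taken from the paper:

```python
import numpy as np

A = np.array([[0.5, 0.2],    # stable block (illustrative, eigenvalues
              [0.0, -0.3]])  # inside the unit circle)
B = np.array([[1.0],
              [0.5]])

# P = sum_{j>=0} A^j B B' (A')^j, the steady state variance of x_{t,bullet}
P = np.zeros((2, 2))
term = B @ B.T
for _ in range(200):
    P += term
    term = A @ term @ A.T

# P is the fixed point of the discrete Lyapunov equation
assert np.allclose(A @ P @ A.T + B @ B.T, P)
```

With any other starting variance the state variance converges to P at an exponential rate, mirroring the exponentially decaying term in the solution above.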
In addition to proposing the canonical form, Bauer and Wagner (2012) also provide details on how to transform any minimal state space realization into canonical form: Given a minimal state space system (A, B, C) realizing the transfer function k(z) ∈ M_n, the first step is to find a similarity transformation T such that Ã = TAT^{-1} is of the form given in (10) by using an eigenvalue decomposition, compare Chatelin (1993). In the second step the corresponding stable subsystem (Ã_•, B̃_•, C̃_•) is transformed to echelon canonical form as described in Hannan and Deistler (1988, chp. 2). These two transformations do not lead to a unique realization, because the restrictions on A do not uniquely determine the unstable subsystem (A_u, B_u, C_u).
For example, in the case Ω = ((ω_1, h_1)) = ((0, 1)), only the product C_1 B_1 is pinned down by the transfer function. To find a unique realization the product C_1 B_1 needs to be uniquely decomposed into factors C_1 and B_1. This is achieved by performing a QR decomposition of C_1 B_1 (without pivoting) that leads to C_1'C_1 = I. The additional restriction of B_1 being a p.u.t. matrix of full row rank then leads to a unique factorization of C_1 B_1 into C_1 and B_1. In the general case with an arbitrary unit root structure Ω, similar arguments lead to p.u.t. restrictions on sub-blocks B_{k,h_k,j} in B_u and orthogonality restrictions on sub-blocks of C_u.
The canonical form introduced in Theorem 1 was designed to be useful for cointegration analysis. Seeing this first requires a definition of static and polynomial cointegration (cf. Bauer and Wagner 2012, Definitions 3 and 4).

Remark 7.
(i) It is merely a matter of taste whether cointegrating spaces are defined in terms of their order (Ω, Ω̃) or their decrease δ(Ω, Ω̃) := (δ_1(Ω, Ω̃), . . . , δ_l(Ω, Ω̃)), with δ_k(Ω, Ω̃) as defined above. Specifying Ω and δ(Ω, Ω̃) contains the same information as providing the order of (polynomial) cointegration. (ii) Notwithstanding the fact that CIVs and PCIVs in general may lead to changes of the integration orders at different unit root frequencies, it may be of interest to "zoom in" on only one unit root frequency ω_k, thereby leaving the potential reductions of the integration orders at other unit root frequencies unspecified. This allows one, entirely similarly as in Definition 3, to define cointegrating and polynomial cointegrating spaces of different orders at a single unit root frequency ω_k. Analogously one can also define cointegrating and polynomial cointegrating spaces of different orders for subsets of the frequencies in F(Ω). (iii) In principle the polynomial cointegrating spaces defined so far are infinite dimensional, as the polynomial degree is not bounded. However, since every polynomial vector β(z) can be written as β(z) = β̃(z) + c(z)∆_Ω(z) with a polynomial β̃(z) of degree smaller than the degree of ∆_Ω(z), where by definition {∆_Ω(L)y_t}_{t∈Z} has empty unit root structure, it suffices to consider PCIVs of polynomial degree smaller than the polynomial degree of ∆_Ω(z). This shows that it is sufficient to consider finite dimensional polynomial cointegrating spaces. When considering, as in item (ii), (polynomial) cointegration only for one unit root, it similarly suffices to consider polynomials of maximal degree equal to h_k − 1 for real unit roots and 2h_k − 1 for complex unit roots. Thus, in the I(2) case it suffices to consider polynomials of degree one. (iv) The argument about maximal relevant polynomial degrees given in item (iii) can be made more precise and combined with the decrease in Ω achieved. Every polynomial vector β(z) can be written as β(z) = β̃(z) + c(z)∆_{ω_k}(z)^{δ_k}, with ∆_{ω_k}(z) denoting the factor of ∆_Ω(z) corresponding to frequency ω_k, where {∆_{ω_k}(L)^{δ_k} y_t}_{t∈Z} has integration order h_k − δ_k at frequency ω_k.
Thus, it suffices to consider PCIVs of polynomial degree smaller than δ k for ω k ∈ {0, π} or 2δ k for 0 < ω k < π when considering the polynomial cointegrating space at ω k with decrease δ k . In the MFI(1) case therefore, when considering only one unit root frequency, again only polynomials of degree one need to be considered. This space is often referred to in the literature as dynamic cointegration space.
To illustrate the advantages of the canonical form for cointegration analysis consider the partitioned real valued representation given in Remark 4. By Remark 4, the process {x^j_{t,k,R}}_{t∈Z} is not cointegrated. This implies that β ∈ R^s, β ≠ 0, reduces the integration order at unit root z_k to h_k − j if and only if β'[C_{k,1,R}, . . . , C_{k,j,R}] = 0 and β'C_{k,j+1,R} ≠ 0, or equivalently β'[C_{k,1}, . . . , C_{k,j}] = 0 and β'C_{k,j+1} ≠ 0 (using the transformation to the complex matrices of the canonical form, as discussed in Remark 4, and that β'[C_k, C̄_k] = 0 if and only if β'C_k = 0). Thus, the CIVs are characterized by orthogonality to sub-blocks of C_u.
The real valued representation given in Remark 4, used in its partitioned form just above, immediately leads to a necessary orthogonality constraint for polynomial cointegration of degree one. Since all terms except the first are stationary or deterministic, a necessary condition for a reduction of the unit root structure is the orthogonality of [β_0', β_1']' to the corresponding sub-blocks of the real valued C-matrix. Please note, however, that this orthogonality condition is not sufficient for [β_0', β_1']' to be a PCIV, because it does not imply max_{k=1,...,l} δ_k(Ω, Ω̃) > 0. For a detailed discussion of polynomial cointegration, when considering also higher polynomial degrees, see Bauer and Wagner (2012, sct. 5).
The following examples illustrate cointegration analysis in the state space framework for the empirically most relevant, i.e., the I(1), MFI(1) and I(2) cases.
Example 1 (Cointegration in the I(1) case). In the I(1) case, neglecting the stable subsystem and the deterministic components for simplicity, it holds that y_t = C_1 x_{t,1} + ε_t with x_{t+1,1} = x_{t,1} + B_1 ε_t. The vector β ∈ R^s, β ≠ 0, is a CIV if and only if β'C_1 = 0. Example 2 (Cointegration in the MFI(1) case with complex unit root z_k). In the MFI(1) case with unit root structure Ω = ((ω_k, 1)) and complex unit root z_k, neglecting the stable subsystem and the deterministic components for simplicity, it holds that y_t = C_{k,C} x_{t,k,C} + ε_t. The vector β ∈ R^s, β ≠ 0, is a CIV of order (Ω, {}) if and only if β'C_k = 0 (and thus β'C̄_k = 0).
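The orthogonality characterization in the I(1) case can be illustrated with a small simulation (made-up loadings, not from the paper): a vector β with β'C_1 = 0 annihilates the common stochastic trend, so β'y_t is stationary while the components of y_t are not.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
C1 = np.array([[1.0], [2.0]])              # made-up loadings of the single trend
beta = np.array([2.0, -1.0])               # beta' C1 = 0
x = np.cumsum(rng.standard_normal(T))      # scalar random walk x_t
eps = rng.standard_normal((T, 2))
y = x[:, None] * C1.T + eps                # y_t = C1 x_t + eps_t

z = y @ beta                               # beta' y_t = beta' eps_t: stationary
assert abs((beta @ C1).item()) < 1e-12     # orthogonality beta' C1 = 0
assert np.var(z) < np.var(y[:, 0])         # trend variance dominates the levels
```

The sample variance of β'y_t stays bounded as T grows, whereas the variance of each component of y_t grows linearly in T.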

The vector polynomial β(z) = β_0 + β_1 z is a PCIV if and only if a corresponding orthogonality condition holds, which is equivalent to a left kernel condition on the matrix in (11). The fact that the matrix in (11) has a block structure with two blocks of conjugate complex columns implies some additional structure also on the space of PCIVs, here with polynomial degree one. Thus, the space of PCIVs of degree (up to) one inherits some additional structure emanating from the occurrence of complex eigenvalues in complex conjugate pairs.
Example 3 (Cointegration in the I(2) case). In the I(2) case, neglecting the stable subsystem and the deterministic components for simplicity, the orthogonality constraint derived above indicates that the two cases C^G_{1,2} = 0 and C^G_{1,2} ≠ 0 have to be considered separately for polynomial cointegration analysis. Consider first the case C^G_{1,2} = 0. In this case the orthogonality constraints imply β_0'C^E_{1,1} = 0, β_1'C^E_{1,1} = 0 and (β_0 + β_1)'C^E_{1,2} = 0. Thus, the vector β_0 + β_1 is a CIV of order ((0, 2), {}) and therefore β(z) = β_0 + β_1 z is of "non-minimum" degree, one in this case rather than zero (the degree of β_0 + β_1). For a formal definition of minimum degree PCIVs see Bauer and Wagner (2003, Definition 4). In case C^G_{1,2} ≠ 0 there are PCIVs of degree one that are not simple transformations of static CIVs. Consider β(z) = γ_1 + γ_2 z: here orthogonality of γ_2 to the relevant sub-blocks of C_u needs to hold, such that there is no further integrated contribution to {γ_2'y_t}_{t∈Z}. Neither γ_1 nor γ_2 are CIVs since both violate the necessary conditions given in the definition of CIVs, which implies that β(z) is indeed a "minimum degree" PCIV.
As was shown above, the unit root and cointegration properties of {y t } t∈Z depend on the sub-blocks of C u and the eigenvalue structure of A u . We, therefore, define the more encompassing state space unit root structure containing information on the geometrical and algebraic multiplicities of the eigenvalues of A u (cf. Bauer and Wagner 2012, Definition 2).

Definition 4. A unit root process {y_t}_{t∈Z} with a canonical state space representation as given in Theorem 1 has state space unit root structure Ω_S := ((ω_1, d_1^1, . . . , d_1^{h_1}), . . . , (ω_l, d_l^1, . . . , d_l^{h_l})), with the integers d_k^j, j = 1, . . . , h_k, as in the canonical form for k = 1, . . . , l. For {y_t}_{t∈Z} with empty unit root structure we set Ω_S := {}.
Remark 8. The state space unit root structure Ω_S contains information concerning the integration properties of the process {y_t}_{t∈Z}, since the integers d_k^j, k = 1, . . . , l, j = 1, . . . , h_k, describe (multiplied by two for k such that 0 < ω_k < π) the numbers of non-cointegrated stochastic trends or cycles of the corresponding integration orders, compare again Remark 4. As such, Ω_S describes properties of the stochastic process {y_t}_{t∈Z}, and, therefore, the state space unit root structure Ω_S partitions unit root processes according to these (co-)integration properties. These (co-)integration properties, however, are invariant to the chosen canonical representation, or more generally invariant to whether a VARMA or state space representation is considered. For all minimal state space representations of a unit root process {y_t}_{t∈Z} these indices, being related to the Jordan normal form, are invariant.
As mentioned in Section 2, Paruolo (1996, Definition 3) introduces integration indices at frequency zero as a triple of integers (r_0, r_1, r_2). These correspond to the numbers of columns of the matrices β, β_1, β_2 in the error correction representation of I(2) VAR processes, see, e.g., Johansen (1997, sct. 3). Here, r_2 is the number of stochastic trends of order two, i.e., r_2 = d_1^1. Furthermore, r_1 is the number of stochastic trends of order one that do not cointegrate with {β_2'∆y_t}_{t∈Z} and hence r_1 = d_1^2 − d_1^1. Therefore, the integration indices at frequency zero are in one-to-one correspondence with the state space unit root structure Ω_S = ((0, d_1^1, d_1^2)) for I(2) processes and the dimension s = r_0 + r_1 + r_2 of the process.
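The correspondence between (r_0, r_1, r_2) and Ω_S = ((0, d_1^1, d_1^2)) described above amounts to simple arithmetic; a minimal sketch (function name is ours, for illustration only):

```python
def integration_indices(d11, d12, s):
    """Map Omega_S = ((0, d11, d12)) and process dimension s to (r0, r1, r2)."""
    r2 = d11          # number of I(2) stochastic trends
    r1 = d12 - d11    # additional stochastic trends of order one
    r0 = s - d12      # remainder, since s = r0 + r1 + r2
    return r0, r1, r2

print(integration_indices(1, 2, 4))  # -> (2, 1, 1)
```

For instance, a four-dimensional I(2) process with d_1^1 = 1 and d_1^2 = 2 has integration indices (r_0, r_1, r_2) = (2, 1, 1).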
The canonical form given in Theorem 1 imposes p.u.t. structures on sub-blocks of the matrix B_u. The occurrence of these blocks, related to d_k^j > d_k^{j-1}, is determined by the state space unit root structure Ω_S. The number of free entries in these p.u.t. blocks, however, is not determined by Ω_S. Consequently, we need structure indices p ∈ N_0^{n_u} indicating for each row the position of a potentially restricted positive element, as formalized below: Definition 5 (Structure indices). For the block B_u ∈ C^{n_u×s} of the matrix B of a state space realization (A, B, C) in canonical form, define the corresponding structure indices p ∈ N_0^{n_u} as: p_i := 0 if the i-th row of B_u is not part of a p.u.t. block, and p_i := j if the i-th row of B_u is part of a p.u.t. block and its j-th entry is restricted to be positive.

Remark 9. Since sub-blocks of B_u corresponding to complex unit roots are of the form B_{k,C} = [B_k', B̄_k']', the entries restricted to be positive are located in the same columns and rows of both B_k and B̄_k. Thus, the structure indices p_i of the corresponding rows are identical for B_k and B̄_k. Therefore, it would be possible to omit the parts of p corresponding to the blocks B̄_k. It is, however, as will be seen in Definition 9, advantageous for the comparison of unit root structures and structure indices that p is a vector with n_u entries.
For given state space unit root structure Ω_S the matrix A_u is fully determined. The parameterization of the set of feasible matrices B_u for given structure indices p and of the set of stable subsystems (A_•, B_•, C_•) for given Kronecker indices α_• (cf. Hannan and Deistler 1988, chp. 2) is straightforward, since the entries in these matrices are either unrestricted, restricted to zero or restricted to be positive. Matters are a bit more complicated for C_u. One possibility to parameterize the set of possible matrices C_u for a given state space unit root structure Ω_S is to use real and complex valued Givens rotations (cf. Golub and van Loan 1996, chp. 5.1).
Definition 6 (Real Givens rotation). The real Givens rotation R_{s,i,j}(θ) ∈ R^{s×s}, for 1 ≤ i < j ≤ s and θ ∈ [0, 2π), coincides with the identity matrix I_s except for the entries [R_{s,i,j}(θ)]_{i,i} = [R_{s,i,j}(θ)]_{j,j} = cos(θ), [R_{s,i,j}(θ)]_{i,j} = −sin(θ) and [R_{s,i,j}(θ)]_{j,i} = sin(θ), i.e., it rotates the (i, j) coordinate plane by the angle θ.

Remark 10. Givens rotations allow transforming any vector [a, b]' ∈ R^2 into [r, 0]' with r = (a^2 + b^2)^{1/2} ≥ 0. This is achieved by the following algorithm: If a = b = 0, set θ := 0 (the rotation then has no effect). Otherwise choose θ ∈ [0, 2π) such that cos(θ) = a/r and sin(θ) = −b/r; then R_{2,1,2}(θ)[a, b]' = [r, 0]'.
Remark 11. The determinant of real Givens rotations is equal to one, i.e., det(R s,i,j (θ)) = 1 for all s, i, j ∈ N and all θ ∈ [0, 2π). Thus, it is not possible to factorize an orthonormal matrix Q with det(Q) = −1 into a product of Givens rotations. This obvious fact has implications for the parameterization of C-matrices as is detailed below.
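A minimal numerical sketch of a real Givens rotation and of the angle extraction (assuming a rotation convention with det R = 1, consistent with Remark 11; the zero convention for a = b = 0 follows the discussion in Remark 13):

```python
import numpy as np

def givens(s, i, j, theta):
    """Real Givens rotation: identity except for a rotation in the (i, j) plane."""
    R = np.eye(s)
    R[i, i] = R[j, j] = np.cos(theta)
    R[i, j] = -np.sin(theta)
    R[j, i] = np.sin(theta)
    return R

def givens_angle(a, b):
    """theta in [0, 2*pi) such that givens(2, 0, 1, theta) @ [a, b]' = [r, 0]'
    with r = sqrt(a^2 + b^2); theta := 0 if a = b = 0 (parameter convention)."""
    if a == 0.0 and b == 0.0:
        return 0.0
    return (-np.arctan2(b, a)) % (2 * np.pi)

a = b = 1 / np.sqrt(2)
theta = givens_angle(a, b)                                       # 7*pi/4
print(np.allclose(givens(2, 0, 1, theta) @ np.array([a, b]),
                  [1.0, 0.0]))                                   # -> True
```

The determinant of every such rotation equals one, so, as noted in Remark 11, products of Givens rotations can only generate orthonormal matrices with determinant one.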
To set the stage for the general case, we start the discussion of the parameterization of the set of matrices (A, B, C) in canonical form with the MFI(1) and I(2) cases. These two cases display all ingredients required later for the general case. The MFI(1) case illustrates the usage of either real or complex Givens rotations, depending on whether the considered C-block corresponds to a real or complex unit root. The I(2) case highlights recursive orthogonality constraints on the parameters of the C-block, which are related to the polynomial cointegration properties (cf. Example 3).

The Parameterization in the MFI(1) Case
The state space unit root structure of an MFI(1) process is given by Ω_S = ((ω_1, d_1^1), . . . , (ω_l, d_l^1)). Starting with the sub-blocks of C_u, it is convenient to separate the discussion of the parameterization of C_u-blocks into the real case, where ω_k ∈ {0, π} and C_k ∈ R^{s×d_k^1}, and the complex case with 0 < ω_k < π and C_k ∈ C^{s×d_k^1}. For the case of real unit roots the two cases d_k^1 < s and d_k^1 = s have to be distinguished. For brevity of notation refer to the considered real block simply as C ∈ R^{s×d}. Using this notation, the set of matrices to be parameterized is O_{s,d} := {C ∈ R^{s×d} | C'C = I_d}. The parameterization of O_{s,d} is based on the combination of real Givens rotations, as given in Definition 6, that allow transforming every matrix in O_{s,d} to the form [I_d', 0_{(s−d)×d}']' for d < s. For d = s, Givens rotations allow transforming every matrix C ∈ O_{s,s} either to I_s or to I_s^− := diag(I_{s−1}, −1), since, compare Remark 11, for the transformed matrix C̃^(s) it holds that det(C) = det(C̃^(s)) ∈ {−1, 1}. This is achieved with the following algorithm:
1. Set C^(1) := C and j := 1.

2.
Transform the entries [c_{j,j}, . . . , c_{j,d}] in the j-th row of C^(j) to [c̃_{j,j}, 0, . . . , 0], c̃_{j,j} ≥ 0. Since this is a row vector, this is achieved by right-multiplication of C^(j) with transposed Givens rotations, and the required parameters are obtained via the algorithm described in Remark 10. The first j − 1 entries of the j-th row remain unchanged. Denote the transformed matrix by C^(j+1).
3. If j = d − 1 stop. Else increment j by one (j → j + 1) and continue at step 2.
4. Collect all parameters used for the Givens rotations in steps 1 to 3 in a parameter vector θ_R.
Steps 1-3 correspond to a QR decomposition C = QC̃, with an orthonormal matrix Q given by the product of the Givens rotations. Please note that the first j − 1 entries of the j-th column of C̃ = C̃^(d) are equal to zero by construction.
For d = s no rows below the d-th row remain and no Givens rotations are defined in this step.
Collect all parameters used for the Givens rotations in steps 5 to 7 in a parameter vector θ L .
The parameter vector θ = [θ_L', θ_R']' contains the angles of the employed Givens rotations and provides one way of parameterizing O_{s,d}. The following Lemma 1 demonstrates the usefulness of this parameterization.
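The roles of θ_L and θ_R can be sketched numerically. The construction below composes Givens rotations acting on [I_d', 0']' and is only a stylized version of the algorithm above (the exact rotation ordering of the canonical form is not reproduced), but it illustrates the parameter counts d(s−d) and d(d−1)/2 and that θ_R leaves the column space, and hence the cointegrating space, unchanged (cf. Remark 14):

```python
import numpy as np

def givens(s, i, j, theta):
    """Real Givens rotation in the (i, j) coordinate plane."""
    R = np.eye(s)
    R[i, i] = R[j, j] = np.cos(theta)
    R[i, j] = -np.sin(theta)
    R[j, i] = np.sin(theta)
    return R

def C_O(s, d, theta_L, theta_R):
    """Stylized theta -> C map: rotate [I_d', 0']' by Givens rotations.
    theta_L: d*(s-d) angles mixing the first d with the last s-d coordinates;
    theta_R: d*(d-1)/2 angles rotating the basis of the column space."""
    E = np.vstack([np.eye(d), np.zeros((s - d, d))])
    QL = np.eye(s)
    k = 0
    for i in range(d):
        for j in range(d, s):
            QL = givens(s, i, j, theta_L[k]) @ QL
            k += 1
    QR = np.eye(d)
    k = 0
    for i in range(d):
        for j in range(i + 1, d):
            QR = givens(d, i, j, theta_R[k]) @ QR
            k += 1
    return QL.T @ E @ QR.T

rng = np.random.default_rng(3)
s, d = 4, 2
tL = rng.uniform(0, 2 * np.pi, d * (s - d))
C = C_O(s, d, tL, rng.uniform(0, 2 * np.pi, d * (d - 1) // 2))
assert np.allclose(C.T @ C, np.eye(d))    # orthonormal columns: C in O_{s,d}
# a different theta_R only rotates the basis of the column space:
C2 = C_O(s, d, tL, rng.uniform(0, 2 * np.pi, d * (d - 1) // 2))
assert np.allclose(C @ C.T, C2 @ C2.T)    # identical projector for fixed theta_L
```

Since C(θ) C(θ)' depends on θ_L only, θ_L alone pins down the column space of C and thereby the (static) cointegrating space.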
The algorithm discussed above defines the inverse mapping C_O^{-1}, which is continuous on an open and dense subset of O_{s,d}.
In this case, steps 1-4 of the algorithm discussed above define the inverse mapping C_O^{-1} with parameter vector in R^{s(s−1)/2}; a parameterization of O_{s,s} is then given accordingly.

The parameterization is infinitely often differentiable with infinitely often differentiable inverse on an open and dense subset of O s,s .
Remark 13. The following arguments illustrate why C_O^{-1} is not continuous on the pre-image of the boundary of Θ_O^R: Consider the unit sphere O_{3,1} = {C ∈ R^3 | C'C = ||C||^2 = 1}. One way to parameterize the unit sphere is to use degrees of longitude and latitude. Two types of discontinuities occur: First, after fixing the location of the zero degree of longitude, i.e., the prime meridian, its anti-meridian is described by both 180°W and 180°E. Using the half-open interval [0, 2π) in our parametrization causes a similar discontinuity. Second, the degree of longitude is irrelevant at the north pole. As seen in Remark 10, with our parameterization a similar issue occurs when the first two entries of C to be compared are both equal to zero. In this case the parameter of the Givens rotation is set to zero, although every θ would produce the same result. Both discontinuities clearly occur on a thin subset of O_{s,d}.
As in the parametrization of the VAR I(1) case in the VECM framework, where the restriction β = [I_{s−d}, β*']' can only be imposed when the upper (s − d) × (s − d) block of the true β_0 of the DGP is of full rank (cf. Johansen 1995, chp. 5.2), the set where the discontinuities occur can effectively be changed by a permutation of the components of the observed time series. This corresponds to redefining the locations of the prime meridian and the poles.

Remark 14. Please note that the parameterization partitions the parameter vector θ into the two parts θ_L and θ_R. Since changing the parameter values in θ_R does not change the column space of C_O(θ), which, as seen above, determines the cointegrating vectors, θ_L fully characterizes the (static) cointegrating space. Please note that the dimension of θ_L is d(s − d) and thus coincides with the number of free parameters in β in the VECM framework (cf. Johansen 1995, chp. 5.2).
For illustration, consider a matrix in O_{3,2}, i.e., with d = 2 and s = 3. As discussed, the static cointegrating space is characterized by the left kernel of this matrix. The left kernel of a matrix in R^{3×2} with full rank two is a one-dimensional space, with the corresponding basis vector parameterized, when normalized to length one, by two free parameters. Thus, for the characterization of the static cointegrating space two parameters are required, which exactly coincides with the dimension of θ_L given in Remark 14. The parameters in θ_R correspond to the choice of a basis of the image of C.
Having fixed the two-dimensional subspace through θ_L, only one free parameter for the choice of an orthonormal basis remains, which again coincides with the dimension given in Remark 14. To obtain the parameter vector, the starting point is a QR decomposition of C based on right multiplication with R_R(θ_R). In this example R_R(θ_R) = R_{2,1,2}(θ_{R,1}), with θ_{R,1} to be determined: solve [0, 1/√2] R_{2,1,2}(θ_{R,1}) = [r, 0] for r ≥ 0 and θ_{R,1} ∈ [0, 2π). The solution is θ_{R,1} = π/2, so the orthonormal matrix R_R(θ_R) is equal to R_{2,1,2}(π/2). Second, transform the entries in the lower 1 × 2 sub-block of C̃^(0) to zero, starting with the last column: find θ_{L,2} ∈ [0, 2π) such that R_{3,2,3}(θ_{L,2}) transforms the corresponding entries to [r, 0]'. This yields r = 1, θ_{L,2} = 7π/4. Next compute C̃^(1) = R_{3,2,3}(7π/4)C̃^(0). In the final step find θ_{L,1} ∈ [0, 2π) analogously; the solution is r = 1, θ_{L,1} = π/4. Combining the transformations leads to the parameter vector θ = [θ_L', θ_R']' for this matrix.

In case of complex unit roots, referring for brevity again to the considered block C_k simply as C ∈ C^{s×d}, the set of matrices to be parameterized is U_{s,d} := {C ∈ C^{s×d} | C*C = I_d}. The parameterization of this set is based on the combination of complex Givens rotations, as given in Definition 7, which can be used to transform every matrix in U_{s,d} to the form [D_d', 0_{(s−d)×d}']' with a diagonal matrix D_d whose diagonal elements are of unit modulus. This transformation is achieved with the following algorithm:
1. Set C^(1) := C and j := 1.

2.
Transform the entries [c_{j,j}, . . . , c_{j,d}] in the j-th row of C^(j) to [c̃_{j,j}, 0, . . . , 0]. Since this is a row vector, this is achieved by right-multiplication of C^(j) with transposed Givens rotations, and the required parameters are obtained via the algorithm described in Remark 12. The first j − 1 entries of the j-th row remain unchanged. Denote the transformed matrix by C^(j+1).
3. If j = d − 1 stop. Else increment j by one (j → j + 1) and continue at step 2.
4. Collect all parameters used for the Givens rotations in steps 1 to 3 in a parameter vector ϕ_R.
Steps 1-3 correspond to a QR decomposition C = QC̃, with a unitary matrix Q given by the product of the Givens rotations. Please note that the first j − 1 entries of the j-th column of C̃ = C̃^(d) are equal to zero by construction.
Collect all parameters used for the Givens rotations in steps 5 to 7 in a parameter vector ϕ L .

9.
Transform the diagonal entries of the transformed matrix C̃^(d) = [D_d', 0_{(s−d)×d}']' into polar coordinates and collect the angles in a parameter vector ϕ_D.
The following lemma demonstrates the usefulness of this parameterization.
Lemma 2 (Properties of the parametrization of U_{s,d}). Define for d ≤ s a mapping ϕ ↦ C_U(ϕ) from the parameter set into U_{s,d}. The following properties hold: (i) U_{s,d} is closed and bounded.
(ii) The mapping C U (ϕ) is infinitely often differentiable.
(iii) For every C ∈ U_{s,d} a vector ϕ ∈ Θ_{C_U} exists such that C_U(ϕ) = C. The algorithm discussed above defines the inverse mapping C_U^{-1}: U_{s,d} → Θ_U^R. (iv) The inverse mapping C_U^{-1}(·), the parameterization of U_{s,d}, is infinitely often differentiable on an open and dense subset of U_{s,d}.
Remark 15. Note the partitioning of the parameter vector ϕ into the parts ϕ_L, ϕ_D and ϕ_R. The component ϕ_L fully characterizes the column space of C_U(ϕ), i.e., ϕ_L determines the cointegrating spaces.
Example 6. Consider a matrix C ∈ U_{3,2} whose relevant entries have polar angles ϕ_a = ϕ_b = 7π/4. The starting point is again a QR decomposition of C based on Q_R(ϕ_R) = Q_{2,1,2}(ϕ_{R,1}). Using the results of Remark 12, the parameters of the Givens rotation are ϕ_{R,1,1} = tan^{-1}(b/a) = π/4 and ϕ_{R,1,2} = ϕ_a − ϕ_b = 0. Right-multiplication of C with Q_{2,1,2}(π/4, 0) leads to C̃. Since the entries in the lower 1 × 2 sub-block of C̃ are already equal to zero, the remaining complex Givens rotations involve no further transformation, and ϕ = C_U^{-1}(C).

Components of the Parameter Vector
Based on the results of the preceding sections we can now describe the parameter vectors for the general case. The dimensions of the parameter vectors of the respective blocks of the system matrices (A, B, C) depend on the multi-index Γ, consisting of the state space unit root structure Ω_S, the structure indices p and the Kronecker indices α_• for the stable subsystem. A parameterization of the set of all systems in canonical form with given multi-index Γ for the MFI(1) case, therefore, combines the following components: The parameters for the p.u.t. blocks of the matrices B_k for k = 1, . . . , l, with p_j^k denoting the j-th entry of the structure indices p corresponding to B_k; the vectors θ_{B,f,k} contain the real and imaginary parts of the free entries in B_k not restricted by the p.u.t. structures.
The parameters for the matrices C k as discussed in Lemma 1 and Lemma 2.
The parameters for the stable subsystem in echelon canonical form for Kronecker indices α • .

The Parameterization in the I(2) Case
The canonical form provided above for the general case specializes as follows for I(2) processes with unit root structure Ω_S = ((0, d_1^1, d_1^2)). The parameterizations of the p.u.t. matrices B_{1,2,1} and B_{1,2,2} are as discussed above. The entries of B_{1,1} are unrestricted and thus included in the parameter vector θ_{B,f}, which also contains the free entries in B_{1,2,1} and B_{1,2,2}. The subsystem (A_•, B_•, C_•) is parameterized using the echelon canonical form.
The parameterization of C^E_{1,1} ∈ O_{s,d_1^1} proceeds as in the MFI(1) case, using C_O^{-1}(C^E_{1,1}). The parameterization of C^E_{1,2} has to take the restriction of orthogonality of C^E_{1,2} to C^E_{1,1} into account; the set to be parameterized thus consists of matrices with orthonormal columns that are in addition orthogonal to C^E_{1,1}. The parameterization of this set again uses real Givens rotations: left multiplication with R_L(θ_L) corresponding to C^E_{1,1} transforms the orthogonality constraint into zero restrictions, and the remaining matrix C̃ is parameterized as discussed in Lemma 1.

Corollary 1 (Properties of the parameterization of C E 1,2 ).
where θ L denotes the parameter values corresponding to [θ L , θ R ] = C −1 O (C E 1,1 ) as defined in Lemma 1. The following properties hold:
(i) The parameterized set is a disconnected space with two disjoint non-empty closed subsets.
(ii) Steps 1-4 of the algorithm discussed above Lemma 1 define the inverse mapping C −1 .
(iii) The parameterization is infinitely often differentiable with infinitely often differentiable inverse on an open and dense subset of O s,s .
The proof of Corollary 1 uses the same arguments as the proof of Lemma 1 and is, therefore, omitted. It remains to provide a parameterization for C G 1,2 , which is restricted to be orthogonal to both C E 1,1 and C E 1,2 ; this defines the set O s,G (C E 1,1 , C E 1,2 ) to be parameterized. The parameterization of O s,G (C E 1,1 , C E 1,2 ) is straightforward: Left multiplication of C G 1,2 with R L (θ L ) as defined in Lemma 1 and of the lower (s − d 1 1 ) × d 1 1 -block with R L (θ L ) as defined in Corollary 1 transforms the upper d 1 2 × d 1 1 -block to zero and collects the free parameters in the lower (s − d 1 2 ) × d 1 1 -block. Clearly, this is a bijective and infinitely often differentiable mapping on O s,G (C E 1,1 , C E 1,2 ) and thus a useful parameterization, since the matrix C G 1,2 is only multiplied with two constant invertible matrices. The entries of the matrix product are then collected in a parameter vector as shown in Corollary 2.
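The mechanism behind these parameterizations, rotating so that the block orthogonal to a given orthonormal block is moved into a lower block whose entries are free, can be illustrated numerically. The following sketch uses our own naming conventions and builds the rotation that plays the role of R L (θ L ) from a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
s, d1, r = 5, 2, 2

# Orthonormal C1 (plays the role of C^E_{1,1}).
C1, _ = np.linalg.qr(rng.standard_normal((s, d1)))

# Any C orthogonal to C1: project random columns onto the
# orthocomplement of col(C1) and re-orthonormalize.
X = rng.standard_normal((s, r))
X -= C1 @ (C1.T @ X)
C, _ = np.linalg.qr(X)

# Full orthogonal Q whose first d1 columns span col(C1); Q.T plays
# the role of the left rotation R_L(theta_L).
Q, _ = np.linalg.qr(np.hstack([C1, rng.standard_normal((s, s - d1))]))
rotated = Q.T @ C

# The upper d1 x r block is annihilated; the free parameters sit in
# the lower (s - d1) x r block.
upper_block = rotated[:d1, :]
```

Multiplication with the constant orthogonal matrix is invertible and smooth, which is the reason the resulting map is a useful parameterization.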

Components of the Parameter Vector
In the I(2) case, the multi-index Γ contains the state space unit root structure Ω S = ((0, d 1 1 , d 1 2 )), the structure indices p ∈ N d 1 1 +d 1 2 0 , encoding the p.u.t. structures of B 1,2,1 and B 1,2,2 , and the Kronecker indices α • for the stable subsystem. The parameterization of the set of all systems in canonical form with given multi-index Γ for the I(2) case uses the following components:
• The parameters for the matrices C E 1,1 as in the MFI(1) case and C E 1,2 as discussed in Corollary 1.
• The parameters for the matrix C G 1,2 as discussed in Corollary 2.
• The parameters for the stable subsystem in echelon canonical form for Kronecker indices α • .

The Parameterization in the General Case
Inspecting the canonical form shows that all relevant building blocks are already present in the MFI(1) and the I(2) cases and can be combined to deal with the general case: The entries in B u are either unrestricted or follow restrictions according to given structure indices p, and the parameter space is chosen accordingly, as discussed for the MFI(1) and I(2) cases. The restrictions on the matrix C u and its blocks C k require more sophisticated parameterizations of parts of unitary or orthonormal matrices as well as of orthogonal complements. These are dealt with in Lemmas 1 and 2 and Corollaries 1 and 2 above. The extension of Corollaries 1 and 2 to complex matrices and to matrices orthogonal to a larger number of blocks of C k is straightforward.
The following theorem characterizes the properties of parameterizations for sets M Γ of transfer functions with (general) multi-index Γ and describes the relations between sets of transfer functions and the corresponding sets ∆ Γ of triples (A, B, C) of system matrices in canonical form, defined below. Discussing the continuity and differentiability of mappings on sets of transfer functions and on sets of matrix triples also requires the definition of a topology on both sets.

Definition 8.
(i) The set of transfer functions of order n, M n , is endowed with the pointwise topology T pt : First, identify transfer functions with their impulse response sequences. Then, a sequence of transfer functions k i (z) = I s + ∑ ∞ j=1 K j,i z j converges in T pt to k 0 (z) = I s + ∑ ∞ j=1 K j,0 z j if and only if K j,i → K j,0 as i → ∞ for every j ∈ N.
(ii) The set of all triples (A, B, C) in canonical form corresponding to transfer functions with multi-index Γ is called ∆ Γ . The set ∆ Γ is endowed with the topology corresponding to the distance. Please note that in the definition of the pointwise topology convergence does not need to be uniform in j; moreover, the power series coefficients do not need to converge to zero for j → ∞. Hence the concept can also be used for unstable systems.
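The pointwise topology can be made concrete through the power series coefficients K j = CA j−1 B of a state space system. A small sketch (helper name ours) illustrating that unit root systems, whose coefficients do not converge to zero, still fit the definition:

```python
import numpy as np

def impulse_response(A, B, C, J):
    """Power series coefficients K_j = C A^{j-1} B, j = 1..J, of
    k(z) = I + sum_j K_j z^j for a state space system (A, B, C)."""
    K, P = [], np.eye(A.shape[0])
    for _ in range(J):
        K.append(C @ P @ B)
        P = P @ A
    return K

# A unit root system: all K_j equal one, so the coefficients do not
# tend to zero, yet the pointwise topology is still well-defined.
A0 = np.array([[1.0]])
B0 = np.array([[1.0]])
C0 = np.array([[1.0]])
K0 = impulse_response(A0, B0, C0, 5)

# A sequence A_i -> A0 gives K_{j,i} -> K_{j,0} for every fixed j.
Ai = np.array([[1.0 - 1e-8]])
Ki = impulse_response(Ai, B0, C0, 5)
```

Note that the convergence K j,i → K j,0 holds for each fixed j separately; no uniformity in j is required.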
• The parameter vector θ B,p = [θ B,p,1 , ..., θ B,p,l ] ∈ Θ B,p = R d B,p + , collecting the entries in B k , k = 1, . . . , l, restricted by the p.u.t. forms to be positive reals in a similar fashion as described for B 1 in the I(2) case.

•
The parameter vector θ C,G = [θ C,G,1 , ..., θ C,G,l ] ∈ Θ C,G = R d C,G , θ C,G,k = [θ C,G,k,2 , . . . , θ C,G,k,h k ] , collecting the parameters θ C,G,k,j (real and imaginary parts for complex roots) for C G k,j , k = 1, . . . , l and j = 2, . . . , h k , subject to the orthogonality restrictions (see Corollary 2 and its extension to complex matrices).
As mentioned in Section 2, the parameterization of Φ is straightforward. The s × m entries of Φ are collected in a parameter vector d. Thus, there is a one-to-one correspondence between state space realizations (A, B, C, Φ) ∈ ∆ Γ × R s×m and parameter vectors τ = [θ , d ] ∈ Θ Γ × R sm . The same holds true for the parameters of the symmetric, positive definite innovation covariance matrix Σ ∈ R s×s , obtained, e.g., from a lower triangular Cholesky factor of Σ.
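The Cholesky-based parameterization of Σ can be sketched as follows; the exponential transformation of the diagonal is one possible convention for keeping the factor's diagonal positive, not necessarily the one used in the paper:

```python
import numpy as np

def sigma_from_params(theta, s):
    """Map an unrestricted parameter vector of length s(s+1)/2 to a
    symmetric positive definite innovation covariance via a lower
    triangular Cholesky factor; the diagonal is passed through exp()
    to keep it positive (one convention among several)."""
    L = np.zeros((s, s))
    L[np.tril_indices(s)] = theta
    L[np.diag_indices(s)] = np.exp(np.diag(L))
    return L @ L.T

theta = np.array([0.1, 0.5, -0.3])  # s = 2 -> s(s+1)/2 = 3 parameters
Sigma = sigma_from_params(theta, 2)
```

By construction Σ is symmetric and positive definite for every unrestricted θ, which is what makes the map useful for unconstrained numerical optimization of the pseudo likelihood.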

The Topological Structure
The parameterization of M n in Theorem 2 partitions M n into subsets M Γ for a selection of multi-indices Γ. To every multi-index Γ there corresponds an associated parameter set Θ Γ . Thus, in practical applications, maximizing the pseudo likelihood requires choosing the multi-index Γ. Because of the continuity of the parameterization, maximizing the pseudo likelihood over the set M Γ effectively amounts to maximizing over all elements in the closure of M Γ . It is thus necessary to characterize the closures of the sets M Γ .
Moreover, maximizing the pseudo likelihood function over all possible multi-indices is time-consuming and not desirable. Fortunately, the results discussed below show that there exists a generic multi-index Γ g such that M n is contained in the closure of M Γ g . This generic choice corresponds to the set of all stable systems of order n, i.e., to the generic neighborhood of the echelon canonical form. This multi-index, therefore, is a natural starting point for estimation.
However, in particular for hypothesis testing, it will be necessary to maximize the pseudo likelihood over sets of transfer functions of order n with specific state space unit root structure Ω S , denoted as M(Ω S , n • ) below, where n • denotes the dimension of the stable part of the state. We show below that also in this case there exists a generic multi-index Γ g (Ω S , n • ) such that M(Ω S , n • ) is contained in the closure of M Γ g (Ω S ,n • ) .
The main tool to obtain these results is investigating the properties of the mappings ψ Γ , that map transfer functions in M Γ to triples (A, B, C) ∈ ∆ Γ , as well as analyzing the closures of the sets ∆ Γ . The relation between parameter vectors θ ∈ Θ Γ and triples of system matrices (A, B, C) ∈ ∆ Γ is easier to understand than the relation between ∆ Γ and M Γ , due to the results of Theorem 2. Consequently, this section focuses on the relations between ∆ Γ and M Γ -and their closures-for different multi-indices Γ.
To define the closures we embed the sets ∆ Γ of matrices in canonical form with multi-indices Γ corresponding to transfer functions of order n into the space ∆ n of all conformable complex matrix triples (A, B, C) with A ∈ C n×n , where additionally λ |max| (A) ≤ 1. Since the elements of ∆ n are matrix triples, this set is isomorphic to a subset of the finite dimensional space C n 2 +2ns , equipped with the Euclidean topology. Please note that ∆ n also contains non-minimal state space realizations, corresponding to transfer functions of lower order.
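Membership in ∆ n versus minimality can be checked numerically; a sketch using the standard Kalman rank conditions (function names ours):

```python
import numpy as np

def is_minimal(A, B, C, tol=1e-10):
    """Check minimality via the Kalman rank conditions: the
    controllability and observability matrices must both have full
    rank n."""
    n = A.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(n)])
    obsv = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(n)])
    return (np.linalg.matrix_rank(ctrb, tol) == n
            and np.linalg.matrix_rank(obsv, tol) == n)

def in_Delta_n(A, B, C, tol=1e-12):
    """Membership in Delta_n only requires lambda_|max|(A) <= 1;
    non-minimal triples are allowed."""
    return np.max(np.abs(np.linalg.eigvals(A))) <= 1 + tol

# One unit root state and one stable state, both reachable and
# observable: a minimal element of Delta_n.
A = np.diag([1.0, 0.5])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])
```

Dropping the second column of the reachability structure (e.g., setting the second entry of B to zero) yields a non-minimal triple that still belongs to ∆ n , illustrating the remark that ∆ n contains realizations of transfer functions of lower order.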

Remark 16.
In principle the set ∆ n also contains state space realizations of transfer functions k(z) = I s + ∑ ∞ j=1 K j z j with complex valued coefficients K j . Since the subset of ∆ n of state space systems realizing transfer functions with real valued K j is closed in ∆ n , realizations corresponding to transfer functions with coefficients with non-zero imaginary part are irrelevant for the analysis of the closures of the sets ∆ Γ .
After investigating the closure of ∆ Γ in ∆ n , denoted by ∆ Γ , we consider the set of corresponding transfer functions π(∆ Γ ). Since we effectively maximize the pseudo likelihood over ∆ Γ , we have to understand for which multi-indices Γ̃ the set π(∆ Γ̃ ) is a subset of π(∆ Γ ). Moreover, we find a covering π(∆ Γ ) ⊂ ∪ i∈I M Γ i . This restricts the set of multi-indices Γ̃ that may occur as possible multi-indices of the limit of a sequence in π(∆ Γ ) and thus the set of transfer functions that can be obtained by maximization of the pseudo likelihood.
The sets M Γ are embedded into the vector space M of all causal transfer functions k(z) = I s + ∑ ∞ j=1 K j z j . The vector space M is isomorphic to the infinite dimensional space Π j∈N R s×s equipped with the pointwise topology. Since, as mentioned above, maximization of the pseudo likelihood function over M Γ effectively includes the closure of M Γ , it is important to determine, for any given multi-index Γ, the multi-indices Γ̃ for which the set M Γ̃ is a subset of the closure of M Γ . Please note that the closure of M Γ is not necessarily equal to π(∆ Γ ). The continuity of π, as shown in Theorem 2 (i), implies a chain of inclusions between these sets; in general all these inclusions are strict. For a discussion in the case of stable transfer functions see Hannan and Deistler (1988, Theorem 2.5.3).
We first define a partial ordering on the set of multi-indices Γ. Subsequently we examine the closure of ∆ Γ in ∆ n and finally the closures of the sets M Γ in M.

Definition 9.
(i) For two state space unit root structures Ω S and Ω̃ S with corresponding matrices A u ∈ C n u ×n u and Ã u ∈ C ñ u ×ñ u in canonical form, it holds that Ω̃ S ≤ Ω S if and only if there exists a permutation matrix S such that the corresponding block relation between Ã u and A u holds. Moreover, Ω̃ S < Ω S holds if additionally Ω̃ S ≠ Ω S . (ii) For two state space unit root structures Ω S and Ω̃ S and dimensions of the stable subsystems n • , ñ • ∈ N 0 we define (Ω̃ S , ñ • ) ≤ (Ω S , n • ) if and only if Ω̃ S ≤ Ω S and ñ • ≤ n • . Strict inequality holds if at least one of the two inequalities holds strictly. (iii) For two pairs (Ω S , p) and (Ω̃ S , p̃) with corresponding matrices A u ∈ C n u ×n u and Ã u ∈ C ñ u ×ñ u in canonical form, it holds that (Ω̃ S , p̃) ≤ (Ω S , p) if and only if there exists a permutation matrix S such that the corresponding block relation holds, where p 1 ∈ N ñ u 0 and p̃ restricts at least as many entries as p 1 , i.e., p̃ i ≥ (p 1 ) i holds for all i = 1, . . . , ñ u . Moreover, (Ω̃ S , p̃) < (Ω S , p) holds if additionally (Ω̃ S , p̃) ≠ (Ω S , p).
Please note that (i) implies that Ω̃ S only contains unit roots that are also contained in Ω S , with the integration orders h̃ k of the unit roots in Ω̃ S smaller than or equal to the integration orders of the respective unit roots in Ω S . Thus, denoting the unit root structures corresponding to Ω̃ S and Ω S by Ω̃ and Ω, it follows that Ω̃ S ≤ Ω S implies Ω̃ ⪯ Ω. The reverse does not hold: e.g., for Ω S = ((0, 1, 1)) (where hence Ω = ((0, 2))) and Ω̃ S = ((0, 2)) (with Ω̃ = ((0, 1))) it holds that Ω̃ ≺ Ω, but neither Ω̃ S ≤ Ω S nor Ω S ≤ Ω̃ S holds. This partial ordering is convenient for the characterization of the closure of ∆ Γ .

The Closure of ∆ Γ in ∆ n
Please note that the block-structure of A implies that every system in ∆ Γ can be separated into two subsystems (A u , B u , C u ) and (A • , B • , C • ). The canonical form imposes a lot of structure, i.e., restrictions on the matrices A, B and C. By definition ∆ Ω S ,p = ∆ A Ω S ,p × ∆ B Ω S ,p × ∆ C Ω S ,p and the closures of the three factors can be analyzed separately. The sets ∆ A Ω S ,p and ∆ C Ω S ,p are easy to investigate: The structure of A is fully determined by Ω S and consequently ∆ A Ω S ,p consists of a single matrix A, which immediately implies that ∆ A Ω S ,p is closed. The matrix C, compare Theorem 1, is composed of blocks C E k that are sub-blocks of unitary (or orthonormal) matrices and blocks C G k that have to fulfill (recursive) orthogonality constraints. The corresponding sets were shown to be closed in Lemmas 1 and 2 and Corollaries 1 and 2, hence ∆ C Ω S ,p is closed as well. It remains to discuss ∆ B Ω S ,p . The structure indices p defining the p.u.t. structures of the matrices B k restrict some entries to be positive. Combining all the parameters (the unrestricted ones, with complex entries parameterized by real and imaginary parts, and the positive entries) into a parameter vector leads to an open subset of R m for some m. For convergent sequences of systems with fixed Ω S and p, limits of entries restricted to be positive may be zero. When this happens, two cases have to be distinguished. First, all p.u.t. sub-matrices may still have full row rank. In this case the limiting system, (A 0 , B 0 , C 0 ) say, is still minimal and can be transformed to a system in canonical form (Ã 0 , B̃ 0 , C̃ 0 ) with fewer unrestricted entries in B̃ 0 .
In the limiting system the state component x t,2 = 0 is redundant and {y t } t∈Z is an I(1) process rather than an I(2) process. Dropping x t,2 leads to a state space realization of the limiting system {y t } t∈Z , with initial values x 1,1 = x 1,3 = 0.
In case B̃ has full rank, the above system is minimal. Since b 1,2,2,1 > 0, the matrix B̃ needs to be transformed into p.u.t. format. By definition all systems in the sequence, with b 1,2,1,2 = 0, have structure indices p = [0, 2, 1] as discussed in Example 12. The limiting system, in case of full rank of B̃, has indices p̃ = [1, 2]. To relate this to Definition 9, choose a suitable permutation matrix S. This shows that (p̃) i > (p 1 ) i , i = 1, 2, and thus the limiting system has a smaller multi-index Γ than the systems of the sequence. In case B̃ has rank one, a further reduction of the system order to n = 1 along similar lines is possible, again leading to a limiting system with smaller multi-index Γ.
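The order reduction at the limit can be illustrated with a small numerical example in the spirit of the discussion above; the concrete matrices are ours and purely illustrative. As the restricted entry b > 0 tends to zero, the reachability rank drops, so the limiting minimal system has lower order:

```python
import numpy as np

def ctrb_rank(A, B):
    """Rank of the controllability (reachability) matrix [B, AB, ...]."""
    n = A.shape[0]
    M = np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(n)])
    return np.linalg.matrix_rank(M, tol=1e-10)

# Two chained unit root states (an I(2)-type Jordan block); the whole
# second row of B is tied to a p.u.t.-restricted entry b > 0.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

def B_of(b):
    return np.array([[1.0, 0.0],
                     [b, 2.0 * b]])

rank_along_sequence = ctrb_rank(A, B_of(0.1))  # b > 0: minimal, order 2
rank_at_limit = ctrb_rank(A, B_of(0.0))        # b = 0: x_{t,2} unreachable
```

At b = 0 the second state is never excited, mirroring the redundancy of x t,2 in the limiting system: the minimal order falls and the process is I(1) rather than I(2).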
The discussion shows that the closure of ∆ B Ω S ,p is related to lower order systems in the sense of Definition 9. The precise statement is given in Theorem 3 after a discussion of the closure of the stable subsystems.

The Closure of ∆ α •
Consider a convergent sequence of systems {(A j , B j , C j )} j∈N in ∆ α • and denote the limiting system by (A 0 , B 0 , C 0 ). Clearly, λ |max| (A 0 ) ≤ 1 holds true for the limit A 0 of the sequence {A j } j∈N with λ |max| (A j ) < 1 for all j. Therefore, two cases have to be discussed for the limit: λ |max| (A 0 ) < 1 and λ |max| (A 0 ) = 1.
The first case is well understood, compare Hannan and Deistler (1988, chp. 2), since the limit in this case corresponds to a stable transfer function. In the second case the limiting system can be separated into two subsystems (J̃ 2 , B̃ u , C̃ u ) and (Ã • , B̃ • , C̃ • ), according to the block diagonal structure of Ã. The state space unit root structure of the limiting system (A 0 , B 0 , C 0 ) depends on the multiplicities of the eigenvalues of the matrix J̃ 2 and is greater (in the sense of Definition 9) than the empty state space unit root structure. At the same time the Kronecker indices of the subsystem (Ã • , B̃ • , C̃ • ) are smaller than α • , compare again Hannan and Deistler (1988, chp. 2). Since the Kronecker indices impose restrictions on some entries of the matrices A j and thus also on A 0 , the block J̃ 2 and consequently also the limiting state space unit root structure might be subject to further restrictions.
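The separation of the two cases by the eigenvalues of the limit matrix can be sketched as follows (function name and tolerance are ours):

```python
import numpy as np

def split_unit_and_stable(A, tol=1e-8):
    """Sort the eigenvalues of a limit matrix into those on the unit
    circle (forming the unit root block, J~_2 in the text) and those
    strictly inside it (forming the stable subsystem)."""
    lam = np.linalg.eigvals(A)
    on_circle = lam[np.abs(np.abs(lam) - 1.0) < tol]
    inside = lam[np.abs(lam) < 1.0 - tol]
    return on_circle, inside

# A limit in which one formerly stable eigenvalue reached the unit
# circle while two eigenvalues remain stable.
A0 = np.diag([1.0, 0.9, -0.5])
unit_part, stable_part = split_unit_and_stable(A0)
```

In the paper the multiplicities and locations of the eigenvalues in the first group determine the state space unit root structure of the limit.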

The Conformable Index Set and the Closure of ∆ Γ
The previous subsection shows that the closure of ∆ Γ does not only contain systems corresponding to transfer functions with multi-index smaller or equal to Γ, but also systems that are related in a different way that is formalized below.

•
The pair (Ω̃ S , p̃) with corresponding matrix Ã u in canonical form extends (Ω S , p) with corresponding matrix A u in canonical form, i.e., there exists a permutation matrix S such that S Ã u S ′ = diag(A u , J̃ 2 ) and S p̃ = [p ′ , p 2 ′ ] ′ .
Please note that the definition implies Γ ∈ K(Γ). The importance of the set K(Γ) is clarified in the following theorem:
Theorem 3. Transfer functions corresponding to state space realizations with multi-index Γ̃ ≤ Γ are contained in the set π(∆ Γ ). The set π(∆ Γ ) is contained in the union of all sets M Γ̌ over Γ̌ ≤ Γ̃ with Γ̃ conformable to Γ, i.e., π(∆ Γ ) ⊂ ∪ Γ̃∈K(Γ) ∪ Γ̌≤Γ̃ M Γ̌ .
Theorem 3 provides a characterization of the transfer functions corresponding to systems in the closure of ∆ Γ . The conformable set K(Γ) plays a key role here, since it characterizes the set of all minimal systems that can be obtained as limits of convergent sequences from within the set ∆ Γ . Conformable indices extend the matrix A u corresponding to the unit root structure by the block J̃ 2 .
The second inclusion in Theorem 3 is potentially strict, depending on the Kronecker indices α • in Γ. Equality holds, e.g., in the following case: Corollary 3. For every multi-index Γ with n • = 0 the set of conformable indices consists only of Γ, which implies π(∆ Γ ) = ∪ Γ̃≤Γ M Γ̃ .

The Closure of M Γ
It remains to investigate the closure of M Γ in M. Hannan and Deistler (1988, Theorem 2.6.5 (ii) and Remark 3, p. 73) show that for any order n there exist Kronecker indices α •,g = α •,g (n), corresponding to the generic neighborhood M α •,g for transfer functions of order n, such that M n is contained in the closure of M α •,g . It can easily be seen that a generic neighborhood also exists for systems with state space unit root structure Ω S and without stable subsystem: Set the structure indices p to have a minimal number of elements restricted in the p.u.t. sub-blocks of B u , i.e., for any block B k,h k ,j ∈ C n k,h k ,j ×s , or B k,h k ,j ∈ R n k,h k ,j ×s in case of a real unit root, set the corresponding structure indices to p = [1, . . . , n k,h k ,j ]. Any p.u.t. matrix can be approximated by a matrix in this generic neighborhood with some positive entries restricted by the p.u.t. structure tending to zero. Combining these results with Theorem 3 implies the existence of a generic neighborhood for the canonical form considered in this paper: Theorem 4. Let M(Ω S , n • ) be the set of all transfer functions k(z) ∈ M n u (Ω S )+n • with state space unit root structure Ω S . For every Ω S and n • there exists a multi-index Γ g := Γ g (Ω S , n • ) such that M(Ω S , n • ) is contained in the closure of M Γ g . Moreover, it holds that M(Ω S , n • ) is contained in the closure of M α •,g (n) for every Ω S and n • satisfying n u (Ω S ) + n • ≤ n.
Theorem 4 is the basis for choosing a generic multi-index Γ for maximizing the pseudo likelihood function. For every Ω S and n • there exists a generic piece that, in its closure, contains all transfer functions of order n u (Ω S ) + n • with state space unit root structure Ω S : the set of transfer functions corresponding to the multi-index with the largest possible structure indices p in the sense of Definition 9 (iii) and generic Kronecker indices for the stable subsystem. Choosing these sets and their corresponding parameter spaces as model sets is, therefore, the most convenient choice for numerical maximization if only Ω S and n • are known.
If, e.g., only an upper bound n for the system order is known and the goal is only to obtain consistent estimators, using α •,g (n) is a feasible choice, since all transfer functions in the closure of the set M α •,g (n) can be approximated arbitrarily well, regardless of their state space unit root structure Ω S with n u (Ω S ) ≤ n. For testing hypotheses, however, it is important to understand the topological relations between the sets corresponding to different multi-indices Γ. In the following we focus on the multi-indices Γ g (Ω S , n • ) for arbitrary Ω S and n • .
The closure of M(Ω S , n • ) contains also transfer functions that have a different state space unit root structure than Ω S . Considering convergent sequences of state space realizations (A j , B j , C j ) j∈N of transfer functions in M(Ω S , n • ), the state space unit root structure of (A 0 , B 0 , C 0 ) := lim j→∞ (A j , B j , C j ) may differ in three ways:

•
For sequences (A j , B j , C j ) j∈N in canonical form rows of B u,j can tend to zero, which reduces the state space unit root structure as discussed in Section 4.1.1.

•
Stable eigenvalues of A j may converge to the unit circle, thereby extending the unit root structure.
• Off-diagonal entries of the sub-block A u,j of A j = T j A j T −1 j may converge to zero in the sub-block A u,0 of the limit A 0 = T 0 A 0 T −1 0 in canonical form, resulting in a different attainable state space unit root structure. Here T j ∈ C n×n for all j ∈ N are regular matrices transforming A j to canonical form and T 0 ∈ C n×n transforms A 0 accordingly.
The first change of Ω S described above results in a transfer function with smaller state space unit root structure according to Definition 9 (ii). The implications of the other two cases are summarized in the following definition: Definition 11 (Attainable unit root structures). For given n • and Ω S the set A(Ω S , n • ) of attainable unit root structures contains all pairs (Ω̃ S , ñ • ), where Ω̃ S with corresponding matrix Ã u in canonical form extends Ω S with corresponding matrix A u in canonical form, i.e., there exists a permutation matrix S such that S Ã u S ′ is block diagonal with blocks Ǎ u and J̃ 2 , where Ǎ u can be obtained by replacing off-diagonal entries in A u by zeros and where ñ • := n • − d J with d J the dimension of J̃ 2 ∈ C d J ×d J . (ii) For every generic multi-index Γ g corresponding to Ω S and n • it holds that

Remark 17. It is a direct consequence of the definition of
Theorem 5 has important consequences for statistical analysis, e.g., PML estimation, since, as stated several times already, maximizing the pseudo likelihood function over Θ Γ effectively amounts to calculating the supremum over the larger set M Γ . Depending on the choice of Γ the following asymptotic behavior may occur:

•
If Γ is chosen correctly and the estimator of the transfer function is consistent, openness of M Γ in its closure implies that the probability of the estimator being an interior point of M Γ tends to one asymptotically. Since the mapping attaching the parameters to the transfer function is continuous on an open and dense set, consistency in terms of transfer functions, therefore, implies generic consistency of the parameter estimators.

•
If the multi-index is incorrectly chosen to equal Γ, consistency of the estimator is still possible if the true multi-index satisfies Γ 0 < Γ, as in this case M Γ 0 is contained in the closure of M Γ . This is in some sense not too surprising and is also well-known in the simpler VAR framework, where consistency of OLS can be established when the true autoregressive order is smaller than the order chosen for estimation. Analogous to the choice of the lag length in the VAR case, a necessary condition for consistency is thus to choose the system order larger than or equal to the true system order.
Finally, note that Theorem 5 also implies the following result relevant for the determination of the unit root structure, further discussed in Sections 5.1.1 and 5.2.1: Corollary 4. For every pair (Ω S ,ñ • ) ∈ A(Ω S , n • ) it holds that M(Ω S ,ñ • ) ⊂ M(Ω S , n • ).

Testing Commonly Used Hypotheses in the MFI(1) and I(2) Cases
This section discusses a large number of hypotheses, respectively restrictions, on cointegrating spaces, adjustment coefficients and deterministic components that are often tested in the empirical literature. As in the VECM framework, discussed for the I(2) case in Section 2, testing hypotheses on the cointegrating spaces or adjustment coefficients may necessitate different reparameterizations.

The MFI(1) Case
The two by far most widely used cases of MFI(1) processes are I(1) processes and seasonally (co-)integrated processes for quarterly data with state space unit root structure ((0, d 1 1 ), (π/2, d 2 1 ), (π, d 3 1 )). In general, assuming for notational simplicity ω 1 = 0 and ω l = π, one obtains for t > 0 and x 1,u = 0 an additive decomposition of {y t } t∈Z into stochastic trends and cycles, the deterministic components and the stationary components. The stochastic cycles at frequencies 0 < ω k < π are, of course, given by the combination of sine and cosine terms. For the MFI(1) case this can also be seen directly from the real valued canonical form discussed in Remark 4, with the matrices A k,R for k = 2, . . . , l − 1 as given there. The ranks of the matrices C k B k are equal to the integers d k 1 in Ω S = ((ω 1 , d 1 1 ), . . . , (ω l , d l 1 )). The number of stochastic trends is equal to d 1 1 , the number of stochastic cycles at frequency ω k is equal to 2d k 1 for k = 2, . . . , l − 1 and equal to d l 1 for k = l, as discussed in Section 3. Moreover, in the MFI(1) case, d k 1 is linked to the complex cointegrating rank r k at frequency ω k , defined in Johansen (1991) and Johansen and Schaumburg (1999) in the VECM case as the rank of the matrix Π k := −a(z k ). For VARMA processes with arbitrary integration orders the complex cointegrating rank r k at frequency ω k is r k := rank(−k −1 (z k )), where k(z) is the transfer function; in the MFI(1) case r k = s − d k 1 . Thus, in the MFI(1) case, determination of the state space unit root structure corresponds to determination of the complex cointegrating ranks in the VECM case.
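The relations d k 1 = rank(C k B k ) and r k = s − d k 1 can be illustrated numerically (the concrete matrices are ours, generated at random):

```python
import numpy as np

rng = np.random.default_rng(1)
s, d = 4, 2  # d plays the role of d^k_1, the unit roots at omega_k

# In the canonical form C_k has orthonormal columns and B_k has full
# row rank, so rank(C_k B_k) = d recovers d^k_1 and hence the complex
# cointegrating rank r_k = s - d^k_1.
C_k, _ = np.linalg.qr(rng.standard_normal((s, d)))
B_k = rng.standard_normal((d, s))

d_hat = np.linalg.matrix_rank(C_k @ B_k)
r_k = s - d_hat
```

With full rank factors the rank of the product equals the common inner dimension, which is the integer appearing in the state space unit root structure.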
In the VECM setting, the matrix Π k is usually factorized into Π k = α k β k , as presented for the I(1) case in Section 2. For ω k = {0, π} the column space of β k gives the cointegrating space of the process at frequency ω k . For 0 < ω k < π the relation between the column space of β k and the space of CIVs and PCIVs at the corresponding frequency is more involved. The columns of β k are orthogonal to the columns of C k , the sub-block of C from a state space realization (A, B, C) in canonical form corresponding to the VAR process. Analogously, the column space of the matrix α k , containing the so-called adjustment coefficients, is orthogonal to the row space of the sub-block B k of B.
Both integers d k 1 and r k are related to the dimensions of the static and dynamic cointegrating spaces in the MFI(1) case: For ω k ∈ {0, π}, the cointegrating rank r k = s − d k 1 coincides with the dimension of the static cointegrating space at frequency ω k . Furthermore, the dimension of the static cointegrating space at frequency 0 < ω k < π is bounded from above by r k = s − d k 1 , since it is spanned by at most s − d k 1 vectors β ∈ R s orthogonal to the complex valued matrix C k . The dimension of the dynamic cointegrating space at 0 < ω k < π is equal to 2r k = 2(s − d k 1 ). Identifying again β(z) = β 0 + β 1 z with the vector [β 0 , β 1 ] , a basis of the dynamic cointegrating space at 0 < ω k < π is then given by the column space of the product [γ 0 , γ̃ 0 ; γ 1 , γ̃ 1 ] := [I s , 0 s×s ; − cos(ω k )I s , sin(ω k )I s ] multiplied with a matrix built from R(β k ) and I(β k ), with the columns of β k ∈ C s×(s−d k 1 ) spanning the orthogonal complement of the column space of C k , i.e., β k is of full rank and β k C k = (R(β k ) − iI(β k ) )C k = 0. This holds true since both factors are of full rank and [γ 0 , γ 1 ] satisfies (z k γ 0 + γ 1 )C k = 0, which corresponds to the necessary condition given in Example 2 for the columns of [γ 0 , γ 1 ] to be PCIVs. The latter implies (z k γ̃ 0 + γ̃ 1 )C k = 0 also for [γ̃ 0 , γ̃ 1 ] , highlighting again the additional structure of the cointegrating space emanating from the complex conjugate pairs of eigenvalues (and matrices) as discussed in Example 2.
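The PCIV condition can be verified numerically. In the sketch below (all names ours) β k spans the orthocomplement of C k , and one concrete real pair (γ 0 , γ 1 ), hand-derived from the condition β k C k = 0, is checked to satisfy (z k γ 0 + γ 1 )C k = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
s, d = 4, 2
omega = np.pi / 3
z_k = np.exp(1j * omega)

# Unitary completion: the first d columns form C_k, the remaining
# columns beta_k span its orthocomplement, so beta_k^* C_k = 0.
M = rng.standard_normal((s, s)) + 1j * rng.standard_normal((s, s))
Q, _ = np.linalg.qr(M)
C_k, beta_k = Q[:, :d], Q[:, d:]

# One real combination of Re(beta_k) and Im(beta_k) satisfying the
# PCIV condition (z_k gamma_0' + gamma_1') C_k = 0; this particular
# choice is ours, derived by hand from beta_k' C_k = 0.
gamma_0 = beta_k.real
gamma_1 = -np.cos(omega) * beta_k.real + np.sin(omega) * beta_k.imag

residual = (z_k * gamma_0 + gamma_1).T @ C_k  # should vanish
```

The combination mixes real and imaginary parts of β k exactly as the complex conjugate eigenvalue pairs suggest, which is the extra structure of the dynamic cointegrating space emphasized in the text.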
Please note that the relations between r k and d k 1 discussed above only hold in the MFI(1) and I(1) special cases. For higher orders of integration no such simple relations exist.
In the MFI(1) setting the deterministic component typically includes a constant, seasonal dummies and a linear trend. As discussed in Remark 6, a sufficiently rich set of deterministic components makes it possible to absorb non-zero initial values x 1,u .

Testing Hypotheses on the State Space Unit Root Structure
Using the generic sets of transfer functions M Γ g presented in Theorem 4, we can construct pseudo likelihood ratio tests for different hypotheses H 0 : (Ω S , n • ) = (Ω S,0 , n •,0 ) against chosen alternatives. Note, however, that by the results of Theorem 5 the null hypothesis includes all pairs (Ω S , n • ) ∈ A(Ω S,0 , n •,0 ) as well as all pairs (Ω S , n • ) that are smaller than a pair (Ω S ,ñ • ) ∈ A(Ω S,0 , n •,0 ).
As is common in the VECM setting, first consider hypotheses at a single frequency ω k . For an MFI(1) process, the hypothesis of a state space unit root structure equal to Ω S,0 = ((ω k , d k 1,0 )) corresponds to the hypothesis of the (complex) cointegrating rank r k at frequency ω k being equal to r 0 = s − d k 1,0 . Maximization of the pseudo likelihood function over the set M(((ω k , d k 1,0 )), n − δ k d k 1,0 ), with a suitably chosen order n, leads to estimates that may be arbitrarily close to transfer functions with different state space unit root structures Ω S . These include Ω S with additional unit root frequencies ω k̃ , with the integers d k̃ 1 restricted only by the order n. Therefore, focusing on a single frequency ω k does not rule out a more complicated true state space unit root structure. Assume n ≥ δ k s with δ k = 1 for ω k ∈ {0, π} and δ k = 2 otherwise. Corollary 4 shows that M({}, n) ⊃ M(((ω k , 1)), n − δ k ) ⊃ · · · ⊃ M(((ω k , s)), n − sδ k ), since, e.g., (((ω k , 1)), n − δ k ) ∈ A({}, n).
Analogously to the procedure of testing for the complex cointegrating rank r k in the VECM setting, these inclusions can be employed to test for d k 1 : Start with the hypothesis of d k 1 = s against the alternative of 0 ≤ d k 1 < s and decrease the assumed d k 1 consecutively until the test does not reject the null hypothesis.
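The sequential decision rule can be sketched as a simple loop; the test function is a placeholder, since an actual pseudo likelihood ratio test and its critical values are beyond this sketch:

```python
def determine_d(s, plr_test_rejects):
    """Sequential determination of d^k_1 at a single frequency.

    plr_test_rejects(d) should return True if H0: d^k_1 = d is
    rejected against the alternative 0 <= d^k_1 < d.  Start at the
    largest value d = s and decrease until the first non-rejection.
    """
    for d in range(s, 0, -1):
        if not plr_test_rejects(d):
            return d
    return 0

# Hypothetical outcome for s = 3: the test rejects d = 3 and d = 2
# but not d = 1, so the procedure settles on one common trend.
decision = determine_d(3, lambda d: d > 1)
```

This mirrors the VECM practice of testing cointegrating ranks sequentially; only the model sets over which the pseudo likelihood is maximized differ.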
Furthermore, one can formulate hypotheses on d k 1 jointly at different frequencies ω k . Again, there exist inclusions based on the definition of the set of attainable state space unit root structures and Corollary 4, which can be used to consecutively test hypotheses on Ω S . Johansen (1995) considers in the I(1) case three types of hypotheses on the cointegrating space spanned by the columns of β, each motivated by examples from economic research; the different cases correspond to different types of restrictions implied by economic theory.

(i)
H 0 : β = H ϕ, β ∈ R s×r , H ∈ R s×t , ϕ ∈ R t×r , r ≤ t < s: The cointegrating space is known to be a subspace of the column space of H (which is of full column rank).
As discussed in Example 1, cointegration at ω k = 0 occurs if and only if a vector β j satisfies β j C 1 = 0. In other words, the column space of C 1 is the orthocomplement of the cointegrating space spanned by the columns of β and hypotheses on β restrict entries of C 1 .
The first type of hypothesis, H 0 , implies that the column space of C 1 is equal to the orthocomplement of the column space of H ϕ. Assume w.l.o.g. H ∈ O s,t , ϕ ⊥ ∈ O t,t−r and H ⊥ ∈ O s,s−t , such that the columns of [Hϕ ⊥ , H ⊥ ] form an orthonormal basis of the orthocomplement of the cointegrating space. Consider now a mapping defined as in Lemma 1. From this one can derive a parameterization of the set of matrices C r 1 corresponding to H 0 , analogously to Lemma 1. The difference between the number of free parameters under the null hypothesis and under the alternative is the difference between the number of free parameters in θ L ∈ [0, 2π) r(s−r) and θ̌ L ∈ [0, 2π) r(t−r) , implying a reduction of the number of free parameters by r(s − t) under the null hypothesis. This necessarily coincides with the number of degrees of freedom of the corresponding test statistic in the VECM setting (cf. Johansen 1995, Theorem 7.2).
The second type of hypothesis, H 0 , is also straightforwardly parameterized: In this case a subspace of the cointegrating space is known and given by the column space of b ∈ R s×t . Assume w.l.o.g. b ∈ O s,t . The orthocomplement of β = [b, ϕ] is given by the set of matrices C 1 satisfying the restriction b C 1 = 0, i.e., the set O s,d 1 (b) defined in (13). The parameterization of this set has already been discussed. The reduction of the number of free parameters under the null hypothesis is t(s − r), which again coincides with the number of degrees of freedom of the corresponding test statistic in the VECM setting (cf. Johansen 1995, Theorem 7.3).
Finally, the third type of hypothesis, H_0, is the most difficult to parameterize in our setting. As an illustrative example consider the case H_0: β = [H_1ϕ_1, H_2ϕ_2], β ∈ R^{s×r}, H_1 ∈ R^{s×t_1}, H_2 ∈ R^{s×t_2}, ϕ_1 ∈ R^{t_1×r_1}, ϕ_2 ∈ R^{t_2×r_2}, r_j ≤ t_j ≤ s and r_1 + r_2 = r. W.l.o.g. choose H_b ∈ O_{s,t_b} such that its columns span the t_b-dimensional intersection of the column spaces of H_1 and H_2, and choose H̃_j ∈ O_{s,t̃_j}(H_b), j = 1, 2, such that the columns of H̃_j and H_b span the column space of H_j. Define H̃ := [H̃_1, H̃_2, H_b] ∈ O_{s,t̃}, with t̃ = t̃_1 + t̃_2 + t_b. Let w.l.o.g. H̃_⊥ ∈ O_{s,s−t̃}(H̃) and define p_j := min(r_j, t̃_j), q_j := max(r_j, t̃_j) for j = 1, 2 and p_b := q_1 − t̃_1 + q_2 − t̃_2. A parameterization of β^r ∈ O_{s,r} satisfying the restrictions under the null hypothesis can be derived from a mapping in which R_R(θ_{R,β}) ∈ R^{r×r} is as in Lemma 1 and R_H(θ_H) ∈ R^{t̃×t̃} is a product of Givens rotations corresponding to the entries in the blocks highlighted in bold font. Consequently, a parameterization of the orthocomplement of the cointegrating space is based on a mapping with R_H(θ_H) ∈ R^{t̃×t̃} as above and R_R(θ_{R,C}) ∈ R^{(s−r)×(s−r)} as in Lemma 1. Please note that for all θ_H, θ_{R,β} and θ_{R,C} it holds that β^r(θ_H, θ_{R,β})' C_1^r(θ_H, θ_{R,C}) = 0_{r×(s−r)}. The number of parameters restricted under H_0 equals r_1(q_1 − r_1) + r_2(q_2 − r_2) + (r_1 + r_2)(t̃ − q_1 − q_2) + (s − r)(s − r + 1)/2 and thus, through q_1 and q_2, depends on the dimension t_b of the intersection of the column spaces of H_1 and H_2. The reduction of the number of free parameters matches the degrees of freedom of the test statistics in Johansen (1995, Theorem 7.5) if β is identified, which is the case if r_1 ≤ t̃_1 and r_2 ≤ t̃_2.
Using the mapping β^r(·) as a basis for a parameterization makes it possible to introduce another type of hypothesis, formulated for j = 1, . . . , c with ∑_{j=1}^c r_j = s − r: the orthocomplement of the cointegrating space is contained in the column spaces of the (full rank) matrices H_k.
This type of hypothesis allows one to test, e.g., for the presence of cross-unit cointegrating relations (cf. Wagner and Hlouskova 2009, Definition 1) in multi-country data sets.
Hypotheses on the cointegrating space at frequency ω k = π can be treated analogously to hypotheses on the cointegrating space at frequency ω k = 0.
Testing hypotheses on cointegrating spaces at frequencies 0 < ω k < π has to be discussed in more detail, as one also has to consider the space spanned by PCIVs, compare Example 2. There are 2(s − d k 1 ) linearly independent PCIVs of the form β(z) = β 0 + β 1 z. Every PCIV corresponds to a vector z k β 0 + β 1 ∈ C s orthogonal to C k and consequently hypotheses on the space spanned by PCIVs can be transformed to hypotheses on the complex column space of C k ∈ C s×d k 1 . Consider, e.g., an extension of the first type of hypothesis of the form withH 0 ,H 1 ∈ R s×t ,φ 0 ,φ 1 ∈ R t×r , r ≤ t < s, which implies that the column space of C k is equal to the orthocomplement of the column space of (H 0 + iH 1 )(φ 0 + iφ 1 ). This general hypothesis encompasses, e.g., the hypothesis [γ 0 , γ 1 ] = Hφ = [H 0 , H 1 ] φ, with H ∈ R 2s×t , H 0 , H 1 ∈ R s×t , φ ∈ R t×r , by setting φ 0 :=φ 1 :=φ,H 0 := H 0 andH 1 := −(cos(ω k )H 0 + H 1 )/ sin(ω k ). The extension is tailored to include the pairwise structure of PCIVs and to simplify transformation into hypotheses on the complex matrix C k used in the parameterization. The parameterization of the set of matrices corresponding to H k 0 is derived from a mapping of the form given in (15), withŘ L (θ L ) and R R (θ R ) replaced by Similarly, the three other types of hypotheses on the cointegrating spaces considered above can be extended to hypotheses on the space of PCIVs in the MFI(1) case. They translate into hypotheses on complex valued matrices β k orthogonal to C k . To parameterize the set of matrices restricted according to these null hypotheses, Lemma 2 is used. Thus, the restrictions implied by the extensions of all four types of hypotheses to hypotheses on the dynamic cointegrating spaces at frequencies 0 < ω k < π for MFI(1) processes can be implemented using Givens rotations.
A different case of interest is the hypothesis of at least m linearly independent CIVs b_j ∈ R^s, j = 1, . . . , m, with 0 < m ≤ s − d_1^k, i.e., an m-dimensional static cointegrating space at frequency 0 < ω_k < π, which we discuss as another illustrative example of the procedure for the case of cointegration at complex unit roots.
For the dynamic cointegrating space, this hypothesis implies the existence of 2m linearly independent PCIVs of the form β_1(z) = b_j and β_2(z) = b_j z, j = 1, . . . , m. In light of the discussion above, the necessary condition for these two polynomials to be PCIVs is equivalent to b_j' C_k = 0 for j = 1, . . . , m. This restriction is similar to H_0 discussed above, except for the fact that the cointegrating vectors b_j are not fully specified. The hypothesis is equivalent to the existence of an m-dimensional real kernel of C_k, and a suitable parameterization is derived from a corresponding mapping. The hypotheses can also be tested jointly for the cointegrating spaces of several unit roots.
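The restriction b_j' C_k = 0 is easy to check numerically. The sketch below (hypothetical dimensions) constructs a complex C_k whose columns lie in the orthocomplement of a real b, so that the columns of b form an m-dimensional real left kernel of C_k:

```python
import numpy as np

rng = np.random.default_rng(2)
s, d, m = 5, 2, 2                 # hypothetical dimensions, m real CIVs
b, _ = np.linalg.qr(rng.standard_normal((s, m)))   # b in O_{s,m}
# complete b to an orthonormal basis; the last s-m columns span the
# orthocomplement of the column space of b
Q, _ = np.linalg.qr(np.hstack([b, rng.standard_normal((s, s - m))]))
N = Q[:, m:]
# any C_k with (complex) column space inside span(N) satisfies b' C_k = 0
W = rng.standard_normal((s - m, d)) + 1j * rng.standard_normal((s - m, d))
C_k = N @ W
print(np.allclose(b.T @ C_k, 0))  # the columns of b are static CIVs
```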

Testing Hypotheses on the Adjustment Coefficients
As in the case of hypotheses on the cointegrating spaces β_k, hypotheses on the adjustment coefficients α_k are typically formulated as hypotheses on the column spaces of α_k. We focus only on hypotheses on the real valued α_1 corresponding to frequency zero; analogous hypotheses may be considered for α_k at frequencies ω_k ≠ 0, using the same ideas.
The first type of hypothesis on α_1 is of the form H_α: α_1 = Aψ, A ∈ R^{s×t}, ψ ∈ R^{t×r}, and can therefore be rewritten as B_1 Aψ = 0. W.l.o.g. let A ∈ O_{s,t} and A_⊥ ∈ O_{s,s−t}. We deal with this type of hypothesis as with H_0: β = Hϕ in the previous section by simply reversing the roles of C_1 and B_1. We therefore consider the set of feasible matrices B_1 as a subset of O_{s,s−r} and use the corresponding mapping to derive a parameterization, while C_1 is restricted to be a p.u.t. matrix and the set of feasible matrices C_1 is parameterized accordingly.
As a second type of hypothesis, Juselius (2006, sct. 11.9, p. 200) discusses H_α: α_{1,⊥} = Hψ, H ∈ R^{s×t}, ψ ∈ R^{t×(s−r)}, linked to the absence of permanent effects of the shocks H_⊥'ε_t on any of the variables of the system. Assume w.l.o.g. H_⊥ ∈ O_{s,s−t}. Using the parameterization of O_{s,s−r}(H_⊥) defined in (13) for the set of feasible matrices B_1 and the parameterization of the set of p.u.t. matrices for the set of feasible matrices C_1 implements this restriction.
The restrictions implied by the first hypothesis reduce the number of free parameters by r(s − t), and those implied by the second hypothesis lead to a reduction by t(s − r) free parameters, compared to the unrestricted case; in both cases this matches the number of degrees of freedom of the corresponding test statistic in the VECM framework.

Restrictions on the Deterministic Components
Including an unrestricted constant in the VECM equation Δ_0 y_t = ε_t + Φ_0 leads to a linear trend in the solution process y_t = ∑_{j=1}^t (ε_j + Φ_0) + y_1 = ∑_{j=1}^t ε_j + y_1 + Φ_0 t, for t > 1. If one restricts the constant to Φ_0 = αΦ̃_0, Φ̃_0 ∈ R^r, in a general VECM equation as given in (4), with Π = αβ' of rank r, no summation to linear trends occurs in the solution process, while a constant non-zero mean is still present in the cointegrating relations, i.e., in the process {β'y_t}_{t∈Z}. Analogously, an unrestricted linear trend Φ_1 t in the VECM equation leads to a quadratic trend of the form Φ_1 t(t − 1)/2 in the solution process, which is excluded by the restriction Φ_1 t = αΦ̃_1 t.
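The summation mechanism is easy to verify numerically. In the simplest special case Π = 0 and with the noise switched off, cumulating the constant yields exactly the linear trend Φ_0 t (a toy sketch, not an estimation procedure):

```python
import numpy as np

# toy check: with Pi = 0 and eps_t = 0, y_t = sum_{j<=t} (eps_j + Phi0)
# reduces to the deterministic linear trend Phi0 * t
T, s = 200, 2
Phi0 = np.array([0.5, -1.0])
eps = np.zeros((T, s))             # noise switched off to isolate the trend
y = np.cumsum(eps + Phi0, axis=0)  # row t-1 holds y_t = Phi0 * t
slope = (y[-1] - y[0]) / (T - 1)
print(slope)                       # equals Phi0
```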
In the VECM framework, compare Johansen (1995, sct. 5.7, p. 81), five restrictions related to the coefficients of the constant and the linear trend are commonly considered, with Φ_0, Φ_1 ∈ R^s and Φ̃_0, Φ̃_1 ∈ R^r, with the following consequences for the solution processes: Under H(r) the solution process contains a quadratic trend in the direction of the common trends, i.e., in {β_⊥'y_t}_{t∈Z}, and a linear trend in the direction of the cointegrating relations, i.e., in {β'y_t}_{t∈Z}. Under H*(r) the quadratic trend is not present. H_1(r) features a linear trend only in the directions of the common trends, H_2(r) a constant only in these directions. Under H_1*(r) the constant is also present in the directions of the cointegrating relations.
In the state space framework the deterministic components can be added in the output equation y_t = Cx_t + Φd_t + ε_t, compare (9). Consequently, the hypotheses considered above can be imposed by formulating linear restrictions on Φ, which can be parameterized directly by including the appropriate deterministic components in the five considered cases. The component C_1 Φ̃_0 captures the influence of the initial value C_1 x_{1,1} in the output equation.
In the VECM framework for the seasonal MFI(1) case, with Π_k = α_k β_k' of rank r_k for 0 < ω_k < π, the deterministic component usually includes restricted seasonal dummies of the form α_k Φ̃_k z_k^t plus its complex conjugate, Φ̃_k ∈ C^{r_k}, to avoid summation in the directions of the stochastic trends. The state space framework allows one to straightforwardly include seasonal dummies in the output equation in the form Φ_k z_k^t plus its complex conjugate, Φ_k ∈ C^s. Again, it is of interest whether these components are unrestricted or whether they take the form C_k Φ̃_k z_k^t plus its complex conjugate, Φ̃_k ∈ C^{d_1^k}, similarly allowing for a reinterpretation of these components as the influence of the initial values x_{1,k} on the output.
Please note that Φ_k z_k^t plus its complex conjugate is equivalently given by Φ̃_{k,1} sin(ω_k t) + Φ̃_{k,2} cos(ω_k t) with real coefficients Φ̃_{k,1}, Φ̃_{k,2} ∈ R^s, and the desired restrictions can be implemented accordingly.
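This equivalence can be verified numerically. The identification Φ̃_{k,1} = −2 Im(Φ_k), Φ̃_{k,2} = 2 Re(Φ_k) used below is one possible choice consistent with the displayed form (our own derivation, shown for illustration):

```python
import numpy as np

omega = np.pi / 2                  # e.g. the quarterly frequency
z = np.exp(1j * omega)
rng = np.random.default_rng(3)
Phi = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # Phi_k in C^s, s = 3
t = np.arange(1, 9)

# Phi_k z_k^t plus its complex conjugate, evaluated for t = 1, ..., 8
lhs = np.real(np.outer(Phi, z ** t) + np.outer(Phi.conj(), z.conj() ** t))
# one possible identification of the real coefficients:
Phi1, Phi2 = -2 * Phi.imag, 2 * Phi.real
rhs = np.outer(Phi1, np.sin(omega * t)) + np.outer(Phi2, np.cos(omega * t))
print(np.allclose(lhs, rhs))       # True
```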

The I(2) Case
The state space unit root structure of I(2) processes is of the form Ω_S = ((0, d_1^1, d_2^1)), where the integer d_1^1 equals the dimension of x_{t,1}^E and d_2^1 equals the dimension of [(x_{t,2}^G)', (x_{t,2}^E)']'. Recall the solution of the system in canonical form for t > 0 and x_{1,u} = 0 in this setting. For VAR processes integrated of order two, the integers d_1^1 and d_2^1 of the corresponding state space unit root structure are linked to the ranks of the matrices Π = αβ' (denoted r = r_0) and α_⊥'Γβ_⊥ = ξη' (denoted m = r_1) in the VECM setting, as discussed in Section 2. It holds that r = s − d_2^1 and m = d_2^1 − d_1^1. The relation of the state space unit root structure to the cointegration indices r_0, r_1, r_2 was also discussed in Section 3.
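The stated relations can be collected in a small helper function (illustrative; the identity r_2 = d_1^1 follows from r_0 + r_1 + r_2 = s together with r_0 = r and r_1 = m):

```python
def i2_indices(s, d11, d12):
    """Map the I(2) state space unit root structure ((0, d11, d12)) to the
    VECM ranks r, m and the cointegration indices (r0, r1, r2), using the
    relations stated in the text."""
    assert 0 <= d11 <= d12 <= s
    r = s - d12              # rank of Pi = alpha beta'
    m = d12 - d11            # rank of alpha_perp' Gamma beta_perp
    r0, r1, r2 = r, m, d11   # r2 = d11 follows from r0 + r1 + r2 = s
    assert r0 + r1 + r2 == s
    return r, m, (r0, r1, r2)

print(i2_indices(5, 1, 2))   # -> (3, 1, (3, 1, 1))
```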
Again, both the integers d_1^1 and d_2^1 and the ranks r and m, and consequently also the indices r_0, r_1 and r_2, are closely related to the dimensions of the spaces spanned by CIVs and PCIVs. In the I(2) case the static cointegrating space of order ((0, 2), (0, 1)) is the orthocomplement of the column space of C_{1,1}^E and thus of dimension s − d_1^1. The dimension of the space spanned by CIVs of order ((0, 2), {}) is equal to s − d_2^1 − r_{c,G}, where r_{c,G} denotes the rank of C_{1,2}^G, since this space is the orthocomplement of the column space of [C_{1,1}^E, C_{1,2}^G, C_{1,2}^E]. The space spanned by the PCIVs β_0 + β_1 z of order ((0, 2), {}) is of dimension smaller than or equal to 2s − d_1^1 − d_2^1, due to the orthogonality constraint on [β_0, β_1] given in Example 3.
Consider the matrices β, β_1 and β_2 as defined in Section 2. From a state space realization (A, B, C) in canonical form corresponding to a VAR process it follows immediately that the columns of β_2 span the same space as the columns of the sub-block C_{1,1}^E. The same relation holds true for β_1 and the sub-block C_{1,2}^E. With respect to polynomial cointegration, Bauer and Wagner (2012) show that the rank of C_{1,2}^G determines the number of minimum degree polynomial cointegrating relations, as discussed in Example 3. If C_{1,2}^G = 0, then there exists no vector γ such that {γ'y_t}_{t∈Z} is integrated and cointegrated with {β_2'Δ_0 y_t}_{t∈Z}. In this case {β'y_t}_{t∈Z} is a stationary process. The deterministic components included in the I(2) setting are typically a constant and a linear trend. As in the MFI(1) case, identifiability problems occur if we consider a non-zero initial state x_{1,u}; this non-identifiability is resolved by assuming x_{1,u} = 0, compare Remark 6.

Testing Hypotheses on the State Space Unit Root Structure
To simplify notation, let M(d_1^1, d_2^1) for d_1^1 + d_2^1 > 0 denote the closure of the set of transfer functions of order n that possess a state space unit root structure of either Ω_S = ((0, d_1^1, d_2^1)) or, in case d_1^1 = 0, Ω_S = ((0, d_2^1)), while M(0, 0) denotes the closure of the set of all stable transfer functions of order n. Considering the relations between the different sets of transfer functions given in Corollary 4 shows that the following relations hold (assuming s ≥ 4; the columns are arranged to include transfer functions with the same dimension of A_u). The relationships between the subsets match the ones in Johansen (1995, Table 9.1) and the ones found by Jensen (2013). The latter type of inclusions appears, for instance, for M(0, 2), containing transfer functions corresponding to I(1) processes, which is a subset of the set M(1, 0) of transfer functions corresponding to I(2) processes.
The same remarks as in the MFI(1) case also apply in the I(2) case: When testing H_0: Ω_S = ((0, d_{1,0}^1, d_{2,0}^1)), all attainable state space unit root structures A(((0, d_{1,0}^1, d_{2,0}^1))) have to be included in the null hypothesis. Johansen (2006) discusses several types of hypotheses on the cointegrating spaces of different orders. These deal with properties of β, joint properties of [β, β_1] or the occurrence of non-trivial polynomial cointegrating relations. Boswijk and Paruolo (2017), moreover, discuss testing hypotheses on the loading matrices of common trends (corresponding in our setting to testing hypotheses on C_1).

Testing Hypotheses on CIVs and PCIVs
We commence with hypotheses of the form H_0: β = Kϕ and H_0: β = [b, ϕ], just as in the MFI(1) case at unit root one, since hypotheses on β correspond to hypotheses on its orthocomplement spanned by [C_{1,1}^E, C_{1,2}^E] in the VARMA framework: Hypotheses of the form H_0: β = Kϕ, K ∈ R^{s×t}, ϕ ∈ R^{t×r}, imply ϕ'K'[C_{1,1}^E, C_{1,2}^E] = 0. W.l.o.g. let K ∈ O_{s,t} and K_⊥ ∈ O_{s,s−t}. As in the parameterization under H_0 in the MFI(1) case at unit root one, compare (15), an analogous mapping is used. The hypothesis of no minimum degree polynomial cointegrating relations implies the restriction C_{1,2}^G = 0, compare Example 3. Therefore, we can test all hypotheses considered in Johansen (2006) also in our more general setting.

Testing Hypotheses on the Adjustment Coefficients
Hypotheses on α and ξ as defined in (6) and (7) correspond to hypotheses on the spaces spanned by the rows of B_{1,2,1} and B_{1,2,2}. For VAR processes integrated of order two, the row space of B_{1,2,1} is equal to the orthogonal complement of the column space of [α, α_⊥ξ], while the row space of B_{1,2} := [B_{1,2,1}', B_{1,2,2}']' is equal to the orthogonal complement of the column space of α. The restrictions corresponding to hypotheses on α and ξ can be implemented analogously to the restrictions corresponding to hypotheses on α_1 in Section 5.1.3, reversing the roles of the relevant sub-blocks in B_u and C_u accordingly.

Restrictions on the Deterministic Components
The I(2) case is, with respect to the modeling of deterministic components, less well studied than the MFI(1) case. In most theory papers they are simply left out, with the notable exception of Rahbek et al. (1999), who deal with the inclusion of a constant term in the I(2) VECM representation. The main reason for this appears to be the way deterministic components in the defining vector error correction representation translate into deterministic components in the corresponding solution process. An unrestricted constant in the VECM for I(2) processes leads to a linear trend in {β_1'y_t}_{t∈Z} and a quadratic trend in {β_2'y_t}_{t∈Z}, while an unrestricted linear trend results in quadratic and cubic trends in the respective directions. Already in the I(1) case discussed above, five different cases (with respect to integration and the asymptotic behavior of estimators and tests) need to be considered separately. An all-encompassing discussion of the restrictions on the coefficients of a constant and a linear trend in the I(2) case requires the specification of even more cases. As an alternative approach in the VECM framework, deterministic components could be dealt with by replacing y_t by y_t − Φd_t in the VECM equation. This has recently been considered in Johansen and Nielsen (2018) and is analogous to our approach in the state space framework.
As before in the MFI(1) and I(1) cases, the analysis of (the impact of) deterministic components is straightforward in the state space framework, which effectively stems from their additive inclusion in the Granger-type representation, compare (9). Choose, e.g., Φd_t = Φ_0 + Φ_1 t, as in the I(1) case. In analogy to Section 5.1.4, linear restrictions on deterministic components in relation to the static and polynomial cointegrating spaces can be embedded in a parameterization. Focusing on Φ_0, e.g., this is achieved by a decomposition in which the columns of C̃_{1,2} form a basis of the column space of C_{1,2}^G, which does not necessarily have full column rank, and the columns of C_⊥ span the orthocomplement of the column space of [C_{1,1}^E, C_{1,2}^E, C̃_{1,2}]. The matrix Φ_1 can be decomposed analogously. The corresponding parameterization then allows one to consider different restricted versions of deterministic components and to study the asymptotic behavior of estimators and tests for these cases.

Summary and Conclusions
Vector autoregressive moving average (VARMA) processes, which can equivalently be cast in the state space framework, may be useful for empirical analysis compared to the more restrictive class of vector autoregressive (VAR) processes for a variety of reasons. These include invariance with respect to marginalization and aggregation, parsimony, as well as the fact that the log-linearized solutions to DSGE models are typically VARMA processes rather than VAR processes. Realizing these potential advantages requires, in our view, developing cointegration analysis for VARMA processes to a similar extent as it is developed for VAR processes. The necessary first steps of this research agenda are structure theoretical results that subsequently allow the development of statistical inference procedures. Bauer and Wagner (2012) provide the very first step of this agenda with a canonical form for unit root processes in the state space framework, which is shown in that paper to be very convenient for cointegration analysis.
Based on that canonical form, this paper derives a parameterization for VARMA processes with unit roots in the state space framework. The canonical form, and a fortiori the parameterization based on it, are constructed to facilitate the investigation of the unit root and (static and polynomial) cointegration properties of the considered process. Furthermore, the paper shows that the framework allows testing a large variety of hypotheses on cointegrating ranks and spaces, clearly a key aspect for the usefulness of any method of cointegration analysis. In addition to providing general results, throughout the paper all results are discussed in detail for the multiple frequency I(1) and the I(2) case, which cover the vast majority of applications.
Given the fact that (as shown in Hazewinkel and Kalman 1976) VARMA unit root processes cannot be continuously parameterized, the set of all unit root processes (as defined in this paper) is partitioned according to a multi-index Γ that includes the state space unit root structure. The parameterization is shown to be a diffeomorphism on the interior of the considered sets. The topological relationships between the sets forming the partitioning of all transfer functions considered are studied in great detail for three reasons: First, pseudo maximum likelihood estimation effectively amounts to maximizing the pseudo likelihood function over the closures of sets of transfer functions, M̄_Γ in our notation. Second, related to the first item, the relations between the subsets of M̄_Γ have to be understood in detail, as knowledge of these relations is required for developing (sequential) pseudo likelihood ratio tests for the numbers of stochastic trends or cycles. Third, of particular importance for the implementation of, e.g., pseudo maximum likelihood estimators, we discuss the existence of generic pieces.
In this respect we derive two results: First, for correctly specified state space unit root structure and system order of the stable subsystem (and thus correctly specified system order) we explicitly describe generic indices Γ_g(Ω_S, n_•) such that M_{Γ_g(Ω_S,n_•)} is open and dense in the set of all transfer functions with state space unit root structure Ω_S and system order of the stable subsystem n_•. This result forms the basis for establishing consistency of estimators of the transfer function and, via continuity of the parameterization, of the parameter estimators when the state space unit root structure and system order are known. Second, in case only an upper bound on the system order is known (or specified), we show the existence of a generic multi-index Γ_{α_•,g}(n) for which the set of corresponding transfer functions M_{Γ_{α_•,g}(n)} is open and dense in the set M_n of all non-explosive transfer functions whose order (or McMillan degree) is bounded by n. This result is the basis for consistent estimation (on an open and dense subset) when only an upper bound on the system order is known. In turn this estimator is the starting point for determining Ω_S, using the subset relationships alluded to above in the second point. For the MFI(1) and I(2) cases we show in detail that similar subset relations (concerning cointegrating ranks) as in the cointegrated VAR MFI(1) and I(2) cases hold, which suggests constructing similar sequential test procedures for determining the cointegrating ranks as in the VAR cointegration literature.
Section 5 is devoted to a detailed discussion of testing hypotheses on the cointegrating spaces, again for both the MFI(1) and the I(2) case, with particular emphasis on modeling deterministic components. The discussion details how all hypotheses on (static and polynomial) cointegrating vectors usually formulated and tested in the VAR framework, potentially in combination with (un)restricted deterministic components, can also be investigated in the state space framework.
Altogether, the paper sets the stage to develop pseudo maximum likelihood estimators, investigate their asymptotic properties (consistency and limiting distributions) and construct tests based on them for determining cointegrating ranks, thus allowing cointegration analysis for cointegrated VARMA processes. The detailed discussion of the MFI(1) and I(2) cases benefits the development of the statistical theory for these cases undertaken in a series of companion papers.

Appendix A. Proofs

(i) Let C_j be a sequence in O_{s,d} converging to C_0 for j → ∞. By continuity of matrix multiplication C_0'C_0 = lim_{j→∞} C_j'C_j = I_d, thus C_0 ∈ O_{s,d}, which shows that O_{s,d} is closed. (ii) By definition C_O(θ) is a product of matrices whose elements are either constant or infinitely often differentiable functions of the elements of θ. (iii) The algorithm discussed above Lemma 1 maps every C ∈ O_{s,d} to [I_d, 0_{d×(s−d)}]'. Since R_{q,i,j}(θ)^{-1} = R_{q,i,j}(θ)' for all q, i, j and θ, C can be obtained by multiplying [I_d, 0_{d×(s−d)}]' with the transposed Givens rotations. (iv) As discussed, C_O^{-1}(·) is obtained from a repeated application of the algorithm described in Remark 10. In each step two entries are transformed to polar coordinates. According to Amann and Escher (2008, chp. 8, p. 204) the transformation to polar coordinates is infinitely often differentiable with infinitely often differentiable inverse for θ > 0 (and hence r > 0), i.e., on the interior of the interval [0, π).

(i) Let C_j be a sequence in U_{s,d} converging to C_0 for j → ∞. By continuity of matrix multiplication C_0*C_0 = lim_{j→∞} C_j*C_j = I_d. Thus, C_0 ∈ U_{s,d}, which shows that U_{s,d} is closed. By construction [C*C]_{i,i} = ∑_{j=1}^s |c_{j,i}|². Since [C*C]_{i,i} = 1 for all C ∈ U_{s,d} and i = 1, . . . , d, the entries of C are bounded. (ii) By definition C_U(ϕ) is a product of matrices whose elements are either constant or infinitely often differentiable functions of the elements of ϕ. (iii) The algorithm discussed above Lemma 2 maps every C ∈ U_{s,d} to [D_d(ϕ_D), 0_{d×(s−d)}]' with D_d(ϕ_D) = diag(e^{iϕ_{D,1}}, . . . , e^{iϕ_{D,d}}).
Since Q_{q,i,j}(ϕ)^{-1} = Q_{q,i,j}(ϕ)* for all q, i, j and ϕ, C can be obtained by multiplying [D_d(ϕ_D), 0_{d×(s−d)}]' with the conjugate transposed Givens rotations.
(iv) The algorithms in Remark 12 and above Lemma 2 describe C_U^{-1} in detail. The determination of an element of ϕ_L or ϕ_R uses the transformation of two complex numbers into polar coordinates in step 2 of Remark 12, which according to Amann and Escher (2008, chp. 8, p. 204) is infinitely often differentiable with infinitely often differentiable inverse except on the non-negative reals, whose complement is an open and dense subset of the complex plane.
Step 3 of Remark 12 uses the formulas ϕ_1 = tan^{-1}(b/a), which is infinitely often differentiable for a > 0, and ϕ_2 = ϕ_a − ϕ_b mod 2π, which is infinitely often differentiable for ϕ_a ≠ ϕ_b, which occurs on an open and dense subset of [0, 2π) × [0, 2π). For the determination of an element of ϕ_D a complex number of modulus one is transformed into polar coordinates, which is infinitely often differentiable on an open and dense subset of the complex numbers of modulus one, compare again Amann and Escher (2008, chp. 8, p. 204).

The multi-index Γ is unique for a transfer function k ∈ M_n, since it only contains information encoded in the canonical form. Therefore, M_Γ is well defined. Since conversely for every transfer function k ∈ M_n a multi-index Γ can be found, the sets M_Γ constitute a partitioning of M_n. Furthermore, using the canonical form, it is straightforward to see that the mapping attaching the triple (A, B, C) ∈ Δ_Γ in canonical form to a transfer function k ∈ M_Γ is homeomorphic (bijective, continuous, with continuous inverse): Bijectivity is a consequence of the definition of the canonical form. T_pt continuity of the transfer function as a function of the matrix triples is obvious from the definition of T_pt. Continuity of the inverse can be shown by constructing the canonical form starting with an overlapping echelon form (which is continuous according to Hannan and Deistler 1988, chp. 2) and subsequently transforming the state basis to reach the canonical form. This involves the calculation of a Jordan normal form with fixed structure, which is an analytic mapping (cf. Chatelin 1993, Theorem 4.4.3). Finally, the restrictions on C and B are imposed. For given multi-index Γ these transformations are continuous (as discussed above they involve QR decompositions to obtain unitary block columns for the blocks of C, rotations to p.u.t. form with fixed structure for the blocks of B and transformations to echelon canonical form for the stable part).
(ii) The construction of the triple (A(θ), B(θ), C(θ)) for given θ and Γ is straightforward: A_u is uniquely determined by Γ. Since θ_{B,p} contains the entries of B_u restricted to be positive and θ_{B,f} contains the free parameters of B_u, the mapping (θ_{B,p}, θ_{B,f}) → B_u is continuous. The mapping θ_• → (A_•, B_•, C_•) is continuous (cf. Hannan and Deistler 1988, Theorem 2.5.3 (ii)). The mapping (θ_{C,E}, θ_{C,G}) → C_u consists of iterated applications of C_O and C_U (compare Lemmas 1 and 2), which are differentiable and thus continuous, and of iterated applications of the extensions of the mappings C_{O,d_2−d_1} and C_{O,G} (compare Corollaries 1 and 2) to general unit root structures and to complex matrices. The proof that these functions are differentiable is analogous to the proofs of Lemma 1 and Lemma 2. (iii) The definitions of θ_{B,f} and θ_{B,p} immediately imply that they depend continuously on B_u.
The parameter vector θ_• depends continuously on (A_•, B_•, C_•) (cf. Hannan and Deistler 1988, Theorem 2.5.3 (ii)). The existence of an open and dense subset of matrices C_u such that the mapping attaching parameters to the matrices is continuous follows from arguments contained in the proofs of Lemmas 1 and 2.
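The reduction algorithm underlying parts (iii) and (iv) of these proofs can be sketched numerically: sub-diagonal entries of an orthonormal C are annihilated column by column with Givens rotations, orthonormality forces the result to [I_d, 0]', and multiplying by the transposed rotations in reverse order recovers C. This is a simplified sketch under the convention that radii are taken non-negative, not the exact algorithm of the paper.

```python
import numpy as np

def reduce_to_identity(C):
    """Annihilate the sub-diagonal entries of C in O_{s,d} with Givens
    rotations; orthonormality of C forces the result to [I_d, 0]'."""
    s, d = C.shape
    C = C.copy()
    rotations = []
    for i in range(d):
        for j in range(s - 1, i, -1):
            a, b = C[i, i], C[j, i]
            r = np.hypot(a, b)
            c, sn = (1.0, 0.0) if r == 0 else (a / r, b / r)
            G = np.eye(s)
            G[i, i] = c; G[j, j] = c; G[i, j] = sn; G[j, i] = -sn
            C = G @ C
            rotations.append(G)
    return C, rotations

rng = np.random.default_rng(4)
s, d = 4, 2
C0, _ = np.linalg.qr(rng.standard_normal((s, d)))   # a point in O_{s,d}
E, rotations = reduce_to_identity(C0)
target = np.vstack([np.eye(d), np.zeros((s - d, d))])
print(np.allclose(E, target))
# the inverse map: multiply by the transposed rotations in reverse order
C_rec = target.copy()
for G in reversed(rotations):
    C_rec = G.T @ C_rec
print(np.allclose(C_rec, C0))
```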

Appendix B. Proofs of the Results of Section 4
Appendix B.1. Proof of Theorem 3
For the first inclusion the proof can be divided into two parts, discussing the stable and the unstable subsystem separately. The result with regard to the stable subsystem is due to Hannan and Deistler (1988, Theorem 2.5.3 (iv)). For the unstable subsystem, (Ω̃_S, p̃) ≤ (Ω_S, p) implies the existence of a matrix S as described in Definition 9. Partition S = [S_1', S_2']' such that S_1 p = p_1 ≥ p̃.
Let k̃ be an arbitrary transfer function in M_Γ̃ = π(Δ_Γ̃) with corresponding state space realization (Ã, B̃, C̃) ∈ Δ_Γ̃. Then we find matrices B_1 and C_1 defining a state space realization (A, B, C) such that (A_j, B_j, C_j) = (A, S' diag(I_{n_1}, j^{-1} I_{n_2}) S B, C) ∈ Δ_Γ, where n_i is the number of rows of S_i for i = 1, 2, converges for j → ∞ to (A, S'[B̃', 0]', C), which is observationally equivalent to (Ã, B̃, C̃). Consequently, k̃ = π(A, S'[B̃', 0]', C) lies in the closure of π(Δ_Γ).
To show the second inclusion, consider a sequence of systems (A_j, B_j, C_j) ∈ Δ_Γ, j ∈ N, converging to (A_0, B_0, C_0). We need to show that the multi-index Γ̃ corresponding to (A_0, B_0, C_0) satisfies Γ̃ ≤ Γ̄ for some Γ̄ ∈ K(Γ). For the stable part we can separate the subsystem (A_{j,s}, B_{j,s}, C_{j,s}) remaining stable in the limit from the part with eigenvalues of A_j tending to the unit circle. As discussed in Section 4.1.2, (A_{j,s}, B_{j,s}, C_{j,s}) converges to the stable subsystem (A_{0,•}, B_{0,•}, C_{0,•}), whose Kronecker indices can only be smaller than or equal to α_• (cf. Hannan and Deistler 1988, Theorem 2.5.3).
The remaining subsystem consists of the unstable subsystem of (A_j, B_j, C_j), which converges to (A_{0,u}, B_{0,u}, C_{0,u}), and the second part of the stable subsystem containing all stable eigenvalues of A_j converging to the unit circle. The limiting combined subsystem (A_{0,c}, B_{0,c}, C_{0,c}) is such that A_{0,c} is block diagonal. If the limiting combined subsystem is minimal and B_{0,u} has a structure corresponding to p̃, this shows that the pair (Ω̃_S, p̃) extends (Ω_S, p) in accordance with the definition of K(Γ).
Since the limiting subsystem is not necessarily minimal and B_{0,u} does not necessarily have a structure corresponding to p̃, eliminating coordinates of the state and adapting the corresponding structure indices p̃ may result in a pair that is smaller than the pair (Ω̃_S, p̃) corresponding to an element of K(Γ).
All systems of a given state space unit root structure correspond to a multi-index that is smaller than or equal to (Ω_S, p_max, β_•), where β_• is a Kronecker index corresponding to state space dimension n_•. For the Kronecker indices of order n_• it is known that there exists one index α_{•,g} such that M_{α_{•,g}} is open and dense in M_{n_•}. The set M_{Ω_S, p_max, β_•} is therefore contained in M_{Ω_S, p_max, α_{•,g}}, which implies (14) with Γ_g(Ω_S, n_•) := (Ω_S, p_max, α_{•,g}).
For the second claim choose an arbitrary state space realization (A, B, C) in canonical form such that π(A, B, C) ∈ M(Ω_S, n_•) for arbitrary Ω_S. Define the sequence (A_j, B_j, C_j)_{j∈N} by A_j = (1 − j^{-1})A, B_j = (1 − j^{-1})B, C_j = C. Then λ_{|max|}(A_j) < 1 holds for all j, which implies π(A_j, B_j, C_j) ∈ M_{Γ_{α_•,g}(n)} for every n ≥ n_u(Ω_S) + n_• and every j. The continuity of π implies π(A, B, C) = lim_{j→∞} π(A_j, B_j, C_j) ∈ M_{Γ_{α_•,g}(n)}.
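The contraction argument is immediate to check numerically, e.g., for an I(2)-type Jordan block (an illustrative choice of A; any A with spectral radius one behaves the same way):

```python
import numpy as np

# the sequence A_j = (1 - 1/j) A moves all eigenvalues strictly inside
# the unit circle while converging to A
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])     # Jordan block with double unit eigenvalue
for j in (2, 10, 100):
    rho = max(abs(np.linalg.eigvals((1 - 1 / j) * A)))
    print(j, rho)              # spectral radius stays strictly below one
```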
Second, Ǎ_u corresponding to state space unit root structure Ω̌_S may be of the form (A1) with off-diagonal elements of A_u replaced by zero. To prove (b) we need to show that in both cases the corresponding transfer function is contained in the closure of M(Ω_S, n_•).
We start by showing that in the second case the transfer function ǩ is contained in the closure of M(Ω̃_S, ñ_•), where Ω̃_S is the state space unit root structure corresponding to Ã_u in (A1). For this, consider, e.g., a sequence of systems (A_j, B_j, C_j) with A_j = [1, 1/j; 0, 1] and B_j = B, C_j = C fixed. Clearly, every system (A_j, B_j, C_j) corresponds to an I(2) process, while the limit for j → ∞ corresponds to an I(1) process. This shows that it is possible in the limit to trade one I(2) component for two I(1) components, leading to more transfer functions in the T_pt closure of M_{Γ_g(Ω_S, n_•)} than only the ones included in the image under π of the closure of ∆_{Γ_g(Ω_S, n_•)}: in the canonical form the off-diagonal entry of A_j is restricted to equal one, and hence the corresponding sequence of systems in canonical form diverges. In a sense these systems correspond to "points at infinity": for the example given above, the state transformation T_j = diag(1, 1/j) yields the canonical form (T_j A_j T_j^{-1}, T_j B_j, C_j T_j^{-1}) = ([1, 1; 0, 1], [B_1; j^{-1} B_2], [C_1, j C_2]). Thus, the parameter vector corresponding to the entries of B_{j,2} converges to zero and the one corresponding to C_{j,2} diverges to infinity.
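The "points at infinity" phenomenon can be checked numerically. A minimal sketch, assuming hypothetical scalar-input matrices B = (1, 1)′ and C = (1, 1) (illustrative choices, not from the paper): the transfer function is invariant under the state transformation T_j = diag(1, 1/j), while the transformed B entries shrink to zero and the transformed C entries diverge with j.

```python
from fractions import Fraction as F

def transfer(A, B, C, z):
    """k(z) = C (zI - A)^{-1} B for a 2x2 A, column B, row C (scalar output)."""
    M = [[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]
    v = [inv[0][0] * B[0] + inv[0][1] * B[1], inv[1][0] * B[0] + inv[1][1] * B[1]]
    return C[0] * v[0] + C[1] * v[1]

B, C, z = [F(1), F(1)], [F(1), F(1)], F(2)   # hypothetical B, C; test point z = 2

for j in [1, 10, 100]:
    A_j = [[F(1), F(1, j)], [F(0), F(1)]]    # off-diagonal entry 1/j
    # similarity transform T_j = diag(1, 1/j) restores the off-diagonal 1:
    A_bar = [[F(1), F(1)], [F(0), F(1)]]     # T_j A_j T_j^{-1}
    B_bar = [B[0], B[1] / j]                 # T_j B: second entry -> 0
    C_bar = [C[0], C[1] * j]                 # C T_j^{-1}: second entry diverges
    assert transfer(A_j, B, C, z) == transfer(A_bar, B_bar, C_bar, z)
print("transfer functions coincide; (B_bar, C_bar) diverge as j grows")
```

The exact equality of the two transfer function evaluations reflects that similarity transformations of the state leave the transfer function unchanged, so the divergence occurs only in the canonical-form parameters.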
Generalizing this argument shows that every transfer function corresponding to a pair (Ω̌_S, ň_•) in A(Ω_S, n_•), where Ǎ_u can be obtained by replacing off-diagonal entries of A_u with zero, can be reached from within M(Ω_S, n_•).
To prove that ǩ is contained in the closure of M(Ω_S, n_•) in the first case, where the state space unit root structure is extended as visible in Equation (A1), consider, e.g., a sequence of systems (A_j, B_j, C_j) with A_j = [1, 1; 0, 1 − 1/j] and B_j = B, C_j = C fixed, corresponding to the following system in canonical form (except that the stable subsystem is not necessarily in echelon canonical form): with T_j = [1, j; 0, 1] one obtains (T_j A_j T_j^{-1}, T_j B_j, C_j T_j^{-1}), where T_j A_j T_j^{-1} = [1, 0; 0, 1 − 1/j] is block diagonal. This shows that there exists a sequence of transfer functions corresponding to I(1) processes with one common trend that converges to a transfer function corresponding to an I(2) system. Again, in the canonical form this cannot happen, as there the (1, 2) entry of Ã_j is restricted to be equal to zero. At the same time note that the dimension of the stable system is reduced, due to one component of the state changing from the stable to the unit root part.
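The migration of a stable eigenvalue into the unit root part can likewise be illustrated in exact arithmetic. A minimal sketch, assuming the hypothetical 2×2 sequence A_j with eigenvalues 1 and 1 − 1/j described above (the helper functions are ad hoc): every A_j has a single length-one chain at z = 1, while the limit has one chain of length two, i.e., an I(2) structure.

```python
from fractions import Fraction as F

def matmul2(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def rank2(M):
    """Rank of a 2x2 matrix over the rationals."""
    if all(x == 0 for row in M for x in row):
        return 0
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return 2 if det != 0 else 1

I2 = [[F(1), F(0)], [F(0), F(1)]]

for j in [2, 10, 100]:
    A_j = [[F(1), F(1)], [F(0), 1 - F(1, j)]]   # eigenvalues 1 and 1 - 1/j
    N = [[A_j[r][c] - I2[r][c] for c in range(2)] for r in range(2)]
    assert rank2(N) == 1 and rank2(matmul2(N, N)) == 1   # chain of length 1 at z = 1

A_0 = [[F(1), F(1)], [F(0), F(1)]]              # limit: Jordan block of size 2
N0 = [[A_0[r][c] - I2[r][c] for c in range(2)] for r in range(2)]
assert rank2(N0) == 1 and rank2(matmul2(N0, N0)) == 0    # chain of length 2 at z = 1
print("stable eigenvalue migrates into a longer unit root chain in the limit")
```

The rank of (A − I_2)² distinguishes the two situations: it stays at one along the sequence but drops to zero in the limit.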
Now, for a unit root structure Ω̃_S such that (Ω̃_S, ñ_•) ∈ A(Ω_S, n_•), satisfying S Ã_u S′ = [A_u, J_12; 0, J_2], the Jordan blocks corresponding to Ω_S are sub-blocks of the ones corresponding to Ω̃_S, potentially involving a reordering of coordinates using the permutation matrix S. Taking as the approximating sequence transfer functions k̃_j ∈ M_{Γ_g(Ω_S, n_•)}, k̃_j → k̃_0 ∈ M_{Γ_g(Ω̃_S, ñ_•)}, realized by systems that have the same structure as Ω̃_S but with J_2 replaced by ((j − 1)/j) J_2, leads to processes with state space unit root structure Ω_S.
For the stable part of k̃_j we can separate the part containing the poles tending to the unit circle (contained in J_2) from the remaining transfer function k̃_{j,s}, which has Kronecker indices ᾱ ≤ α_•. The results of Hannan and Deistler (1988, Theorem 2.5.3) then imply that the limit remains in the closure of M_{α_•} and hence allows for an approximating sequence in M_{α_•}.
Both results combined cover the whole set of attainable state space unit root structures in Definition 11 and thus prove (b).
As follows from Corollary 4, the closures of M(Ω_S, n_•) and M_{Γ_g(Ω_S, n_•)} coincide. Thus, (b) implies that ⋃_{(Ω̃_S, ñ_•) ∈ A(Ω_S, n_•)} M(Ω̃_S, ñ_•) is contained in the closure of M_{Γ_g(Ω_S, n_•)}, and (a) adds the second union, showing the subset inclusion. It remains to show equality for the last set inclusion; thus, we need to show that for k_j ∈ M_{Γ_g(Ω_S, n_•)} with k_j → k_0 it holds that k_0 ∈ M(Ω̃_S, ñ_•), where (Ω̃_S, ñ_•) ≤ (Ω̌_S, ň_•) ∈ A(Ω_S, n_•). To this end note that the rank of a matrix is a lower semi-continuous function: for a sequence of matrices E_j with limit E_0 we have rank(lim_{j→∞} E_j) = rank(E_0) ≤ lim inf_{j→∞} rank(E_j).
Then, consider a sequence k_j(z) ∈ M_{Γ_g(Ω_S, n_•)}, j ∈ ℕ. We can find a convergent sequence of systems (A_j, B_j, C_j) realizing k_j(z). Therefore, choosing E_j = (A_j − z_k I_n)^t we obtain that rank((A_0 − z_k I_n)^t) ≤ n − ∑_{r=1}^{t} d^k_{j,h_k−r+1}, since k_j(z) ∈ M_{Γ_g(Ω_S, n_•)} implies that the numbers d^k_{j,h_k−r+1} of generalized eigenvalues at the unit roots are governed by the entries of the state space unit root structure Ω_S. This implies that ∑_{r=1}^{t} d^k_{j,h_k−r+1} ≤ ∑_{r=1}^{t} d^k_{0,h_k−r+1} for t = 1, 2, ..., n. Consequently, the limit has at least as many chains of generalized eigenvalues of each maximal length as dictated by the state space unit root structure Ω_S for each unit root of the limiting system. Rearranging the rows and columns of the Jordan normal form using a permutation matrix S, it is then obvious that either the limiting matrix A_0 has additional eigenvalues on the unit circle, in which case S A_0 S′ = [A_u, J̃_12; 0, J̃_2] must hold, or upper-diagonal entries in A_j must be changed from ones to zeros in order to convert some of the chains to lower order. One example in this respect was given above: for A_j = [1, 1/j; 0, 1] the rank of (A_j − I_2)^r is equal to 1 for r = 1 and 0 for r = 2. For the limit we obtain A_0 = I_2, and hence the rank is zero for r = 1, 2. The corresponding indices are d^1_{j,1} = 1, d^1_{j,2} = 1 for the approximating sequence and d^1_{0,1} = 0, d^1_{0,2} = 2 for the limit, respectively. Summing these indices starting from the last one obtains d^1_{j,2} = 1 ≤ d^1_{0,2} = 2 and d^1_{j,1} + d^1_{j,2} = 2 ≤ d^1_{0,1} + d^1_{0,2} = 2. Hence the state space unit root structure corresponding to (A_0, B_0, C_0) must be attainable according to Definition 11, and the number of stable state components must decrease accordingly.
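The worked example can be reproduced in exact arithmetic. A minimal sketch (the helper functions are ad hoc, not from the paper) verifying the stated ranks of (A_j − I_2)^r for the sequence and the limit, together with the rank inequality that expresses the lower semi-continuity argument:

```python
from fractions import Fraction as F

def rank2(M):
    """Rank of a 2x2 matrix over the rationals."""
    if all(x == 0 for row in M for x in row):
        return 0
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return 2 if det != 0 else 1

def matmul2(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def ranks_of_powers(A):
    """rank((A - I_2)^r) for r = 1, 2."""
    N = [[A[0][0] - 1, A[0][1]], [A[1][0], A[1][1] - 1]]
    return [rank2(N), rank2(matmul2(N, N))]

j = 5                                            # any fixed j works
A_j = [[F(1), F(1, j)], [F(0), F(1)]]            # Jordan chain of length 2 at z = 1
A_0 = [[F(1), F(0)], [F(0), F(1)]]               # limit of A_j for j -> infinity

assert ranks_of_powers(A_j) == [1, 0]            # one chain of length 2
assert ranks_of_powers(A_0) == [0, 0]            # two chains of length 1

# Lower semi-continuity of the rank: the limit loses rank for every power r,
# which is exactly the partial-sum inequality on the indices d in the text.
for r_j, r_0 in zip(ranks_of_powers(A_j), ranks_of_powers(A_0)):
    assert r_0 <= r_j
print("rank((A_0 - I)^r) <= rank((A_j - I)^r) for r = 1, 2")
```

The rank drops rank((A − I_2)^{r−1}) − rank((A − I_2)^r) count the chains of length at least r, so the two assertions encode the index sums d^1_{j,·} and d^1_{0,·} stated in the text.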
Finally, the limiting system (A_0, B_0, C_0) is potentially not minimal. In this case the pair (Ω̃_S, ñ_•) is reduced to a smaller one, concluding the proof.