1. Introduction
The duality between the common trends representation and the vector equilibrium-correction model (VECM) form in cointegrated systems allows researchers to formulate hypotheses of economic interest on either of the two. The VECM is centered on the adjustment with respect to disequilibria in the system; in this way it facilitates the interpretation of cointegrating relations as (deviations from) equilibria.
The common trends representation instead highlights how variables in the system are pushed around by common stochastic trends, which are often interpreted as the main persistent economic factors influencing the long term. Both representations provide insights into the economic system under scrutiny. Examples of both perspectives are given in Juselius (2017a, 2017b).
The common trends and VECM representations are connected through representation results such as the Granger Representation Theorem, in the case of I(1) systems, see 
Engle and Granger (
1987) and 
Johansen (
1991), and the Johansen Representation Theorem, for the case of I(2) systems, see 
Johansen (
1992). In particular, both representation theorems show that the loading matrix of the common stochastic trends of highest order is a basis of the orthogonal complement of the matrix of cointegrating relations. Because of this property, these two matrices are linked, and any one of them can be written as a function of the other one.
This paper focuses on I(2) vector autoregressive (VAR) systems, and it considers the situation where (possibly over-identifying) economic hypotheses are entertained for the factor loading matrix of the I(2) trends. It is shown how they can then be translated into hypotheses on the cointegrating relations, which appear in the VECM representation; the latter forms the basis for maximum likelihood (ML) estimation of I(2) VAR models. In this way, constrained ML estimators are obtained and the associated likelihood ratio (LR) tests of these hypotheses can be defined. These tests are discussed in the present paper; Wald tests on just-identified loading matrices of the I(1) and I(2) common trends have already been proposed by 
Paruolo (
1997, 
2002).
The running example of the paper is taken from 
Juselius and Assenmacher (
2015), which is the working paper version of 
Juselius and Assenmacher (
2017). The following notation is used: for a full column-rank matrix a, col(a) denotes the space spanned by the columns of a and $a_\perp$ indicates a basis of the orthogonal complement of col(a). For a matrix b of the same dimensions as a, and for which $b'a$ is full rank, let $a_b := a(b'a)^{-1}$; a special case is when $b = a$, for which $\bar{a} := a_a = a(a'a)^{-1}$. Let also $P_a := a(a'a)^{-1}a'$ indicate the orthogonal projection matrix onto col(a), and let the matrix $P_{a_\perp} := I - P_a$ denote the orthogonal projection matrix on its orthogonal complement. Finally $e_j$ is used to indicate the j-th column of an identity matrix of appropriate dimension.
The rest of this paper is organized as follows: 
Section 2 contains the motivation and the definition of the problem considered in the paper. The identification of the I(2) common trends loading matrix under linear restrictions is analysed in 
Section 3. The relationship between the identified parametrization of I(2) common trends loading matrix and an identified version of the cointegration matrix is also discussed. 
Section 4 considers a parametrization of the VECM, and discusses its identification. ML estimation of this model is discussed in 
Section 5; the asymptotic distributions of the resulting ML estimator of the I(2) loading matrix and the LR statistic of the over-identifying restrictions are sketched in 
Section 6. 
Section 7 reports an illustration of the techniques developed in the paper on a system of US and Swiss prices, interest rates and exchange rate. 
Section 8 concludes, while two appendices report additional technical material.
  2. Common Trends Representation for I(2) Systems
This section introduces quantities of interest and presents the motivation of the paper. Consider a 
p-variate VAR(
k) process 
:
      where 
 are 
 matrices, 
 and 
 are 
 vectors, and 
 is a 
 i.i.d. 
 vector, with 
 positive definite. Under the conditions of the Johansen Representation Theorem, see 
Appendix A, called 
the I(2) 
conditions, 
 admits a common trends I(2) representation of the form
      
      where 
 are the I(2) stochastic trends (cumulated random walks), 
 is a random walk component, and 
 is an I(0) linear process.
Cointegration occurs when the matrix 
 has reduced rank 
, such that 
, where 
a and 
b are 
 and of full column rank. This observation lends itself to the following interpretation: 
 defines the 
 common I(2) trends, while 
a acts as the loading matrix of 
 on the I(2) trends. The reduced rank of 
 implies that there exist 
 linearly independent cointegrating vectors, collected in a 
 matrix 
, satisfying 
; hence 
 is I(1). Combining this with 
, it is clear that 
, i.e., the columns of the loading matrix span the orthogonal complement of the cointegration space 
. Interest in this paper is on hypotheses on 
.
Observe that 
 is invariant to the choice of basis of either 
 or 
. In fact, 
 can be replaced by 
 with 
Q square and nonsingular without affecting 
. One way to resolve this identification problem is to impose restrictions on the entries of 
; enough restrictions of this kind would make the choice of 
 unique. Such an approach to identification is common in confirmatory factor analysis in the statistics literature, see 
Jöreskog et al. (
2016).
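The invariance described above can be checked numerically. The following is a minimal sketch, assuming that the invariant object is the product $ab'$ of the loading matrix a and the matrix b defining the common trends; the dimensions, the numerical values and the variable names (`C2`, `a_new`, `b_new`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r2 = 4, 2                          # illustrative dimensions (assumed)
a = rng.standard_normal((p, r2))      # loading matrix of the I(2) trends
b = rng.standard_normal((p, r2))      # matrix defining the common trends
Q = np.array([[2.0, 1.0],
              [0.0, 1.0]])            # square and nonsingular (det = 2)

C2 = a @ b.T

# Rotate the two factors in a compensating way: a -> a Q, b -> b Q'^{-1}
a_new = a @ Q
b_new = b @ np.linalg.inv(Q).T
C2_new = a_new @ b_new.T

same_product = np.allclose(C2, C2_new)    # the product is unchanged
same_loadings = np.allclose(a, a_new)     # the loadings themselves are not
```

The product is unchanged while the loadings differ, which is why restrictions on the entries of the loading matrix are needed to pin down a unique choice.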
If more restrictions are imposed than needed for identification, they are over-identifying. Such over-identifying restrictions on 
 usually correspond to (similarly over-identifying) restrictions on 
, see 
Section 3 below. Although economic hypotheses may directly imply restrictions on the cointegrating vectors in 
, in some cases it is more natural to formulate restrictions on the I(2) loading matrix 
. This is illustrated by the two following examples.
  2.1. Example 1
Kongsted (
2005) considers a model for $X_t = (m_t : y_t : p_t)'$, where $m_t$, $y_t$ and $p_t$ denote the nominal money stock, nominal income and the price level, respectively (all variables in logs); here ‘:’ indicates horizontal concatenation. He assumes that the system is I(2), with $r_2 = 1$. Given the definition of the variables, Kongsted (2005) considers the natural question of whether real money $m_t - p_t$ and real income $y_t - p_t$ are at most I(1). This corresponds to an (over-identified) cointegrating matrix $\tau$ and loading vector $\tau_\perp$ of the form
        $$\tau = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{pmatrix}, \qquad \tau_\perp = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
        The form of $\tau$ corresponds to the fact that the I(1) linear combinations $\tau' X_t$ are (linear combinations of) $(m_t - p_t : y_t - p_t)'$, as required. On the other hand, the restriction on $\tau_\perp$ says that each of the three series has exactly the same I(2) trend, with the same scale factor. Both formulations are easily interpretable.
 Note that the hypothesis on $\tau_\perp$ involves two over-identifying restrictions (the second and third components are equal to the first component), in addition to a normalization (the first component equals 1). Similarly, the restriction that the matrix consisting of the first two rows of $\tau$ equals $I_2$ is a normalization; the two over-identifying restrictions are that the entries in both columns sum to 0.
As this first example shows, knowing 
 is the same as knowing 
 and vice versa.
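The equivalence in Example 1 can be illustrated numerically. The sketch below builds the two matrices from their verbal description (the first two rows of the cointegrating matrix form an identity and each column sums to zero; the loading vector has three equal components, the first normalized to 1) and recovers the loading vector from the cointegrating matrix through a null-space computation. The variable names are illustrative assumptions.

```python
import numpy as np

# Cointegrating matrix: columns pick out (m - p) and (y - p)
tau = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [-1.0, -1.0]])
# Loading vector: the same I(2) trend enters all three series
tau_perp = np.array([1.0, 1.0, 1.0])

# The two matrices are orthogonal to each other
orthogonal = np.allclose(tau.T @ tau_perp, 0.0)

# Recover the loading vector from tau alone: the last right singular
# vector of tau' spans its null space, since tau' is 2 x 3 with rank 2
_, _, vt = np.linalg.svd(tau.T)
null_vec = vt[-1]
recovered = null_vec / null_vec[0]    # normalize the first component to 1
```

Normalizing the first component removes the sign and scale indeterminacy of the null-space vector, mirroring the normalization discussed in the example.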
  2.2. Example 2
Juselius and Assenmacher (
2015) consider a 7-dimensional VAR with 
 with 
, where 
, 
, 
 are (the log of) the price index, the long and the short interest rate of country 
i at time 
t respectively, and 
 is the log of the exchange rate between country 1 (Switzerland) and 2 (the US) at time 
t. They expect the common trends representation to have a loading matrix 
 of the form:
        where 
 indicates an entry not restricted to 0.
 The second I(2) trend is loaded on the interest rates , , , , as well as on US prices  and the exchange rate ; this can be interpreted as a financial (or ‘speculative’) trend affecting world prices. The first I(2) trend, instead, is only loaded on , ,  and embodies a ‘relative price’ I(2) trend; it can be interpreted as the Swiss contribution to the trend in prices.
The cointegrating matrix 
 in this example is of dimension 
. It is not obvious what type of restrictions on 
 correspond to the structure in (
3). However, it is 
 rather than 
 that enters the likelihood function (as will be analyzed in 
Section 4). The rest of the paper shows that the restrictions in (
3) are over-identifying, how they can be translated into hypotheses on 
, and how they can be tested via LR tests.
  3. Hypothesis on the Common Trends Loadings
This section discusses linear hypotheses on 
 and their relation to 
. First, attention is focused on the case of linear hypotheses on the normalized version 
 of 
. Here 
 is a full-column-rank matrix of the same dimension as 
 such that 
 is square and nonsingular. This normalization was introduced by 
Johansen (
1991) in the context of the I(1) model in order to isolate the (just-) identified parameters in the cointegration matrix.
Later, linear hypotheses formulated directly on $\tau_\perp$ are discussed. The main result of this section is the fact that the parameters of interest appear linearly both in $\tau_{\perp c_\perp}$ and in $\tau_c$ in the first case; this is not necessarily true in the second case.
The central relation employed in this section (for both cases), is the following identity:
      where 
. This identity readily follows from the oblique projections identity
      see e.g. 
Srivastava and Khatri (
1979, p. 19), by post-multiplication by 
  3.1. Linear hypotheses on 
Johansen (
1991) noted that the function 
 is invariant with respect to the choice of basis of the space spanned by 
a. In fact, consider in the present context any alternative basis 
 of the space spanned by 
; this has representation 
 for 
Q square and full rank. Inserting 
 in place of 
 in the definition of 
, one finds
        
        Hence 
, similarly to the cointegration matrix in the I(1) model in 
Johansen (
1991), is (just-)identified.
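The invariance argument can be verified numerically. The following is a minimal sketch, assuming the normalization takes the form $a_b = a(b'a)^{-1}$; the helper `normalize`, the matrices and their dimensions are illustrative assumptions.

```python
import numpy as np

def normalize(a, b):
    """Return a (b'a)^{-1}: the basis of col(a) normalized on b."""
    return a @ np.linalg.inv(b.T @ a)

a = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0],
              [-1.0, 0.0],
              [2.0, 2.0]])                      # a basis of some 2-dim space
b = np.vstack([np.eye(2), np.zeros((3, 2))])    # normalize on the first 2 rows
Q = np.array([[2.0, 1.0],
              [0.5, 3.0]])                      # square and nonsingular

a_b = normalize(a, b)
a_b_rotated = normalize(a @ Q, b)               # alternative basis a Q

basis_invariant = np.allclose(a_b, a_b_rotated)
identity_block = np.allclose(b.T @ a_b, np.eye(2))
```

Both checks hold: the normalized matrix does not depend on which basis of the space is used, and its rows selected by b form an identity matrix, so the remaining entries are the just-identified free coefficients.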
 To facilitate stating hypotheses on the unconstrained elements of 
, the following representation of 
 appears useful:
        where 
 is an 
 matrix of free coefficients in 
. For example, one may have
        
        with 
, 
, 
.
Consider over-identifying linear restrictions on the columns of 
 in (
5). Typically, such restrictions will come in the form of zero (exclusion) restrictions or unit restrictions, where the latter would indicate equal loadings of a specific variable and the variable on which the column of 
 has been normalized. The general formulation of such restrictions is
        
        where 
 is the 
i-th column vector of 
, 
 and 
 are conformable vectors and matrices, and 
 contains the remaining unknown parameters in 
. If only zero restrictions are imposed, then 
.
The formulation in (
7) includes several notable special cases. For instance, if all 
 and 
, one obtains the hypothesis that 
 is contained in a given linear space, 
. Another example is given by the case where one column 
 is known, 
; this corresponds to the choice 
 with 
 and 
 void and 
, 
.
The restrictions in (
7) may be summarized as
        
        where 
, 
 and 
. Here 
 indicates a matrix with the (not necessarily square) blocks 
 along the main diagonal. Formulation (
8) generalises (
7).
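As an illustration of how column-by-column restrictions stack into a single system, the sketch below assumes that each column satisfies a restriction of the form $\vartheta_i = k_i + K_i \psi_i$ and that the stacked version reads $\operatorname{vec}(\vartheta) = k + K\psi$ with K block diagonal. The symbols `k`, `K`, `psi`, the helper `block_diag` and all numerical values are assumptions made for the example.

```python
import numpy as np

def block_diag(blocks):
    """Place (not necessarily square) blocks along the main diagonal."""
    rows = sum(B.shape[0] for B in blocks)
    cols = sum(B.shape[1] for B in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for B in blocks:
        out[r:r + B.shape[0], c:c + B.shape[1]] = B
        r += B.shape[0]
        c += B.shape[1]
    return out

# Column 1 (3 entries): second entry fixed at 1 (unit restriction), others free
k1 = np.array([0.0, 1.0, 0.0])
K1 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
# Column 2: first entry restricted to zero (exclusion restriction), others free
k2 = np.zeros(3)
K2 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

k = np.concatenate([k1, k2])
K = block_diag([K1, K2])

psi = np.array([0.3, -0.7, 1.5, 2.0])          # arbitrary free parameters
vec_theta = k + K @ psi
theta = vec_theta.reshape((3, 2), order="F")   # undo the column stacking
```

With only zero restrictions the vector `k` would be identically zero, as in the second column of this example.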
The main result of this section is stated in the next theorem.
Theorem 1 (Hypotheses on 
)
. Assume that ϑ satisfies linear restrictions of the type (
8)
; then these restrictions are translated into a linear hypothesis on  via
        
        where $\mathcal{K}_{m,n}$ is the commutation matrix satisfying $\mathcal{K}_{m,n} \operatorname{vec}(A) = \operatorname{vec}(A')$, with A of dimensions $m \times n$, see Magnus and Neudecker (2007).
The previous theorem shows that, when one can express a linear hypothesis on the coefficients in 
 that are unrestricted in 
, then the same linear hypothesis is translated into a restriction on 
. Note that the proof simply exploits (
4).
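The linearity stated in Theorem 1 can be traced through a short derivation. The following is a sketch under assumed notation: it takes $\tau_c = \tau(c'\tau)^{-1}$, $\tau_{\perp c_\perp} = \tau_\perp(c_\perp'\tau_\perp)^{-1}$ and $\bar c = c(c'c)^{-1}$, writes the representation (5) as $\tau_{\perp c_\perp} = \bar c_\perp + c\vartheta$, and uses $\mathcal{K}$ for the commutation matrix with $\mathcal{K}\operatorname{vec}(\vartheta) = \operatorname{vec}(\vartheta')$. These forms are consistent with the surrounding text but are assumptions of this sketch.

```latex
\begin{aligned}
I_p &= \tau_\perp (c_\perp'\tau_\perp)^{-1} c_\perp' + c\,(\tau' c)^{-1}\tau'
      && \text{(oblique projections)} \\
I_p &= c_\perp \tau_{\perp c_\perp}' + \tau_c\, c'
      && \text{(transposing both sides)} \\
\tau_c &= (I_p - c_\perp \tau_{\perp c_\perp}')\,\bar c
      && \text{(post-multiplying by } \bar c \text{, using } c'\bar c = I) \\
\tau_c &= \bar c - c_\perp (\bar c_\perp' + \vartheta' c')\,\bar c
        = \bar c - c_\perp \vartheta'
      && \text{(using } \bar c_\perp' \bar c = 0,\ c'\bar c = I) \\
\operatorname{vec}(\tau_c) &= \operatorname{vec}(\bar c)
        - (I_{p-r_2} \otimes c_\perp)\,\mathcal{K}\,\operatorname{vec}(\vartheta)
\end{aligned}
```

Under these assumptions every entry of $\tau_c$ is an affine function of the entries of $\vartheta$, so a linear restriction on $\operatorname{vec}(\vartheta)$ carries over to a linear restriction on $\operatorname{vec}(\tau_c)$.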
Identification of the restricted coefficients 
 under these hypotheses can be addressed in a straightforward way. In fact, the parameters in 
 are identified; hence 
 is identified provided that the matrix 
K is of full column rank, which in turn will imply that the Jacobian matrix 
 in (
9) has full column rank.
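The rank argument can be illustrated with a small numerical sketch. It assumes the linear map $\tau_c = \bar c - c_\perp \vartheta'$ with $\operatorname{vec}(\vartheta) = k + K\psi$, so that the Jacobian with respect to $\psi$ is $-(I \otimes c_\perp)\,\mathcal{K}\,K$; the function names, the choice of `c` and the particular restriction are illustrative assumptions.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector."""
    return A.reshape(-1, order="F")

def commutation_matrix(m, n):
    """Matrix C with C @ vec(A) == vec(A.T) for every m x n matrix A."""
    C = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] is entry j*m + i of vec(A) and entry i*n + j of vec(A.T)
            C[i * n + j, j * m + i] = 1.0
    return C

p, r2 = 4, 2
n_coint = p - r2                                             # columns of c
c = np.vstack([np.eye(n_coint), np.zeros((r2, n_coint))])    # p x (p - r2)
c_perp = np.vstack([np.zeros((n_coint, r2)), np.eye(r2)])    # p x r2
c_bar = c @ np.linalg.inv(c.T @ c)

# theta is (p - r2) x r2; restrict theta[0, 1] = 0 (entry 2 of vec(theta))
k = np.zeros(n_coint * r2)
K = np.eye(n_coint * r2)[:, [0, 1, 3]]

Kc = commutation_matrix(n_coint, r2)
A = np.kron(np.eye(n_coint), c_perp)
g = vec(c_bar) - A @ Kc @ k          # constant part of vec(tau_c)
G = -A @ Kc @ K                      # Jacobian of vec(tau_c) w.r.t. psi

psi = np.array([0.5, -1.0, 2.0])     # arbitrary free parameters
theta = (k + K @ psi).reshape((n_coint, r2), order="F")
tau_c_direct = c_bar - c_perp @ theta.T
jacobian_rank = np.linalg.matrix_rank(G)
```

Here `G` inherits full column rank from `K`, because the Kronecker factor has full column rank and the commutation matrix is a permutation.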
Because, in practice, econometricians may explore the form of 
 via unrestricted estimates of 
, see 
Paruolo (
2002), before formulating restrictions on 
, using hypotheses on the unrestricted coefficients in 
 appears a natural sequential step.
The next subsection discusses the alternative approach of specifying hypotheses directly on τ⊥.
  3.2. Linear Hypotheses on 
In case placing restrictions on the unrestricted coefficients in $\tau_{\perp c_\perp}$ is not what the econometrician wants, this subsection considers linear hypotheses on $\tau_\perp$ directly. It is shown that sometimes it is possible to translate linear hypotheses on $\tau_\perp$ into linear hypotheses on $\tau_{\perp c_\perp}$ for some $c_\perp$. It is also shown that this is always possible for $r_2 = 2$, for which a constructive proof is provided.
Analogously to (
7), consider linear hypotheses on the columns of 
, of the following type:
        summarized as
        
        In this case, non-zero vectors 
 represent normalizations of the columns of the loading matrix, and as before, 
 collects the unknown parameters in 
.
Theorem 2 (Hypotheses on 
τ⊥)
. Assume that  satisfies linear restrictions of the type  (
11)
, then these restrictions are translated in general into a non-linear hypothesis on  via
        
        and the Jacobian of the transformation from ϕ to  is
        
        This parametrization is smooth on an open set in the parameter space Φ of ϕ where  is of full rank.
Proof. Equation (
12) is a re-statement of (
4). Differentiation of (
12) delivers (
13). ☐
 One can note that the Jacobian matrix in (
13) can be used to check local identification using the results in 
Rothenberg (
1971).
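A local identification check of this kind can be sketched numerically. The example below assumes that the restricted loading matrix is $\operatorname{vec}(\tau_\perp) = h + H\phi$ and that the map to the cointegration matrix is $\tau_c = (I - c_\perp \tau_{\perp c_\perp}')\bar c$ with $\tau_{\perp c_\perp} = \tau_\perp (c_\perp'\tau_\perp)^{-1}$. It replaces the analytical Jacobian with a central finite-difference approximation, which is a swapped-in technique; `h`, `H` and all numerical values are hypothetical.

```python
import numpy as np

def tau_c_from_phi(phi, h, H, c, c_perp):
    """Map the free loading parameters phi to vec(tau_c)."""
    p, r2 = c_perp.shape
    tau_perp = (h + H @ phi).reshape((p, r2), order="F")
    tau_perp_n = tau_perp @ np.linalg.inv(c_perp.T @ tau_perp)
    c_bar = c @ np.linalg.inv(c.T @ c)
    tau_c = (np.eye(p) - c_perp @ tau_perp_n.T) @ c_bar
    return tau_c.reshape(-1, order="F")

def numerical_jacobian(f, phi, step=1e-6):
    """Central finite-difference Jacobian of f at phi."""
    cols = []
    for i in range(phi.size):
        e = np.zeros_like(phi)
        e[i] = step
        cols.append((f(phi + e) - f(phi - e)) / (2 * step))
    return np.column_stack(cols)

p, r2 = 4, 2
c = np.vstack([np.eye(2), np.zeros((2, 2))])
c_perp = np.vstack([np.zeros((2, 2)), np.eye(2)])

# Normalizations: tau_perp[0, 0] = 1 and tau_perp[2, 1] = 1
h = np.array([1.0, 0, 0, 0, 0, 0, 1.0, 0])

# Case A: one zero restriction per column, 4 free parameters
H_a = np.eye(8)[:, [1, 2, 5, 7]]
phi_a = np.array([0.5, 2.0, -1.0, 1.5])
J_a = numerical_jacobian(lambda x: tau_c_from_phi(x, h, H_a, c, c_perp), phi_a)
rank_a = np.linalg.matrix_rank(J_a, tol=1e-6)

# Case B: normalizations only, 6 free parameters
H_b = np.eye(8)[:, [1, 2, 3, 4, 5, 7]]
phi_b = np.array([0.5, 2.0, 0.3, 0.2, -1.0, 1.5])
J_b = numerical_jacobian(lambda x: tau_c_from_phi(x, h, H_b, c, c_perp), phi_b)
rank_b = np.linalg.matrix_rank(J_b, tol=1e-6)
```

In case A the Jacobian has full column rank, so the parameters are locally identified at that point; in case B the rank falls short of the number of parameters, signalling that normalizations alone do not identify the columns.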
The result of Theorem 2 is in contrast with the result of Theorem 1, because the latter delivers a linear hypothesis for 
 while Theorem 2 gives in general non-linear restrictions on 
. One may hence ask the following question: when is it possible to reduce the more general linear hypothesis on 
 given in (
11) to the simpler linear hypothesis on 
 given in (
8)?
In the special case of 
, the following theorem states that this can always be achieved. This applies for instance to the motivating example (
3), where one can choose some 
 so that 
 is equal to the identity, as shown below. Consider the formulation (
10) with 
, and assume that no normalizations have been imposed yet, such that 
. It is assumed that 
, under the equation-by-equation restrictions, satisfies the usual rank conditions for identification, see 
Johansen (
1995, Theorem 1):
        where 
.
Theorem 3 (Case 
r2 = 2)
. Let  obey the restrictions  satisfying the rank conditions (
14)
; then one can choose normalization conditions on  and  so that there exists a matrix  such that . This implies that a hypothesis on  can be stated in terms of ϑ in (5), and, by Theorem 1, a linear hypothesis on  corresponds to a linear hypothesis on .
Proof. Because  has rank 1, one can select (at least) one linear combination of ,  say, so that  is normalized to be one in the direction , i.e., . Similarly,  has rank 1, and one can select (at least) one linear combination of ,  say, so that  is normalized to be one in the direction , i.e., . Next define  which by construction satisfies . ☐
 The proof of the previous theorem provides a way to construct 
 when 
 and the usual rank condition for identification (
14) holds. The rest of the paper focuses attention on the case of linear restrictions on 
 in (
8), which can be translated linearly into restrictions on 
 as shown in Theorem 1.
  3.3. Example 2 Continued
Consider (
3); this hypothesis is of type 
 with
        
        and hence 
 and 
. In this case one can define 
 and 
 where 
 is the 
j-th column of 
.
It is simple to verify that, under the additional normalization restrictions 
 and 
, 
 in (
3) satisfies 
. Therefore, define 
 as (
3) under these normalization restrictions. Using formula (
4) one can see that
        
        so that 
 is linear in 
, as predicted by Theorem 3.
  4. The VECM Parametrization
This section describes the I(2) parametrization employed in the statistical analysis of the paper. Consider the following 
-parametrization (
-par) of the VECM for I(2) VAR systems
, see 
Mosconi and Paruolo (
2017):
      with 
. Recall that 
 is the total number of cointegrating relations, i.e., the number of I(1) linear combinations 
. The number of linear combinations of 
 that cointegrate with 
 to I(0), i.e., the number of I(0) linear combinations 
, is indicated
 by 
. Here 
 is 
, 
 is 
 and the other parameter matrices are conformable; the parameters are 
, 
, 
, 
, 
, 
, 
, all freely varying, and 
 is assumed to be positive definite. When 
 is restricted as 
 with 
 a 
 matrix of freely varying parameters, the 
-par reduces to the parametrization of 
Johansen (
1997); this restriction on 
 is not imposed here.
  4.1. Identification of 
The parameters in the 
-par (
16) are not identified; in particular 
 can be replaced by 
 with 
B square and nonsingular, provided 
 and 
 are simultaneously replaced by 
 and 
. This is because 
 enters the likelihood only via (
16) in the products 
 and 
. The transformation that generates observationally equivalent parameters, i.e., the post multiplication of 
 by a square and invertible matrix 
, is the same type of transformation that induces observational equivalence in the classical system of simultaneous equations, see 
Sargan (
1988), or in the set of cointegrating equations in I(1) systems, see 
Johansen (
1995). This leads to the following result.
Theorem 4 (Identification of 
τ in the 
τ-par)
. Assume that  is specified as the restricted  in (
9)
, which is implied by the general linear hypothesis (
8) 
on ; then the restricted  is identified within the τ-par if and only if
        
        (rank condition), where . The corresponding order condition is , or equivalently . Alternatively, consider the general linear hypothesis (
11) 
on ; then the constrained  in (
12) 
is identified in a neighborhood of the point  provided the Jacobian  in (13) 
is of full rank.
Proof. The rank condition follows from 
Sargan (
1988), given that the class of transformations that induce observational equivalence is the same as the classical one for systems of simultaneous equations. The local identification condition follows from 
Rothenberg (
1971). ☐
   4.2. The Identification of Remaining Parameters
This subsection discusses conditions for the remaining parameters of the τ-par to be identified, when τ is identified as in Theorem 4. These additional conditions are used in the discussion of the ML algorithms of the next section.
The VECM can be rewritten as
        
        One can see that the equilibrium correction terms 
 may be replaced by 
 without changing the likelihood, where 
, 
 and
        
        here 
A and 
B are square nonsingular matrices, and 
C is a generic matrix. Hence one observes that 
, 
, 
, 
, 
, 
, 
 is observationally equivalent to 
, 
, 
, 
, 
, 
, 
. 
A, 
B and 
C define the class of observationally equivalent transformations in the 
-par for all parameters, including 
. When 
 is identified one has 
 in the above formulae.
Consider additional restrictions on 
 of the type:
        where 
. The next theorem states rank conditions for these restrictions to identify the remaining parameters.
Theorem 5 (Identification of other parameters in the 
τ-par)
. Assume that τ is identified as in Theorem 4; the restrictions (18) 
identify φ and all other parameters in the τ-par if and only if (rank condition)A necessary but not sufficient condition (order condition) for this is that  Proof.  Because  is identified, one has  in Q. For the identification of , observe that . One finds . Because both  and  satisfy (18), one has . This implies that , i.e., that both  and , and that  is identified, if and only if . This completes the proof. ☐
 Observe that the identification properties of the 
-par differ from the ones of the parametrization of 
Johansen (
1997), where 
 is restricted, and hence the adding-and-subtracting associated with 
C above is not permitted.
  4.3. Deterministic Terms
The 
-par in (
16) does not involve deterministic terms. Allowing a constant and a trend to enter the VAR Equation (
1) in a way that rules out quadratic trends, one obtains the following equilibrium correction I(2) model—for simplicity still called the 
-par below:
        Here 
 so that 
; and 
 and 
.
This parametrization satisfies the conditions of the Johansen Representation Theorem and it generates deterministic trends up to first order, as shown in 
Appendix A. This is the I(2) model used in the application, with the addition of unrestricted dummy variables.
  5. Likelihood Maximization
This section discusses likelihood maximization of the 
-par of the I(2) model (
16) under linear, possibly over-identifying, restrictions on 
, i.e., on 
 in (
5). The same treatment applies to (
21) replacing (
, 
) with (
, 
), and (
, 
), with (
, 
). The formulation (
16) is preferred here for simplicity in exposition.
The alternating maximization procedure proposed here is closely related, but not identical, to the algorithms proposed by 
Doornik (
2017b); related algorithms were discussed in 
Paruolo (
2000b). Restricted ML estimation in the I(1) model was discussed in 
Boswijk and Doornik (
2004).
  5.1. Normalizations
Consider restrictions (
8), which are translated into linear hypotheses on 
 in (
9) as follows
        
        where by construction 
g and 
G satisfy 
 and 
 such that 
.
Next, consider just-identifying restrictions on the remaining parameters. For 
, the linear combinations of first differences entering the multicointegration relations, one can consider
        
        where 
 is the 
 matrix of multicointegration parameters. This restriction differs from the restriction 
 which is considered e.g., in 
Juselius (
2017a, 
2017b), and it was proposed and analysed by 
Boswijk (
2000).
Furthermore, the 
 matrix 
 can be normalized as follows
        
        where 
d is some known 
 matrix, and where 
, of dimension 
, contains freely varying parameters.
It can be shown that restrictions (
22) and (
23) identify the remaining parameters using Theorem 5. In fact, (
22) and (
23) can be written as 
 where 
 and 
. Vectorizing, one obtains an equation 
 of the form (18) with 
 and 
. The rank condition (
19) is satisfied, since 
 because
        
        where the last equality follows from (
22) and (
23) and 
.
  5.2. The Concentrated Likelihood Function
The model (
16), after concentrating out the unrestricted parameter matrix 
, can be represented by the equations
        
        where 
 indicates the vector of free parameters in 
, 
, 
 and 
 are residual vectors of regressions of 
, 
 and 
, respectively, on 
;
 this derivation is similar to that in Chapter 6.1 of 
Johansen (
1996). The associated log-likelihood function, concentrated with respect to 
, is given by
        
        In the rest of this section, 
 is used as shorthand for 
.
Algorithms for the maximization of the concentrated log-likelihood function  are proposed below. The first one, called al1, considers the alternating maximization of  over  for a fixed value of  (called the -step), and over  for a given value of  (called the -step).
A variant of this algorithm, called 
al2, can be defined fixing 
 in the 
-step to the value of 
 obtained in the 
-step. It can be shown that the increase in 
 obtained in one combination of 
-step and 
-step of 
al1 is greater than or equal to the one obtained by 
al2. The proof of this result is reported in Proposition A1 in 
Appendix B. Because of this property, and because 
al2 may display very slow convergence properties in practice, 
al1 is implemented in the illustration below.
The rest of this section presents algorithms al1 and al2, defining first the -step, then the -step and finally discussing the starting values, a line search and normalizations.
  5.2.1.  Step
Taking differentials, one has 
. Keeping 
 fixed, one finds
          
          Writing 
 in terms of 
 and 
, i.e., 
, the first-order conditions 
 and 
 are solved by
          
          where 
, 
, and where 
, and 
. Note that (
25) is the GLS estimator in a regression of 
 on 
. This defines the 
-step for 
al1.
The -step for al2 is defined similarly, but keeping  fixed. In this case it is simple to see that
  5.2.2.  Step
When 
 is fixed (and hence 
 is fixed), one can construct 
 and
          
          The concentrated model (
24) can then be written as a reduced rank regression:
          for which the Gaussian ML estimator for 
, 
, 
 has a closed-form solution, see 
Johansen (
1996). Specifically, let 
, 
 and 
, 
. If 
, 
, are the eigenvectors corresponding to the largest 
r eigenvalues of the problem
          
          and 
 is the matrix of the corresponding eigenvectors, then the optimal solutions for 
, 
, 
, 
 are given by
          
          where 
. Optimization with respect to 
 is performed using 
 replacing 
 with 
 formed from the previous expressions, namely taking 
 equal to 
 in the above display and 
 from the 
-step. Using the 
 matrices, one can also compute 
 directly as 
. This completes the definition of the 
-step.
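The closed-form solution of a reduced rank regression via an eigenvalue problem can be sketched generically. The code below is not the concentrated regression of this section: it treats the plain model $Z_{0t} = \alpha\beta' Z_{1t} + \varepsilon_t$ without additional regressors, and the function name, the simulated data and the dimensions are illustrative assumptions.

```python
import numpy as np

def reduced_rank_regression(Z0, Z1, r):
    """Gaussian ML for Z0 = Z1 beta alpha' + noise with rank r (rows = observations)."""
    T = Z0.shape[0]
    S00 = Z0.T @ Z0 / T
    S01 = Z0.T @ Z1 / T
    S11 = Z1.T @ Z1 / T
    # Solve |lambda * S11 - S10 S00^{-1} S01| = 0 through a symmetric problem
    L = np.linalg.cholesky(S11)                  # S11 = L L'
    L_inv = np.linalg.inv(L)
    M = L_inv @ S01.T @ np.linalg.solve(S00, S01) @ L_inv.T
    M = (M + M.T) / 2                            # remove rounding asymmetry
    eigval, eigvec = np.linalg.eigh(M)           # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:r]         # indices of the r largest
    beta = L_inv.T @ eigvec[:, order]            # satisfies beta' S11 beta = I
    alpha = S01 @ beta                           # valid because beta' S11 beta = I
    return alpha, beta, eigval[order]

# Simulated rank-1 example (hypothetical data)
rng = np.random.default_rng(0)
T = 2000
alpha_true = np.array([[1.0], [-0.5], [0.25]])
beta_true = np.array([[1.0], [-1.0], [0.0], [0.5]])
Z1 = rng.standard_normal((T, 4))
Z0 = Z1 @ beta_true @ alpha_true.T + 0.1 * rng.standard_normal((T, 3))

alpha_hat, beta_hat, lam = reduced_rank_regression(Z0, Z1, r=1)
Pi_hat = alpha_hat @ beta_hat.T
Pi_true = alpha_true @ beta_true.T
```

The Cholesky transformation turns the generalized eigenvalue problem into an ordinary symmetric one, so only NumPy is needed; the individual factors are determined only up to normalization, whereas their product is not.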
  5.2.3. Starting Values and Line Search
If the system is just-identified, consistent starting values for all parameters can be obtained by imposing the identifying restrictions on the two-stage estimator for the I(2) model (2SI2), see 
Johansen (
1995) and 
Paruolo (
2000a). In case of over-identification, this method may be used to produce starting values for 
, which may then be used as input for the first 
-step to obtain starting values for 
 and 
.
Let 
 be the vector containing all free parameters in 
, and let 
. Denote by 
 the value of 
 in iteration 
 of the algorithms. Denote by 
 the value of 
 obtained by the application of a 
-step and 
-step of algorithms 
al1 and 
al2 at iteration 
j starting from 
. In an I(1) context, 
Doornik (
2017a) found that better convergence properties can be obtained if a line search is added. For this purpose, define the final value of the 
j-th iteration as
          
          where 
 is chosen in 
 using a line search; note that values of 
 greater than 1 are admissible. A simple (albeit admittedly sub-optimal) implementation of the line search is employed in 
Doornik (
2017a); it consists of evaluating the log-likelihood function 
 with 
 setting 
 equal to 
 for 
, and in choosing the value of 
 with the highest log-likelihood 
ℓ. This simple choice of line search is used in the empirical illustration.
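A grid-based line search of the kind described above can be sketched as follows. The update rule assumed here is $\theta_j = \theta_{j-1} + \lambda(\tilde\theta_j - \theta_{j-1})$; the function name, the grid of step lengths and the quadratic objective are hypothetical and serve only to illustrate the mechanics.

```python
import numpy as np

def line_search_update(loglik, theta_prev, theta_step, grid=(1.0, 1.2, 2.0, 4.0, 8.0)):
    """Pick the step length on a grid that gives the highest log-likelihood.

    theta_prev is the value at the start of the iteration and theta_step the
    value delivered by one pass of the two alternating steps. The grid is a
    hypothetical choice; it includes 1.0 so the result is never worse than
    the plain update.
    """
    direction = theta_step - theta_prev
    candidates = [theta_prev + lam * direction for lam in grid]
    values = [loglik(th) for th in candidates]
    best = int(np.argmax(values))
    return candidates[best], grid[best], values[best]

# Toy concave objective with maximum at (4, 4)
target = np.array([4.0, 4.0])
def toy_loglik(theta):
    return -np.sum((theta - target) ** 2)

theta_prev = np.array([0.0, 0.0])
theta_step = np.array([1.0, 1.0])     # a slow step in the right direction
theta_new, lam_best, value_best = line_search_update(toy_loglik, theta_prev, theta_step)
```

In this toy case the plain update covers only a quarter of the distance to the maximum, and the search selects a step length above 1 that reaches it.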
  5.3. Standard Errors
The asymptotic variance matrix of the ML estimators may be obtained from the inverse observed (concentrated) information matrix as usual. Writing (
24) as 
, and letting 
, the observed concentrated information matrix for the reduced-form parameter vector 
 is obtained from
        
        This leads to the following information matrix in terms of the parameters 
:
        where 
 and 
. From 
 and 
, one obtains
        
        Define 
, so that 
, with
With these ingredients, one finds
        
        where 
, 
 and 
 are the expressions given above, evaluated at the ML estimators. Standard errors of individual parameter estimates are obtained as the square roots of the diagonal elements of 
. Asymptotic normality of resulting 
t-statistics (under the null hypothesis), and 
 asymptotic null distributions of likelihood ratio test statistics for the over-identifying restrictions, depend on conditions for asymptotic mixed normality being satisfied; this is discussed next.
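The computation of standard errors from an information matrix can be sketched in a few lines. The example assumes the usual chain-rule form, in which the information for the free parameters equals $J' \mathcal{I} J$, with $\mathcal{I}$ the information matrix of the reduced-form parameters and $J$ the Jacobian of the reduced-form parameters with respect to the free ones; the function name and the numbers are hypothetical.

```python
import numpy as np

def standard_errors(info_reduced, jacobian):
    """Standard errors from the inverse of J' I J."""
    info = jacobian.T @ info_reduced @ jacobian
    cov = np.linalg.inv(info)
    return np.sqrt(np.diag(cov))

# Hypothetical information matrix for three reduced-form parameters
info_reduced = np.diag([4.0, 1.0, 25.0])
# Two free parameters map one-to-one into the first two reduced-form ones
jacobian = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0]])

se = standard_errors(info_reduced, jacobian)
```

A parameter with more information (a larger diagonal entry) receives a smaller standard error, as the first entry of `se` shows.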
  6. Asymptotics
The asymptotic distribution of the ML estimator in the I(2) model has been discussed in 
Johansen (
1997, 
2006). As shown there and discussed in 
Boswijk (
2000), the limit distribution of the ML estimator is not jointly mixed normal as in the I(1) case. As a consequence, the limit distribution of LR test statistics of generic hypotheses need not be 
$\chi^2$ under the null hypothesis.
In some special cases, the asymptotic distribution of the just-identified ML estimator of the cointegration parameters can be shown to be asymptotically mixed normal. Consider the case 
 (i.e., 
), and assume as before that no deterministic terms are included in the model. In this case, the limit distribution of the cointegration parameters in Theorem 4 in 
Johansen (
2006), J06 hereafter, can be described in terms of the estimated parameters 
 and 
, where 
 is identified as 
 with 
. Note that the components 
C and 
 in the above theorem do not appear here, because 
. One has
      
      with 
,
      
      and where 
, a vector Brownian motion with covariance matrix 
.
As noted in J06,  has a mixed normal distribution with mean 0, because  is a function of , which is independent of . Moreover in the case , the  component of the ML limit distribution does not appear, so that the whole limit distribution of the cointegration parameters is jointly mixed normal, unlike in the case .
One can see that hypothesis (
8) defines a smooth restriction of the 
 parameters
. More precisely 
 depends smoothly only on 
, 
, where 
 contains the 
 parameters in (
8). Note also that 
 depends on the parameters in 
, which are unrestricted by (
8); hence 
 depends only on 
, 
, where 
 contains the parameters in 
 in (
22).
The conditions of Theorem 5 in J06 are next shown to be verified, and hence the LR test of the hypothesis (
8) is asymptotically 
$\chi^2$ with degrees of freedom equal to the number of constraints, in case 
. In fact, 
, 
 are smoothly parametrized by the continuously identified parameters 
 and 
. Because 
 does not depend on 
, one easily deduces 
in (37) of J06. Similarly, one has 
 with 
 and 
 of full rank; hence (38) of J06 is satisfied. This shows that the LR statistic is asymptotically 
 under the null, for 
.
In case 
, the asymptotic distribution of 
 is defined in terms of 
 in J06 p. 92, which is not jointly mixed normal. In such cases, 
Boswijk (
2000) showed that inference is mixed normal if the restrictions on 
 can be asymptotically linearized in 
, and separated into two sets of restrictions, the first group involving 
 only, and the second group involving 
 only. Because the conditions of Theorem 5 in J06 cannot be easily verified for general linear hypotheses of the form (
8) in this case, they will need to be checked case by case. The authors intend to develop more readily verifiable conditions for 
 inference on 
 in their future research.
  7. Illustration
Following 
Juselius and Assenmacher (
2015), consider a 7-dimensional VAR with
      
      where 
, 
, 
 are (the log of) the price index, the long and the short interest rate of country 
i at time 
t respectively, and 
 is the log of the exchange rate between country 1 (Switzerland) and 2 (the US) at time 
t. The results are based on quarterly data over the period 1975:1–2013:3. The model has two lags, a restricted linear trend as in (
21), which appears in the equilibrium correction only appended to the vector of lagged levels, and a number of dummy variables; see 
Juselius and Assenmacher (
2017), which is an updated version of 
Juselius and Assenmacher (
2015), for further details on the empirical model. The data set used here is taken from 
Juselius and Assenmacher (
2017).
Specification (
3) is based on the prediction that 
. Based on I(2) cointegration tests, 
Juselius and Assenmacher (
2017) choose a model with 
, which indeed implies 
, but also 
; arguably, however, the test results in Table 1 of their paper also support the hypothesis 
, which has the same number 
 of common 
 trends. The latter model would be selected applying the sequential procedure in 
Nielsen and Rahbek (
2007) using a 
 or 
 significance level in each test in the sequence.
Consider the case 
. The over-identifying restrictions on 
 implied by (
3) are incorporated in the parametrization (
3), with normalizations 
, which in turn leads to the over-identified structure for 
 in (
15), to be estimated by ML. The restricted ML estimate of 
 is (standard errors in parentheses):
      The LR statistic for the 3 over-identifying restrictions equals 
. Using the 
 asymptotic limit distribution, one finds an asymptotic 
p-value of 
, and hence a rejection of the null hypothesis. This indicates that the hypothesized structure on 
 is rejected.
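The conversion of an LR statistic into an asymptotic p-value can be sketched with the standard library alone. The function below evaluates the upper tail of a chi-square distribution with integer degrees of freedom through the usual recurrence for the incomplete gamma function; the function name is an assumption and the statistic used in the example is a hypothetical value, not a result from the application.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        total = math.exp(-half)                 # closed form for df = 2
        k = 2
    else:
        total = math.erfc(math.sqrt(half))      # closed form for df = 1
        k = 1
    while k < df:
        # Recurrence from k to k + 2 degrees of freedom
        total += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return total

lr_stat = 9.0                      # hypothetical LR statistic
p_value = chi2_sf(lr_stat, df=3)   # three over-identifying restrictions
reject_at_5_percent = p_value < 0.05
```

The degrees of freedom equal the number of over-identifying restrictions being tested, which is 3 for the structure considered here.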
For comparison, consider also the case 
, for which the LR test for cointegration ranks has a 
p-value of 
. The resulting restricted estimate of 
 is:
      The estimates and standard errors are similar to those obtained under the hypothesis 
. The LR statistic for the over-identifying restrictions now equals 
. If one conjectured that the limit distribution of the LR test is also 
 in this case, one would obtain an asymptotic 
p-value of 
, so the evidence against the hypothesized structure of 
 appears slightly weaker in this model.
The results for both model 
 and for model 
 are in line with the preferred specification of 
Juselius and Assenmacher (
2017), who select an over-identified structure for 
, which is not nested in (
15), and therefore implies a different impact of the common I(2) trends.
  8. Conclusions
Hypotheses on the loading matrix of I(2) common trends are of economic interest. They are shown to be related to the cointegration relations. This link is explicitly discussed in this paper, also for hypotheses that are over-identifying. Likelihood maximization algorithms are proposed and discussed, along with LR tests of the hypotheses.
The application of these LR tests to a system of prices, exchange rates and interest rates for Switzerland and the US shows support for the existence of two I(2) common trends. These may represent a ‘speculative’ trend and a ‘relative prices’ trend, but there is little empirical support for the corresponding exclusion restrictions in the loading matrix.