Asymptotic Theory for Cointegration Analysis When the Cointegration Rank Is Deficient

We consider cointegration tests in the situation where the cointegration rank is deficient. This situation is of interest in finite sample analysis and in relation to recent work on identification robust cointegration inference. We derive asymptotic theory for tests for cointegration rank and for hypotheses on the cointegrating vectors. The limiting distributions are tabulated. An application to US treasury yields series is given.


Introduction
Determination of the cointegration rank is an important part of analyzing the cointegrated vector autoregressive model in the framework of Johansen (1988, 1991, 1995), Johansen and Juselius (1990), and Juselius (2006). We consider the rank deficient case where the cointegration rank of the data generating process is smaller than the rank used in the statistical analysis. In that case, the data generating process has more unit roots than the number of unit roots imposed in the statistical analysis and the usual asymptotic theory fails. We provide asymptotic theory for cointegration rank tests and tests on cointegration vectors along with simulated tables of the asymptotic distributions.
Cointegration analysis is conducted in three steps. First, the specification of the model is checked. Second, the rank is determined by a sequential testing procedure based on Dickey-Fuller type distributions. Third, the cointegrating vectors are estimated and restrictions can be tested using standard inference. Asymptotic theory shows that the estimated rank is consistent in the sense that the probability that the estimated rank is not equal to the true rank equals the size of the tests, whereas the probability that the estimated rank is too small vanishes, see Johansen (1992, 1995) and Paruolo (2001). Hence, the rank deficiency problem does not arise in the asymptotic analysis.
In practice, rank deficiency matters in two ways. The asymptotic theory often suffers from considerable finite sample distortion. Further, if an investigator wants to focus on inference on the cointegrating relations, then problems can arise if the rank is taken as known when in fact it is deficient. These problems mirror those of instrumental variable estimation with weak instruments, see Mavroeidis et al. (2014).
When conducting inference on the cointegrating vector under near rank deficiency, the parameters are weakly identified. At the extreme, when testing on the cointegrating vector in the case of a deficient rank, the model is mis-specified. This problem arises in cointegration as well as in instrumental variable estimation. In both cases maximum likelihood is conducted using reduced rank regression. The weak identification problem has attracted considerable attention in the instrumental variable literature, see for instance Mavroeidis et al. (2014). Khalaf and Urga (2014) discussed the weak identification problem for cointegration, that is, when testing for a known cointegrating vector in the nearly rank deficient situation. These authors investigate various methods to adjust the asymptotic distribution in the weak identification case. This includes a bounds-based critical value suggested by Dufour (1997). This method requires knowledge of the asymptotic theory for the rank deficient case, which we provide here.
The practical problem of ignoring rank deficiency is illustrated using yield curve data. The expectation hypothesis is often interpreted as follows. Interest rates at different maturities are integrated series, but cointegrate so that spreads are stationary. Spreads are often found to be non-stationary. Thus, it is quite possible that a pair of interest rates do not cointegrate. An investigator may proceed by assuming cointegration when there is none, so that the rank is deficient, and conduct inference on the coefficients of the alleged cointegrating vector using standard inference. Our theory shows that the inference is then severely distorted. When the rank is deficient or nearly deficient it is incorrect to use standard inference on the cointegrating vectors. Nonetheless, applying standard inference in the particular example leads to marginal rejection of the hypothesis. Applying the bounds test of Khalaf and Urga (2014) shifts the distribution to the right and there is not much power to reject a hypothesis. If the rank is deficient, which is possible in the example, the alleged cointegrating vector cannot be cointegrating.
Rank deficiency also matters when the rank is determined empirically. Different asymptotic distributions arise in the standard case and when the rank is deficient. The asymptotic distribution tends to give a very good approximation to the finite sample distribution when the rank is far from being deficient, see for instance Nielsen (1997, 2004). When the parameters are in the vicinity of rank deficiency the finite sample distribution tends to be a combination of the two asymptotic distributions. When the parameters are not too close to the rank deficient case a Bartlett correction using a fixed parameter second-order asymptotic expansion works very well, see Johansen (2000, 2002). Bootstrap solutions have been discussed in simulation studies by Fachin (2000); Gredenhoff and Jacobson (2001); Swensen (2004); Cavaliere et al. (2012). When the parameters are closer to rank deficient a local-to-unity asymptotic expansion gives an improvement, see Nielsen (2004) for the cointegration case and Nielsen (1999, 2001) for the corresponding instrumental variable case. A starting point for the finite sample analysis is knowledge of the fixed-parameter first-order asymptotic theory across the parameter space, including rank deficient cases.
We discuss the asymptotic theory for models without and with deterministic terms in Sections 2 and 3, respectively. The implications for finite sample analysis and the weakly identified case are discussed in Section 4 along with an application to US treasury zero coupon yields. Section 5 concludes. Proofs are given in Appendix A.

The Model without Deterministic Terms
We consider the Gaussian cointegrated vector autoregressive model in the case with no deterministic terms. The asymptotic theory for tests for reduced cointegration rank and for a known cointegrating vector is derived when the rank is deficient. Finally, we analyze the case of near rank deficiency.

Model and Hypotheses
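Consider a $p$-dimensional time series $X_t$ for $t = 1-k, \dots, 0, 1, \dots, T$. The unrestricted vector autoregressive model can be written as

$$\Delta X_t = \Pi X_{t-1} + \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \varepsilon_t, \qquad t = 1, \dots, T, \tag{1}$$

where the innovations $\varepsilon_t$ are independent normal $N_p(0, \Omega)$-distributed. The parameters $\Pi$, $\Gamma_i$, $\Omega$ are freely varying $p$-dimensional square matrices so that $\Omega$ is symmetric, positive definite.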
The hypothesis of reduced cointegration rank is formulated as

$$H_z(r): \operatorname{rank} \Pi \le r \tag{2}$$

for some $0 \le r \le p$. The interpretation of the hypotheses follows from the Granger-Johansen representation presented in Section 2.2 below. The subscript $z$ indicates that the model has a zero deterministic component. The rank hypotheses are nested so that

$$H_z(0) \subset \cdots \subset H_z(r) \subset \cdots \subset H_z(p). \tag{3}$$

The rank deficiency problem arises when testing the hypothesis $H_z(r)$ when in fact the sub-hypothesis $H_z(r-1)$ is satisfied. The rank is determined to be $r$ if the hypothesis $H_z(r)$ cannot be rejected while the sub-hypothesis $H_z(r-1)$ is rejected. As a short-hand we write $H^\circ_z(r) = H_z(r) \setminus H_z(r-1)$ for this situation. The rank can be determined following the procedure outlined in Johansen (1992, 1995, Section 12.1) and Paruolo (2001). In practice, these decisions are often marginal, hence the need to study the asymptotic theory of test statistics in the rank deficient case.
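The sequential rule described above (pick the smallest $r$ for which $H_z(r)$ is not rejected) can be sketched in a few lines of Python. This is only an illustration: the function name `select_rank` and its inputs are hypothetical, and the caller is assumed to supply statistics and critical values ordered by $r = 0, \dots, p-1$.

```python
from typing import Sequence


def select_rank(trace_stats: Sequence[float], critical_values: Sequence[float]) -> int:
    """Return the smallest r for which H(r) is not rejected.

    trace_stats[r] is the statistic for H(r) against H(p), r = 0, ..., p-1,
    and critical_values[r] is the matching critical value (both assumed
    to be supplied by the caller). If every hypothesis is rejected, the
    rank is set to p, the number of statistics.
    """
    if len(trace_stats) != len(critical_values):
        raise ValueError("need one critical value per statistic")
    for r, (stat, crit) in enumerate(zip(trace_stats, critical_values)):
        if stat <= crit:
            # H(r) is not rejected while H(0), ..., H(r-1) were all rejected.
            return r
    return len(trace_stats)
```

When a statistic lies close to its critical value the selected rank is fragile, which is the marginal situation the text refers to.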
The rank hypothesis can equivalently be written as

$$H_z(r): \Pi = \alpha\beta', \tag{4}$$

where $\alpha$ and $\beta$ are $p \times r$ matrices. The advantage of this formulation is that $\alpha$ and $\beta$ vary in vector spaces. The formulation does, however, allow rank deficiency where the rank of $\Pi$ is smaller than $r$. We follow Johansen (1991, Equation (2.2)) and refer to $\beta$ as the cointegrating vectors. We find the terminology useful, although it is ambiguous. Indeed, for a particular data generating process where $\Pi$ has rank less than $r$, the identity $\Pi = \alpha\beta'$ can be satisfied while columns of $\beta$ may not be row-eigenvectors of $\Pi$, in which case $\beta' X_t$ cannot be stationary. Even when $\Pi$ has rank $r$, $\beta' X_t$ is only (approximately) stationary under the I(1) condition introduced below. However, from a statistical viewpoint, the estimator of $\Pi$ under the restriction of rank $r$ will in a finite sample have rank $r$ with probability one. In practice our only knowledge of the rank arises from inference. Johansen's terminology appears to be focused on the statistical viewpoint, which we will follow even when studying the rank deficient cases.
The hypothesis of known cointegration vectors is

$$H_{z,\beta}(r): \Pi = \alpha b' \tag{5}$$

for some unknown matrix $\alpha$ and a known matrix $b$, both of dimension $p \times r$, so that $b$ has full column rank. The standard analysis is concerned with the situation where $\alpha$ has full column rank, but in the rank deficient case it has reduced column rank, so that the hypothesis $H_z(r-1)$ is satisfied. When referring to $b$ as the cointegrating vectors, we, once again, follow the terminology of Johansen (1991, Equation (3.1)) even though $b' X_t$ cannot be stationary under rank deficiency.

Granger-Johansen Representation
The Granger-Johansen representation provides an interpretation of the cointegration model that is useful in the asymptotic analysis. We work with the result stated by Johansen (1995, Theorem 4.2). The theorem requires the following assumption.

I(1) Condition.
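Suppose $\operatorname{rank} \Pi = s$ where $s \le p$. Consider the characteristic roots satisfying $0 = \det\{A(z)\}$ where $A(z) = (1-z) I_p - \Pi z - \sum_{i=1}^{k-1} \Gamma_i (1-z) z^i$. Suppose there are $p - s$ unit roots, and that the remaining roots are stationary roots, that is, roots satisfying $|z| > 1$.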
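The I(1) condition can be checked numerically for given parameter values. The sketch below is an illustration under stated assumptions: it rewrites the equilibrium-correction form as a vector autoregression in levels, with coefficients $A_1 = I_p + \Pi + \Gamma_1$, $A_j = \Gamma_j - \Gamma_{j-1}$ and $A_k = -\Gamma_{k-1}$, and uses the fact that the characteristic roots $z$ are the reciprocals of the non-zero companion matrix eigenvalues. The function names and the tolerance `tol` are hypothetical choices.

```python
import numpy as np


def companion_eigenvalues(Pi, Gammas=()):
    """Eigenvalues of the companion matrix of the VAR written in levels."""
    Pi = np.asarray(Pi, dtype=float)
    p = Pi.shape[0]
    G = [np.asarray(g, dtype=float) for g in Gammas]
    k = len(G) + 1
    # Pad with Gamma_0 = Gamma_k = 0 so that A_j = Gamma_j - Gamma_{j-1}.
    G_pad = [np.zeros((p, p))] + G + [np.zeros((p, p))]
    A = [G_pad[j] - G_pad[j - 1] for j in range(1, k + 1)]
    A[0] = A[0] + np.eye(p) + Pi          # A_1 = I + Pi + Gamma_1
    top = np.hstack(A)
    if k == 1:
        companion = top
    else:
        shift = np.hstack([np.eye(p * (k - 1)), np.zeros((p * (k - 1), p))])
        companion = np.vstack([top, shift])
    return np.linalg.eigvals(companion)


def check_i1_condition(Pi, Gammas=(), tol=1e-6):
    """Return (s, number of unit roots, whether the I(1) condition holds)."""
    Pi = np.asarray(Pi, dtype=float)
    p = Pi.shape[0]
    s = int(np.linalg.matrix_rank(Pi))
    eig = companion_eigenvalues(Pi, Gammas)
    is_unit = np.abs(eig - 1.0) < tol
    # A stationary root |z| > 1 corresponds to a companion eigenvalue inside the unit circle.
    is_stationary = np.abs(eig) < 1.0 - tol
    n_unit = int(is_unit.sum())
    holds = (n_unit == p - s) and bool(np.all(is_unit | is_stationary))
    return s, n_unit, holds
```

For instance, `check_i1_condition(np.zeros((2, 2)))` describes a bivariate random walk, for which the condition holds with $s = 0$.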
The Granger-Johansen theorem considers a process satisfying the model (1) with $\operatorname{rank} \Pi = r$, so that we can write $\Pi = \alpha\beta'$, while the I(1) condition holds with $s = r$. The process then has the representation

$$X_t = C \sum_{s=1}^{t} \varepsilon_s + S_t + \tau, \tag{6}$$

where the impact matrix $C$ for the random walk has rank $p - r$ and satisfies $\beta' C = 0$ and $C\alpha = 0$, the process $S_t$ can be given a zero mean stationary initial distribution and $\tau$ depends on the initial observations in such a way that $\beta'\tau = 0$. In other words, the process $X_t$ behaves like a random walk with cointegrating relations $\beta' X_t$ that can be given a stationary initial distribution.

Test Statistics
The likelihood ratio test statistic for the reduced rank hypothesis $H_z(r)$ against the unrestricted model $H_z(p)$ is found by reduced rank regression, see Johansen (1995, Section 6). It can be described as a two-step procedure. First, the differences $\Delta X_t$ and the lagged levels $X_{t-1}$ are regressed on the lagged differences $\Delta X_{t-i}$, $i = 1, \dots, k-1$, giving residuals $R_{0,t}$, $R_{1,t}$. Secondly, the squared sample canonical correlations, $1 \ge \lambda_1 \ge \cdots \ge \lambda_p \ge 0$ say, of $R_{0,t}$ and $R_{1,t}$ are found by computing product moments $S_{ij} = T^{-1} \sum_{t=1}^{T} R_{i,t} R_{j,t}'$ and solving the eigenvalue problem $0 = \det(\lambda S_{11} - S_{10} S_{00}^{-1} S_{01})$. The log likelihood ratio test statistic for the rank hypothesis is then

$$\mathrm{LR}\{H_z(r) \mid H_z(p)\} = -T \sum_{j=r+1}^{p} \log(1 - \lambda_j). \tag{7}$$

Under the hypothesis of known cointegration vectors, the likelihood is maximised by least squares regression. The log likelihood ratio test statistic against the unrestricted model $H_z(p)$ is therefore given by

$$\mathrm{LR}\{H_{z,\beta}(r) \mid H_z(p)\} = T \log \frac{\det\{S_{00} - S_{01} b (b' S_{11} b)^{-1} b' S_{10}\}}{\det(S_{00} - S_{01} S_{11}^{-1} S_{10})}. \tag{8}$$

The log likelihood ratio statistic for the hypothesis of known cointegrating vector against the rank hypothesis is found by combining the statistics in (7) and (8), that is

$$\mathrm{LR}\{H_{z,\beta}(r) \mid H_z(r)\} = \mathrm{LR}\{H_{z,\beta}(r) \mid H_z(p)\} - \mathrm{LR}\{H_z(r) \mid H_z(p)\}. \tag{9}$$

The relationship will be useful in the asymptotic theory. For instance, Theorems 1 and 2 give the asymptotic distributions of $\mathrm{LR}\{H_z(r) \mid H_z(p)\}$ and $\mathrm{LR}\{H_{z,\beta}(r) \mid H_z(p)\}$, respectively. From this we can derive an expression for the distribution of $\mathrm{LR}\{H_{z,\beta}(r) \mid H_z(r)\}$. When it comes to tabulation we will need to simulate all three distributions. This would be the case even if the former two statistics were independent.
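The two-step procedure can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function name `johansen_statistics` is hypothetical, the data array `X` is assumed to hold $T + k$ rows of levels, and the statistic for known cointegrating vectors is computed as the Gaussian least squares ratio of residual covariance determinants, which is an assumption about the exact form of the display referred to as (8).

```python
import numpy as np


def johansen_statistics(X, k, r, b=None):
    """Return (LR rank test, LR known-vector vs H(p), LR known-vector vs H(r)).

    X is an (n x p) array of levels with n = T + k rows; the model has no
    deterministic terms. The last two entries are None when b is None.
    """
    X = np.asarray(X, dtype=float)
    dX = np.diff(X, axis=0)                    # row j holds X[j+1] - X[j]
    m = dX.shape[0]
    T = m - (k - 1)                            # effective sample size
    Z0 = dX[k - 1:]                            # Delta X_t
    Z1 = X[k - 1:-1]                           # X_{t-1}
    if k > 1:
        # Step 1: partial out the lagged differences Delta X_{t-i}, i = 1..k-1.
        Z2 = np.hstack([dX[k - 1 - i: m - i] for i in range(1, k)])
        R0 = Z0 - Z2 @ np.linalg.lstsq(Z2, Z0, rcond=None)[0]
        R1 = Z1 - Z2 @ np.linalg.lstsq(Z2, Z1, rcond=None)[0]
    else:
        R0, R1 = Z0, Z1
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T
    S11 = R1.T @ R1 / T
    S10 = S01.T
    # Step 2: squared canonical correlations, the roots of
    # det(lam * S11 - S10 S00^{-1} S01) = 0, sorted in decreasing order.
    lam = np.linalg.eigvals(np.linalg.solve(S11, S10 @ np.linalg.solve(S00, S01)))
    lam = np.sort(lam.real)[::-1]
    lr_rank = float(-T * np.sum(np.log(1.0 - lam[r:])))
    if b is None:
        return lr_rank, None, None
    b = np.asarray(b, dtype=float).reshape(X.shape[1], -1)
    # Residual covariance in the unrestricted model and under Pi = alpha b'.
    omega_p = S00 - S01 @ np.linalg.solve(S11, S10)
    omega_b = S00 - S01 @ b @ np.linalg.solve(b.T @ S11 @ b, b.T @ S10)
    lr_beta_p = float(T * (np.linalg.slogdet(omega_b)[1] - np.linalg.slogdet(omega_p)[1]))
    # The statistic against H(r) is the difference of the two statistics.
    return lr_rank, lr_beta_p, lr_beta_p - lr_rank
```

Calling the function on a simulated bivariate random walk with `r=1` and `b=[1.0, -1.0]` gives one draw of each statistic in the rank deficient situation $s = 0$.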

Asymptotic Theory for the Rank Test
In the asymptotic analysis it is possible to relax the assumption on the innovations. While the likelihood is derived under the assumption of independent, identically Gaussian distributed innovations, less is needed for the asymptotic theory. Johansen (1995) assumes the innovations are independent, identically distributed with mean zero and finite variance and uses linear process results from Phillips and Solo (1992). This could be relaxed further to, for instance, a martingale difference assumption. However, for expositional simplicity we follow Johansen's argument and assumptions.
Theorem 1. Consider the rank hypothesis $H_z(r): \operatorname{rank} \Pi \le r$. Suppose $H^\circ_z(s) = H_z(s) \setminus H_z(s-1)$ holds for some $s \le r$ and that the I(1) condition holds for that $s$. Let $F_u = B_u$ be a $(p-s)$-dimensional standard Brownian motion on $[0, 1]$. Then, for $T \to \infty$,

In the standard non-deficient situation where $r = s$ the result reduces to the result of Johansen (1995, Theorem 6.1). The rank deficient case was also discussed by Johansen (1995, p. 158) and Nielsen (2004, Theorem 6.1).
Table 1 reports the asymptotic distribution of the rank test reported in Theorem 1. The simulations were done using Ox (Doornik 2007). The simulation design follows that of Johansen (1995, Section 15). That is, the stochastic integrals in (10) were discretized with $T = 1000$ and zero initial observations, using one million repetitions. The table reports simulated quantiles and moments for $r - s = 0, 1, 2$ and $p - r = 1, 2, 3, 4$. However, the values for the case $p - r = 1$ and $r - s = 0$ are analytic values from Nielsen (1997), with quantiles provided by Karim Abadir using his results in Abadir (1995). Bernstein (2014) reports values for higher dimensions. The 85% quantile has not been computed analytically in this case. The first panel of Table 1 reports the distribution for the standard case where $s = r$. This corresponds to Table 15.1 of Johansen (1995). The second and third panels of Table 1 report the distribution for the rank deficient cases where $s = r - 1$, so $r - s = 1$, and where $s = r - 2$, so $r - s = 2$. The first entry in panel 2, for $s = r - 1$ and $p - r = 1$, corresponds to Table 6 of Nielsen (2004). It is seen that as the rank becomes more deficient the distribution shifts to the left. It should be noted that if the rank is non-deficient but the I(1) condition is not satisfied, then the distribution would tend to shift to the right, see Nielsen (2004) for a discussion. The simulations reported in Table 8 of that paper indicate that the distribution is between these extremes if the rank is deficient and the I(1) condition fails.
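A small-scale version of such a simulation can be sketched in Python. The sketch rests on an assumption that is not spelled out in the surviving text: the limit variable is taken to be the sum of the $p - r$ smallest eigenvalues of the $(p-s)$-dimensional matrix $\int dB F' (\int F F' du)^{-1} \int F dB'$ with $F = B$, which reduces to the familiar trace expression when $r = s$. The function name, the defaults for `T` and `reps`, and the seed are hypothetical and far smaller than the design described above.

```python
import numpy as np


def simulate_rank_limit(p_minus_r, r_minus_s, T=400, reps=2000, seed=0):
    """Simulate draws of the (assumed) limit of the rank test statistic.

    The Brownian motion has dimension p - s = (p - r) + (r - s) and is
    approximated by a random walk with T steps and zero initial value.
    """
    rng = np.random.default_rng(seed)
    n = p_minus_r + r_minus_s
    draws = np.empty(reps)
    for i in range(reps):
        eps = rng.standard_normal((T, n))
        # Lagged random walk F_{t-1}, starting at zero.
        F = np.vstack([np.zeros((1, n)), np.cumsum(eps, axis=0)[:-1]])
        FF = F.T @ F / T**2                 # approximates int F F' du
        FdB = F.T @ eps / T                 # approximates int F dB'
        M = FdB.T @ np.linalg.solve(FF, FdB)
        rho = np.linalg.eigvalsh((M + M.T) / 2)   # eigenvalues in ascending order
        draws[i] = rho[:p_minus_r].sum()    # assumed limit: p - r smallest eigenvalues
    return draws
```

Comparing `simulate_rank_limit(1, 0)` with `simulate_rank_limit(1, 1)` gives a rough picture of the leftward shift under rank deficiency; the quantiles from such a short run are not a substitute for the tabulated values.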
The rank test statistic in (7) has been analyzed analytically for the canonical correlation problem in cross-sectional models in Nielsen (1999, 2001). This test also corresponds to the test for relevance in the instrumental variable problem. In that case, analytic expressions are available when $p = 2$, $r = 1$ and $s = 0, 1$. When $s = 1$ we have a $\chi^2$-distribution with mean 1 and variance 2. When $s = 0$ the mean is 0.429 and the variance is $0.575 - (0.429)^2 = 0.391$, see Nielsen (1999). Thus, the impact of rank deficiency is similar to what is seen in Table 1 for cointegration rank testing.

Asymptotic Theory for the Test on the Cointegrating Vectors
In the analysis of the test for known cointegrating vectors, we focus on the situation where the data generating process has rank $s = 0$. In this situation the asymptotic distribution is relatively simple to describe, because it does not depend on the value of the hypothesized cointegrating vectors $b$. This is adequate for a discussion of aspects of situations considered in Khalaf and Urga (2014). If the rank is non-zero but deficient, so $0 < s < r$, then the data generating process will have cointegrating vectors $\beta_0$ of dimension $p \times s$ and the asymptotic theory will depend on $\beta_0$ and $b$. In practice, it is rare to test for simple hypotheses when there is more than one hypothesized cointegrating vector, so we do not pursue this complication.
The analysis of the test for known cointegrating vectors is somewhat different from the analysis in Johansen (1995). His analysis is aimed at the situation where different restrictions are imposed on the cointegrating vectors. The argument then involves an intriguing consistency proof for the estimated cointegrating vectors. However, when testing the hypothesis of known cointegrating vectors the likelihood is maximized by the least squares method and the consistency argument is not needed. The asymptotic theory can then be described by the following result.
Theorem 2. Consider the hypothesis $H_{z,\beta}(r): \Pi = \alpha b'$, where $\alpha$, $b$ have dimension $p \times r$ and where $\alpha$ is unknown and $b$ is known with full column rank. Suppose $H_z(0)$ is satisfied, so that $\alpha = 0$ and $s = 0$, and that the I(1) condition is satisfied with $s = 0$. Let $B_u$ be a $p$-dimensional standard Brownian motion on $[0, 1]$ with components $B_{1,u}$ and $B_{2,u}$ of dimension $r$ and $p - r$, respectively. Then, for $T \to \infty$,

The convergence of the test statistic $\mathrm{LR}\{H_{z,\beta}(r) \mid H_z(p)\}$ holds jointly with the convergence for the rank test statistic $\mathrm{LR}\{H_z(r) \mid H_z(p)\}$, for $s = 0$, in Theorem 1. Thus, when $s = 0$ the formula (9) implies that the limit distribution of the test statistic for known $\beta$ within the model with rank of at most $r$ can be found as the difference of the two limiting variables.
Table 2 reports the asymptotic distribution of the test for known cointegrating vector in the model where the rank is at most $r$. When $s = r$ the asymptotic distribution is $\chi^2$ with $r(p - r)$ degrees of freedom, see Johansen (1995, Theorem 7.2.1). When $s = 0$ the asymptotic distribution reported in Theorem 2 applies. The simulation design is as before. It is seen that in the rank deficient case the distribution is shifted to the right. This matches the finite sample simulations reported by Johansen (2000, Table 2). Table 3 reports the simulated asymptotic distribution of the test for known cointegrating vector in the model where the rank is unrestricted. The distribution is shifted to the right in the rank deficient case. Note that the table reports the distribution of the convolution of the statistics simulated in Tables 1 and 2, see (9). Thus, up to a simulation error the expectations reported in Tables 1 and 2 add up to the expectation reported in Table 3. In the full rank case $r = s$ the statistics in Tables 1 and 2 are independent, as proved below, so the variances are also additive.
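Theorem 3. Consider the hypothesis $H_{z,\beta}(r): \Pi = \alpha b'$. Suppose $H^\circ_z(r)$ is satisfied and that the I(1) condition holds with $s = r$. Then the rank test statistic $\mathrm{LR}\{H_z(r) \mid H_z(p)\}$ and the statistic $\mathrm{LR}\{H_{z,\beta}(r) \mid H_z(r)\}$ for testing a simple hypothesis on the cointegrating vector are asymptotically independent.
The asymptotic distribution of the rank statistic $\mathrm{LR}\{H_z(r) \mid H_z(p)\}$ is given in Theorem 1, while the statistic for the cointegrating vector $\mathrm{LR}\{H_{z,\beta}(r) \mid H_z(r)\}$ is asymptotically $\chi^2$ with $r(p - r)$ degrees of freedom.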

The Case of Nearly Deficient Rank
With the above results we have two extremes. First, the full rank case where standard results apply, that is, Johansen's Dickey-Fuller type distribution for rank testing and $\chi^2$ inference for testing constraints on the cointegrating vectors. Second, the rank deficient case where new Dickey-Fuller type distributions apply both for rank testing and for testing constraints on the cointegrating vectors. In between these extremes we have the nearly rank deficient case corresponding to weak identification in the instrumental variable literature. These nearly deficient cases can be analyzed using a local-to-unity parametrization. However, a full theory is notationally complicated as there will be many nuisance parameters. We therefore consider a simple special case inspired by the power analysis of Johansen (1995, Section 14) and the distribution analysis of Nielsen (2004).
The main finding is that the appropriate local rate is $T^{-1}$, as in power analysis for unit root tests and cointegration rank tests, as opposed to $T^{-1/2}$ for stationary models as in Andrews and Cheng (2012). Consider a bivariate, first order, local-to-unity vector autoregressive model, (13), where the innovations $\varepsilon_t$ are independent normal $N_2(0, I_2)$-distributed and where $b_1 = 0$. We now have the following variant of the result for the rank test in Theorem 1.
Theorem 4 (Nielsen 2004, Theorem 6.2). Consider the data generating process (13). Let $B_u$ be a bivariate standard Brownian motion on $[0, 1]$ and let $J_u$ be the bivariate Ornstein-Uhlenbeck process given by Let $1 \ge \rho_1 \ge \rho_2 \ge 0$ be the eigenvalues of the eigenvalue problem

The limit distribution is tabulated in Nielsen (2004, Table 8).
We now consider the test for known cointegrating vector, $b = (b_1, b_2)'$. The result in Theorem 2 is modified as follows.
Theorem 5. Consider the data generating process (13).Let B u , J u be defined as in Theorem 4 and let J

The Model with a Constant
We now consider the model augmented with a constant. In the cointegrated model the constant is restricted to the cointegrating space. Thus, the cointegrating vectors consist of vectors relating the dynamic variable extended by a further coordinate for the constant. There are now two rank conditions: one related to the dynamic part of these extended cointegrating vectors and one related to the deterministic part of the cointegrating vectors. The condition on the cointegration rank in the standard theory can therefore fail in two ways.

Model and Hypotheses
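The unrestricted vector autoregressive model is

$$\Delta X_t = \Pi X_{t-1} + \mu + \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \varepsilon_t, \qquad t = 1, \dots, T, \tag{14}$$

where the innovations $\varepsilon_t$ are independent normal $N_p(0, \Omega)$-distributed. The parameters are the $p$-dimensional square matrices $\Pi$, $\Gamma_i$, $\Omega$ and the $p$-vector $\mu$. They vary freely so that $\Omega$ is symmetric, positive definite.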
For the model with a constant there are two types of cointegration rank hypotheses:

$$H_{c\ell}(r): \operatorname{rank} \Pi \le r, \tag{15}$$

$$H_c(r): \operatorname{rank} (\Pi, \mu) \le r. \tag{16}$$

Their interpretations follow from the Granger-Johansen representation which is reviewed in Section 3.2 below. In short, if there are no rank deficiencies the first hypothesis $H_{c\ell}$ gives cointegrating relations with a constant level and common trends with a linear trend. The second hypothesis $H_c$ has a constant level both for the cointegrating relations and the common trends. The hypotheses are nested so that

$$H_c(0) \subset H_{c\ell}(0) \subset H_c(1) \subset H_{c\ell}(1) \subset \cdots \subset H_c(p) = H_{c\ell}(p). \tag{17}$$

This nesting structure is considerably more complicated than the structure (3) for the model without deterministic terms. A practical investigation may start in three different ways. First, the model (14) is taken as the starting point. Both types of hypotheses come into play and the rank is determined as outlined in Johansen (1995, Section 12). Secondly, if visual inspection of the data indicates that linear trends are not present, the hypotheses $H_{c\ell}$ may be ignored. Thirdly, if visual inspection of the data indicates that a linear trend could be present, the model (14) should be augmented with a linear trend term and we move outside the present framework. Nielsen and Rahbek (2000) discuss the latter two possibilities. Here, we are concerned with the first two possibilities.
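The rank hypotheses can equivalently be formulated as

$$H_{c\ell}(r): \Pi = \alpha\beta', \tag{18}$$

$$H_c(r): (\Pi, \mu) = \alpha\beta^{*\prime}, \qquad \beta^* = (\beta', \beta_c)'. \tag{19}$$

The hypotheses of known cointegrating vectors are therefore

$$H_{c\ell,\beta}(r): \Pi = \alpha b', \tag{20}$$

$$H_{c,\beta}(r): (\Pi, \mu) = \alpha b^{*\prime}, \tag{21}$$

for a known $(p \times r)$-matrix $b$ with full column rank and, in the second case, also a known $(1 \times r)$-matrix $b_c'$ so that $b^* = (b', b_c)'$ has full column rank.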

Granger-Johansen Representation
We give a Granger-Johansen representation for each of the two reduced rank hypotheses. Both results follow from Theorem 4.2 and Exercise 4.5 of Johansen (1995). First, consider the hypothesis $H_{c\ell}(r)$. Suppose that the sub-hypothesis $H_c(r)$ does not hold and that the I(1) condition holds with $s = r$. Thus, the $(p \times r)$-matrices $\alpha$, $\beta$ have full column rank but $\alpha_\perp' \mu \ne 0$, so that the matrix $\Pi^* = (\Pi, \mu)$ has rank $r + 1$. Then, the Granger-Johansen representation is

$$X_t = C \sum_{s=1}^{t} \varepsilon_s + S_t + \tau_c + \tau t, \tag{22}$$

where the impact matrix $C$ has rank $p - r$ and satisfies $\beta' C = 0$ and $C\alpha = 0$ while $\tau = C\mu \ne 0$. As a consequence, the process has a linear trend, but the cointegrating relations $\beta' X_t$ do not have a linear trend, since $\beta' C = 0$. Secondly, consider the hypothesis $H_c(r)$. Suppose that the sub-hypothesis $H_{c\ell}(r-1)$ does not hold and that the I(1) condition holds with $s = r$. Thus, the $(p \times r)$-matrices $\alpha$, $\beta$ have full column rank, and the $\{(p+1) \times r\}$-matrix $\beta^* = (\beta', \beta_c)'$ has full column rank. Then, the Granger-Johansen representation (22) holds with $\tau = 0$, while $\tau_c$ has the property that $\beta'\tau_c = -\beta_c$. In other words, the process $X_t$ behaves like a random walk where $\beta' X_t$ has an invariant distribution with a non-zero mean, while $\beta' X_t + \beta_c$ has a zero mean invariant distribution.

Test Statistics
The test statistics are variations of those for the model without deterministic terms. The differences relate to the formation of the residuals $R_{0,t}$ and $R_{1,t}$. First, consider the reduced rank hypothesis $H_{c\ell}(r)$ and the corresponding hypothesis $H_{c\ell,\beta}(r)$ of known cointegrating vectors. The residuals $R_{0,t}$ and $R_{1,t}$ are formed by regressing the differences $\Delta X_t$ and the lagged levels $X_{t-1}$ on an intercept and the lagged differences $\Delta X_{t-i}$, $i = 1, \dots, k-1$. In the second step, compute the squared canonical correlations $1 \ge \lambda_1 \ge \cdots \ge \lambda_p \ge 0$ of $R_{0,t}$ and $R_{1,t}$. The rank test statistic $\mathrm{LR}\{H_{c\ell}(r) \mid H_{c\ell}(p)\}$ then has the form (7). The test statistic for known cointegrating vectors $\mathrm{LR}\{H_{c\ell,\beta}(r) \mid H_{c\ell}(p)\}$ has the form (8), using the same residuals $R_{0,t}$ and $R_{1,t}$, and the hypothesized cointegrating vectors $b$.
Secondly, consider the reduced rank hypothesis $H_c(r)$ and the corresponding hypothesis $H_{c,\beta}(r)$ of known cointegrating vectors. The residuals $R_{0,t}$ and $R_{1,t}$ are formed by regressing the differences $\Delta X_t$ and the vector formed by stacking the lagged levels and an intercept, $X^*_{t-1} = (X_{t-1}', 1)'$, on the lagged differences $\Delta X_{t-i}$, $i = 1, \dots, k-1$. In the second step, compute the squared canonical correlations of these $R_{0,t}$ and $R_{1,t}$. The rank test statistic $\mathrm{LR}\{H_c(r) \mid H_c(p)\}$ then has the form (7). The test statistic for known cointegrating vectors $\mathrm{LR}\{H_{c,\beta}(r) \mid H_c(p)\}$ has the form (8), using the same residuals $R_{0,t}$ and $R_{1,t}$, and the hypothesized cointegrating vectors $b^* = (b', b_c)'$.
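The two ways of forming the residuals can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `constant_model_residuals` and the flag `restricted` are hypothetical, and `X` is assumed to hold $T + k$ rows of levels. With `restricted=False` the intercept is partialled out together with the lagged differences; with `restricted=True` the intercept is stacked onto the lagged levels.

```python
import numpy as np


def constant_model_residuals(X, k, restricted):
    """Return the residuals (R0, R1) for the model with a constant.

    restricted=False: regress Delta X_t and X_{t-1} on an intercept and
                      the lagged differences.
    restricted=True:  regress Delta X_t and (X_{t-1}', 1)' on the lagged
                      differences only.
    """
    X = np.asarray(X, dtype=float)
    dX = np.diff(X, axis=0)                    # row j holds X[j+1] - X[j]
    m = dX.shape[0]
    n_obs = m - (k - 1)
    Z0 = dX[k - 1:]                            # Delta X_t
    Z1 = X[k - 1:-1]                           # X_{t-1}
    regressors = [dX[k - 1 - i: m - i] for i in range(1, k)]
    ones = np.ones((n_obs, 1))
    if restricted:
        Z1 = np.hstack([Z1, ones])             # stacked vector (X_{t-1}', 1)'
    else:
        regressors = regressors + [ones]       # intercept is partialled out
    if not regressors:                         # k = 1 with a restricted constant
        return Z0, Z1
    Z2 = np.hstack(regressors)
    R0 = Z0 - Z2 @ np.linalg.lstsq(Z2, Z0, rcond=None)[0]
    R1 = Z1 - Z2 @ np.linalg.lstsq(Z2, Z1, rcond=None)[0]
    return R0, R1
```

The returned residuals can then be passed through the same product moment and eigenvalue computations as in the model without deterministic terms; in the restricted case $R_{1,t}$ has $p + 1$ coordinates.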

Asymptotic Theory for the Rank Tests
There are now four situations to consider. Indeed, the nesting structure in (17) shows that each of the two rank hypotheses $H_{c\ell}(r)$ and $H_c(r)$ can be rank deficient in two ways, according to which of the nested sub-hypotheses the data generating process satisfies. In three cases the limiting distribution is of the same form as in Theorem 1, albeit with a different limiting random function $F_u$. In the fourth case the limiting distribution has nuisance parameters. The nuisance parameter case arises when testing $H_c(r)$ with a data generating process satisfying $H^\circ_{c\ell}(s) = H_{c\ell}(s) \setminus H_c(s)$. This is the case that can often be ruled out through visual inspection of the data as mentioned in Section 3.1.
We start with the test for the hypothesis $H_{c\ell}(r)$ in the rank deficient case where $H^\circ_{c\ell}(s) = H_{c\ell}(s) \setminus H_c(s)$ holds for $s < r$. Johansen (1995) discusses the possibility $H^\circ_{c\ell}(r)$. The asymptotic theory is as follows.
Theorem 6. Consider the rank hypothesis $H_{c\ell}(r): \operatorname{rank} \Pi \le r$. Suppose $H^\circ_{c\ell}(s) = H_{c\ell}(s) \setminus H_c(s)$ holds for some $s \le r$, so that $\operatorname{rank} \Pi = s$ and $\operatorname{rank} (\Pi, \mu) = s + 1$, and that the I(1) condition is satisfied for that $s$. Let $B_u$ be a $(p-s)$-dimensional standard Brownian motion on $[0, 1]$. Define a $(p-s)$-dimensional vector $F_u$ with coordinates

Then $\mathrm{LR}\{H_{c\ell}(r) \mid H_{c\ell}(p)\}$ converges as in (11) using the present $F$.
Table 4 reports the simulated asymptotic distribution of the rank test reported in Theorem 6. The first panel gives the standard case where $s = r$ and corresponds to Johansen (1995, Table 15.3). For $p - r = 1$ the asymptotic distribution is actually $\chi^2$ and the numbers are the standard numerically calculated ones rather than simulated ones. The second and the third panel report the distribution for the rank deficient case $H^\circ_{c\ell}(s)$ where $H_{c\ell}(s)$ holds, but $H_c(s)$ fails. The distribution is shifted to the left when $r - s > 0$ as in Table 1.

The second case is the test for the same hypothesis $H_{c\ell}(r)$ in the rank deficient case where $H^\circ_c(s) = H_c(s) \setminus H_{c\ell}(s-1)$ holds.

Theorem 7. Consider the rank hypothesis $H_{c\ell}(r): \operatorname{rank} \Pi \le r$. Suppose $H^\circ_c(s) = H_c(s) \setminus H_{c\ell}(s-1)$ holds for some $s \le r$, so that $\operatorname{rank} \Pi = \operatorname{rank} \Pi^* = s$, and that the I(1) condition is satisfied for that $s$. Let $B_u$ be a $(p-s)$-dimensional standard Brownian motion on $[0, 1]$. Define a $(p-s)$-dimensional vector $F_u$ as the de-meaned Brownian motion

$$F_u = B_u - \int_0^1 B_v \, dv.$$

Then $\mathrm{LR}\{H_{c\ell}(r) \mid H_{c\ell}(p)\}$ converges as in (11) using the present $F$.

Table 5 reports the simulated asymptotic distribution of the rank test reported in Theorem 7. The first panel gives the standard case where $s = r$, see Johansen and Juselius (1990). It is shifted to the right when compared to the first panel of Table 4. The second and the third panel of Table 5 report the distribution for the rank deficient case $H^\circ_c(s)$ for $s < r$. In those cases the distribution is shifted to the left relative to the first panel as in Tables 1 and 4.

In the third case we consider the test for the hypothesis $H_c(r)$ in the rank deficient case where $H^\circ_c(s) = H_c(s) \setminus H_{c\ell}(s-1)$ holds.

Theorem 8. Consider the rank hypothesis $H_c(r): \operatorname{rank} (\Pi, \mu) \le r$. Suppose $H^\circ_c(s) = H_c(s) \setminus H_{c\ell}(s-1)$ holds for some $s \le r$, so that $\operatorname{rank} \Pi = \operatorname{rank} (\Pi, \mu) = s$, and that the I(1) condition is satisfied for that $s$. Let $B_u$ be a $(p-s)$-dimensional standard Brownian motion on $[0, 1]$. Define a $(p-s+1)$-dimensional vector $F_u$ given as

$$F_u = (B_u', 1)'. \tag{23}$$

Then $\mathrm{LR}\{H_c(r) \mid H_c(p)\}$ converges as in (11) using the present $F$.
Table 6 reports the simulated asymptotic distribution of the rank test reported in Theorem 8. The first panel gives the standard case where $s = r$ and corresponds to Johansen (1995, Table 15.2). The second and the third panel report the distribution for the rank deficient case $H^\circ_c(s)$ for $s < r$. Once again, the distribution shifts to the left in the rank deficient case.

The final case is the test for the hypothesis $H_c(r)$ in the rank deficient case where $H^\circ_{c\ell}(s-1) = H_{c\ell}(s-1) \setminus H_c(s-1)$ for $s < r$. In this case the limiting distribution has nuisance parameters. We do not give the result here, since it is complicated to state and it does not seem particularly useful in practice. Indeed, in practical work this type of data generating process can often be ruled out through visual data inspection as discussed in Section 3.1. Furthermore, it would be hard to deal with the nuisance parameters in applications.
It is worth noting that the proof in this final case would be somewhat different from the proof of Theorems 1, 6-8. They are all proved by modifying the argument of Johansen (1995, Sections 10 and 11). However, in the final case, a cointegration vector with random coefficients arises. Therefore, the analysis is best carried out in terms of the dual eigenvalue problem $0 = \det(\lambda S_{00} - S_{01} S_{11}^{-1} S_{10})$ as opposed to the standard eigenvalue problem $0 = \det(\lambda S_{11} - S_{10} S_{00}^{-1} S_{01})$.

Asymptotic Theory for the Test on the Cointegrating Vectors
We now consider the tests on the cointegrating vectors in the rank deficient case when a constant is present in the model. There is now a wide range of possible limit distributions. Only a few of these will be discussed.
The unrestricted model is $H_c(r)$ where the constant is restricted to the cointegrating space. Thus, in the full rank case the Granger-Johansen representation (22) has a zero linear slope $\tau = 0$ and level satisfying $\beta'\tau_c = -\beta_c$.
Consider now the hypothesis of a known cointegrating vector, (21). It is now important whether the hypothesized level for the cointegrating vector, $b_c$, is zero or not. If $b_c \ne 0$ then a nuisance parameter depending on $b$, $b_c$ would appear in the limit distributions in the rank deficient case. If $b_c = 0$ then the limit distributions are simpler. Fortunately, the zero level case is the most natural hypothesis in most applications. The asymptotic theory for the test statistic is described in the following theorems.
Theorem 9. Consider the hypothesis $H_{c,\beta}(r): (\Pi, \mu) = \alpha b^{*\prime}$ where $b^* = (b', b_c)'$. Here, $\alpha$, $b$ have dimension $p \times r$ while $b_c$ is an $r$-vector, where $\alpha$ is unknown and $b^*$ is known and $b$ has full column rank. Suppose $H_z(0)$ is satisfied so that $\Pi = 0$, $\mu = 0$, and $s = 0$ and that the I(1) condition is satisfied. Let $B$ be a $p$-dimensional standard Brownian motion on $[0, 1]$, where the first $r$ components are denoted $B_1$. Define the $(p-s+1)$-dimensional process $F_u = (B_u', 1)'$ as in (23). Then it holds, for $T \to \infty$, that

The convergence of the test statistic $\mathrm{LR}\{H_{c,\beta}(r) \mid H_c(p)\}$ holds jointly with the convergence for the rank test statistic $\mathrm{LR}\{H_c(r) \mid H_c(p)\}$, for $s = 0$, in Theorem 8. Thus, when $s = 0$ a formula of the type (9) implies that the limit distribution of the test statistic for known $\beta$ within the model with rank of at most $r$ can be found as the difference of the two limiting variables.
Table 7 reports the asymptotic distribution of the test for a known cointegrating vector in the model where the rank is at most $r$. When $s = r$, the asymptotic distribution is $\chi^2$ with $r(p + 1 - r)$ degrees of freedom, see Johansen and Juselius (1990, pp. 193-194), Johansen et al. (2000, Lemma A.5). When $s = 0$ the distribution is simulated according to Theorem 9. It is shifted to the right relative to the case $s = r$.
Table 8 reports the simulated asymptotic distribution of the test for a known cointegrating vector in the model where the rank is unrestricted. The distribution is shifted to the right in the rank deficient case. As in the zero level case, the expectations reported in Tables 6 and 7 add up to the expectation reported in Table 8. In the full rank case $s = r$ the statistics in Tables 6 and 7 are independent, as proved below, so the variances are additive.
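The additivity noted here can be written out as a short sketch. It assumes, as the surrounding text indicates, that Table 6 tabulates the rank statistic $LR\{H_c(r) | H_c(p)\}$, Table 7 the statistic $LR\{H_{c,\beta}(r) | H_c(r)\}$, and Table 8 the statistic $LR\{H_{c,\beta}(r) | H_c(p)\}$. The first line is the usual decomposition of a likelihood ratio statistic over nested hypotheses. The second and third lines are read as statements about the limiting variables.

```latex
\begin{align*}
LR\{H_{c,\beta}(r) \mid H_c(p)\}
  &= LR\{H_{c,\beta}(r) \mid H_c(r)\} + LR\{H_c(r) \mid H_c(p)\}, \\
\mathsf{E}\, LR\{H_{c,\beta}(r) \mid H_c(p)\}
  &= \mathsf{E}\, LR\{H_{c,\beta}(r) \mid H_c(r)\} + \mathsf{E}\, LR\{H_c(r) \mid H_c(p)\}, \\
\mathsf{Var}\, LR\{H_{c,\beta}(r) \mid H_c(p)\}
  &= \mathsf{Var}\, LR\{H_{c,\beta}(r) \mid H_c(r)\} + \mathsf{Var}\, LR\{H_c(r) \mid H_c(p)\} \\
  &\quad + 2\, \mathsf{Cov}\big[ LR\{H_{c,\beta}(r) \mid H_c(r)\},\, LR\{H_c(r) \mid H_c(p)\} \big].
\end{align*}
```

The expectations add up whatever the dependence between the two terms, whereas the variances add up only when the covariance term vanishes, as it does under asymptotic independence in the full rank case $s = r$.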

Theorem 10. Consider the hypothesis H
is satisfied and that the I(1) condition holds with s = r. Then the rank test statistic LR{H for testing a simple hypothesis on the cointegrating vector are asymptotically independent. The asymptotic distribution of the rank statistic LR{H • c (r)|H • c (p)} is given in Theorem 1, while the statistic for the cointegrating vector LR{H

Applications of Results
We discuss how our results apply to the finite sample theory and to identification robust inference. An application to US treasury yields is given.

Finite Sample Theory
The finite sample distributions of cointegration rank tests have been studied in various ways. When there are no nuisance parameters, the asymptotic distributions generally give good approximations. An example is the test for a unit root in a first order autoregression, where the finite sample distribution and the asymptotic distribution are nearly indistinguishable for $T = 8$ observations, see Nielsen (1997). A Bartlett correction improves the asymptotic approximation further. Once there are nuisance parameters the situation is different. Under the rank hypothesis the asymptotic distribution differs if there are additional unit roots. This arises either with rank deficiency, as here, where the distributions tend to be shifted to the left, or when there are double roots as in I(2) systems, where the distributions are shifted to the right. Nielsen (2004) analyzed this through simulation and suggested applying a local-to-unity approximation that would average between the different asymptotic distributions. A similar idea was implemented analytically for canonical correlation models in Nielsen (1999). In a follow-up paper, Nielsen (2001) analyzed the effects of plugging parameter estimates into such corrections. Johansen (2002) suggested a Bartlett correction for such models. This works quite well when the nuisance parameters are far from giving additional unit roots. The issue is that the Bartlett correction tends to infinity when there are additional unit roots. More recently, bootstrap methods have been explored by Swensen (2004) and by Cavaliere et al. (2012).
Johansen (2000) derives a Bartlett-type correction for the tests on the cointegrating relations. In his Table 2 he considers the finite sample properties of a test comparing the test statistic $LR\{H_{z,\beta}(1) | H_z(1)\}$ with the asymptotic $\chi^2$-approximation. Null rejection frequencies are simulated for dimensions $p = 2, 5$, a variety of parameter values, and a finite sample size $T$. In all the reported simulations the data generating process has rank of unity. The table shows that the null rejection frequency can be very much larger than 5% for a nominal 5% test when the rank is nearly deficient.
Theorem 2 sheds some light on the behaviour of the test as the rank approaches deficiency. The theorem shows that the test statistic converges for all deficient ranks. Table 2 indicates that the distribution shifts to the right in the rank deficient case. Thus, we should expect that the null rejection frequency increases as the rank approaches deficiency, but it should be bounded away from unity.
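The comparison of finite sample and asymptotic distributions discussed in this subsection can be explored by simulation. The following is a minimal Monte Carlo sketch for the simplest case mentioned above, the likelihood ratio test for a unit root in a first order autoregression. It assumes a model without deterministic terms, $\Delta X_t = \pi X_{t-1} + \varepsilon_t$ with $X_0 = 0$ and standard normal innovations, for which the statistic is $-T \log(1 - \hat\rho^2)$ with $\hat\rho$ the uncentred sample correlation of $\Delta X_t$ and $X_{t-1}$. The function names, the replication count and the seed are illustrative choices and are not taken from the cited papers.

```python
import math
import random


def unit_root_lr(T: int, rng: random.Random) -> float:
    """LR statistic for pi = 0 in Delta X_t = pi * X_{t-1} + eps_t.

    The data are generated as a pure random walk with X_0 = 0 (assumption),
    so the statistic is simulated under the unit root hypothesis.
    """
    x_prev = 0.0
    s_dx_x = 0.0   # sum of Delta X_t * X_{t-1}
    s_xx = 0.0     # sum of X_{t-1}^2
    s_dxdx = 0.0   # sum of (Delta X_t)^2
    for _ in range(T):
        dx = rng.gauss(0.0, 1.0)
        s_dx_x += dx * x_prev
        s_xx += x_prev * x_prev
        s_dxdx += dx * dx
        x_prev += dx
    if s_xx == 0.0:
        # Happens only for T = 1, where the regressor is identically zero.
        return 0.0
    rho_sq = s_dx_x ** 2 / (s_xx * s_dxdx)
    return -T * math.log(1.0 - rho_sq)


def empirical_quantile(values, q: float) -> float:
    """Simple order-statistic quantile of a list of simulated values."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    rng = random.Random(12345)
    for T in (8, 200):
        draws = [unit_root_lr(T, rng) for _ in range(5000)]
        summary = ", ".join(
            f"{int(100 * q)}%: {empirical_quantile(draws, q):.2f}"
            for q in (0.50, 0.90, 0.95)
        )
        print(f"T = {T:3d}  mean: {sum(draws) / len(draws):.2f}  {summary}")
```

Running the script prints the simulated mean and a few quantiles for a small and a moderately large sample, which can then be compared with each other. A rank deficient or nearly deficient data generating process would require a multivariate version of this sketch.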

Identification Robust Inference
Khalaf and Urga (2014) were concerned with tests on cointegration vectors in situations where the cointegration rank is nearly deficient. Their results can be developed a little further using the present results.
The notation in Khalaf and Urga (2014) differs slightly from the present notation. The hypothesis of known cointegration vectors is stated as $\beta_0 = (I_r, b_0)$ for some known $b_0$, corresponding to the present hypotheses $H_{z,\beta}(r)$ and $H_{c,\beta}(r)$. The test statistics are for $m = z, c$. Moreover they consider the hypothesis $H_{m,\Pi}(r)$, say, of a known impact matrix $\Pi$ of rank $r$. This is tested through the statistic
When the rank is not deficient the test statistic $LRC(b_0)$ is asymptotically $\chi^2_{r(p-r)}$, see Johansen (1995, Section 7). The test statistic $LR(b_0)$ has a Dickey-Fuller type distribution as derived in Theorem 2 for the case without deterministic terms, contradicting the $\chi^2$ asymptotics suggested by Khalaf and Urga (2014, Section 4). Table 2 indicates that this distribution is close to, but different from, a $\chi^2_{p(p-r)}$-distribution when $p = 2, 3$ and $p - r = 1$. When $p = 3$ and $r = 1$, the limiting distribution is further from a $\chi^2_{p(p-r)}$-distribution. Likewise, the statistic $LR^*$ converges to a Dickey-Fuller-type distribution. This can be proved through a modification of the proof of Theorem 2.
Khalaf and Urga's Theorem 1 is concerned with bounding the distribution of the likelihood ratio statistic for the hypothesis $\Pi = ab'$, where $a$, $b$ are known $p \times r$-matrices so that $b$ has rank $r$, against the alternative where $\Pi$ is unrestricted. The idea of their theorem is to come up with a bound to the critical value when $a$, $b$ may have deficient rank $s \leq r$. Unfortunately, their theorem revolves around the incorrect $\chi^2$ distribution although unit root testing is implicitly involved. We therefore reformulate the result in terms of the limiting distributions derived herein.
We consider the test statistic $LR(b_0) = LR\{H_{z,\beta}(1) | H_z(1)\}$ when the rank of $\Pi$ is nearly deficient. Suppose the rank is nearly deficient in the sense that $\Pi \approx T^{-1} M$ for some matrix $M$ along the lines of the theory in Section 2.6. Then, intuitively, the limiting distribution will be a combination of those arising when the true rank is 0 and when it is 1. The asymptotic theory developed here gives the relevant bounds. In the case of the zero level model, Theorems 1 and 2 imply the following pointwise result.
Theorem 11. Let $\theta$ denote the parameters of the model (1). Consider the parameter space $\Theta_z$ where the hypothesis $H_{z,\beta}(1) : \Pi = \alpha b'$ holds. Here $\alpha$, $b$ are both of dimension $p \times 1$. Here $\alpha$ is unknown, while $b$ is known and has full column rank. Suppose the data generating process satisfies the I(1) condition with $s \leq 1$. Let $q_{z,s}$ be the asymptotic $(1 - \psi)$ quantile of $LR\{H_{z,\beta}(1) | H_z(1)\}$ when the data generating process satisfies $H^{\bullet}_{z,\beta}(s)$ for $s = 0, 1$. Let $q_{z,*} = \max_{s=0,1} q_{z,s}$. Then it holds for all $\theta \in \Theta_z$ that
The simulated values in Table 2 show that for $\psi = 5\%$ then
$$q_{z,*} = \max(q_{z,0}, q_{z,1}) = \begin{cases} \max(9.05, 3.84) = 9.05 & \text{for } p = 2, \\ \max(13.82, 5.99) = 13.82 & \text{for } p = 3. \end{cases} \qquad (29)$$
The interpretation is as follows. Suppose the hypothesis $H_z(1)$ has not been rejected, but it is unclear whether the rank could be nearly deficient. Then the hypothesis of a known $\beta_0$ is rejected if the statistic $LR\{H_{z,\beta}(1) | H_z(1)\}$ is larger than $q_{z,*}$.
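The decision rule in this interpretation can be sketched in a few lines. The 95% quantiles below are the values quoted in the text from Table 2 for the zero level model. The dictionary layout and the function names are illustrative assumptions rather than part of the original analysis.

```python
# Asymptotic 95% quantiles of LR{H_{z,beta}(1) | H_z(1)} quoted in the text:
# outer key is the dimension p, inner key is the true rank s.
Q95_ZERO_LEVEL = {
    2: {0: 9.05, 1: 3.84},
    3: {0: 13.82, 1: 5.99},
}


def bound_critical_value(quantiles_by_rank: dict) -> float:
    """Bound q_* = max over s of q_s, i.e. the most conservative quantile."""
    return max(quantiles_by_rank.values())


def reject_known_beta(lr_stat: float, p: int) -> bool:
    """Reject the known cointegrating vector if the statistic exceeds q_*."""
    return lr_stat > bound_critical_value(Q95_ZERO_LEVEL[p])


if __name__ == "__main__":
    for p in (2, 3):
        print(p, bound_critical_value(Q95_ZERO_LEVEL[p]))
    # A hypothetical statistic of 10.0 would be rejected for p = 2 only.
    print(reject_known_beta(10.0, 2), reject_known_beta(10.0, 3))
```

Taking the maximum over the admissible ranks means that the test is conservative whenever the true rank is the one with the smaller quantile.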
The bound $q_{z,*}$ seems very extreme. Khalaf and Urga therefore suggest using the alternative statistic $LR\{H_{z,\beta}(1) | H_z(p)\}$. Theorem 11 could be modified to cover this statistic. The simulations in Table 3 indicate that we would then use bounds
$$\tilde{q}_{z,*} = \max(\tilde{q}_{z,0}, \tilde{q}_{z,1}) = \begin{cases} \max(9.70, 6.22) = 9.70 & \text{for } p = 2, \\ \max(20.83, 15.34) = 20.83 & \text{for } p = 3. \end{cases} \qquad (30)$$
We can establish a similar result for the constant level model using Theorems 8 and 9. However, it is necessary to exclude the possibility of linear trends in the rank deficient model as this would give a very complicated result.
Theorem 12. Let $\theta$ denote the parameters of the model (14). Consider the parameter space $\Theta_c$ where the hypothesis $H_{c,\beta}(1) : (\Pi, \mu) = \alpha(b', b_c)$ holds. Here $\alpha$, $b$ are both of dimension $p \times 1$, while $b_c$ is a scalar. Further $b$, $b_c$ are known and $b \neq 0$. Suppose the data generating process satisfies the I(1) condition with $s = 0$ or $s = 1$. Let $q_{c,s}$ be the asymptotic $(1 - \psi)$ quantile of $LR\{H_{c,\beta}(1) | H_c(1)\}$ when the data generating process satisfies $H^{\bullet}_{c,\beta}(s)$ for $s = 0, 1$. Let $q_{c,*} = \max_{s=0,1} q_{c,s}$. Then it holds for all $\theta \in \Theta_c$ that
The simulated values in Table 7 show that for $\psi = 5\%$ then $q_{c,*} = \max(q_{c,0}, q_{c,1}) =$
The bounds (32), (33) for the constant level model appear further apart than the corresponding bounds (29), (30) for the zero level model. So in the constant level case there is perhaps less reason to use the test against the unrestricted model.

Empirical Illustration
The identification robust inference can be illustrated using a series of monthly US treasury zero-coupon yields over the period 1987:8 to 2000:12. The data are taken from Giese (2008) and run from the start of Alan Greenspan's chairmanship of the Fed and finish before the burst of the dotcom bubble. Giese considers 5 maturities (1, 3, 18, 48, 120 months), but here we only consider 2 maturities (12, 24 months). The empirical analysis uses OxMetrics, see Doornik and Hendry (2013).
Figure 1 shows the data in levels and differences along with the spread. The spread does not appear to have much of a mean reverting behaviour. It does not cross the long-run average for periods of up to 4 years. This points towards a random walk behaviour, which contradicts the expectations hypothesis, in line with Giese's analysis. She finds two common trends among five maturities. The two common trends can be interpreted as short-run and long-run forces driving the yield curve. The cointegrating relations match an extended expectations hypothesis where spreads are not cointegrated but two spreads cointegrate. These are sometimes called butterfly spreads and give a more flexible match to the yield curve. This is in line with earlier empirical work. Hall et al. (1992), among others, found only one common trend when looking at short-term maturities, while Shea (1992), Zhang (1993), and Carstensen (2003) found more than one common trend when including longer maturities.
A vector autoregression of the form (14) with an intercept, $k = 4$ lags as well as a dummy variable for 1987:10 was fitted to the data. This has the form
where $X_t$ is the bivariate vector of the 12 and 24 month zero-coupon yields and periods $t = 1$ and $t = T$ correspond to 1987:8 and 2000:12 giving $T = 161$. Table 9 reports specification test statistics with p-values in square brackets. The tests do not provide evidence against the initial model. They are the autocorrelation test of Godfrey (1978), the cumulant based normality test, see Doornik and Hansen (2008), and the ARCH test of Engle (1982). For the validity of applying the autocorrelation and normality tests to non-stationary autoregressions, see Engler and Nielsen (2009), Kilian and Demiroglu (2000), and Nielsen (2006).
The dummy variable matches the policy intervention after the stock market crash on 19 October 1987. Empirically, the dummy variable can be justified in two ways. First, the plot of yield differences in Figure 1b indicates a sharp drop in yields at that point. Secondly, the robustified least squares algorithm analyzed in Johansen and Nielsen (2016) could be employed for each of the two equations in the model. The algorithm uses a cut-off for outliers in the residuals that is controlled in terms of the gauge, which is the frequency of falsely detected outliers that can be tolerated. The gauge is chosen small in line with recommendations of Hendry and Doornik (2014, Section 7.6), see also Johansen and Nielsen (2016). Thus, we choose a cut-off of 3.02 corresponding to a gauge of 0.25%. When running the autoregressive distributed lag models without outliers, only 1987:10 has an absolute residual exceeding the cut-off. Next, when re-running the model including a dummy for 1987:10, no further residuals exceed the cut-off. This is a fixed point for the algorithm. The detection of outliers may have some impact on specification tests, estimation, and inference. Johansen and Nielsen (2009, 2016) analyze the impact on estimation when the data generating process has no outliers. They find that outlier detection only gives a modest efficiency loss compared to standard least squares when the cut-off is as large as chosen here. Berenguer-Rico and Nielsen (2017) find a considerable impact on the normality test employed above. At present, there is no theory for these algorithms for data generating processes with outliers, albeit some results are available for cointegration analysis with known break date, including the broken trend analysis of Johansen et al. (2000) and the structural change model of Hansen (2003).
Table 10 reports cointegration rank tests. The fifth column shows conventional p-values based on Tables 4 and 6 for $s = r$ corresponding to Johansen (1995, Tables 15.2, 15.3). The sixth column shows p-values based on Tables 5 and 6 assuming data have been generated by a model satisfying $H_c(0) = H_z(0)$. In both cases the p-values are approximated by fitting a Gamma distribution to the reported mean and variance, see Nielsen (1997); Doornik (1998) for details. As expected, the latter p-values tend to be higher than the former. Overall this provides overwhelming evidence in favour of a pure random walk model in line with Giese (2008).
If we have a strong belief in the expectations hypothesis we would, perhaps, ignore the rank tests and seek to test the expectations hypothesis directly. If we maintain the model $H_c(1)$, we would have to contemplate that the cointegration vectors could be nearly unidentified. A mild form of the expectations hypothesis is that the spread is zero mean stationary. Thus, we test the restriction $b^* = (1, -1, 0)'$. The likelihood ratio statistic is 4.0. Assuming the data generating process satisfies either $H^{\bullet}_c(0)$ or $H^{\bullet}_c(1)$, but not by $H^{\bullet}_c(0)$, we can apply the Khalaf and Urga (2014)-type bounds test established in Theorem 12. The 95% bound in (32) is 14.05 so the hypothesis cannot be rejected based on this test. This contrasts with the above rank tests which gave strong evidence against the expectations hypothesis. The results reconcile if the bounds test does not have much power in the weakly identified case. Indeed, this seems to be the case when looking at Table 3, $\rho = 0.99$-panels in Khalaf and Urga (2014), corresponding to near rank deficiency or weak identification. Thus, assuming the rank is one when in fact the data generating process appears to be nearly rank deficient seems to reduce power for tests on the cointegrating vector. That is, when the alleged cointegrating vector is not cointegrating it would be useful to be able to falsify the economic hypothesis. The above mentioned simulations indicate that this is not the case.
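Two of the calculations used in this illustration can be sketched numerically. The first relates the cut-off of 3.02 to the gauge of 0.25%, under the assumption that the gauge is read as the two-sided tail probability of a standard normal reference distribution. The second is the Gamma approximation to p-values, where the shape and scale are chosen to match a given mean and variance. The tabulated means and variances are not reproduced here, so the example checks the fit on a $\chi^2$ distribution with 2 degrees of freedom, which is itself a Gamma distribution with mean 2 and variance 4. All function names are illustrative.

```python
import math
from statistics import NormalDist


def cutoff_from_gauge(gauge: float) -> float:
    """Cut-off c with P(|Z| > c) = gauge for standard normal Z (assumption)."""
    return NormalDist().inv_cdf(1.0 - gauge / 2.0)


def gamma_sf(x: float, shape: float, scale: float) -> float:
    """Upper tail probability of a Gamma(shape, scale) variable at x.

    Uses the power series for the regularized lower incomplete gamma
    function. This is adequate for moderate x and for p-values that are
    not extremely small, which is the use intended here.
    """
    if x <= 0.0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    lower = math.exp(log_prefactor) * total
    return max(0.0, 1.0 - lower)


def gamma_pvalue(stat: float, mean: float, variance: float) -> float:
    """P-value from a Gamma distribution matched to a mean and a variance."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return gamma_sf(stat, shape, scale)


if __name__ == "__main__":
    print(round(cutoff_from_gauge(0.0025), 2))      # cut-off for a 0.25% gauge
    print(round(gamma_pvalue(5.99, 2.0, 4.0), 3))   # chi-square(2) check
```

For the rank tests, `mean` and `variance` would be replaced by the entries of the relevant table for the given $p - r$ and $s$. The moment matching is exact only when the limiting distribution is itself Gamma, so the resulting p-values are approximations.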

Conclusions
We have derived asymptotic theory for cointegration rank tests and tests on cointegrating vectors in the rank deficient case. The asymptotic distributions have been simulated and tabulated. The results shed some light on the finite sample theory for cointegration analysis. They can be used to improve the theory on identification robust inference developed by Khalaf and Urga (2014). This was applied to two US treasury yield series.
It appears that large distortions arise when applying standard cointegration inference in the situation where the rank is deficient or nearly deficient. The rank hypothesis gives an inequality for the rank, that is $\operatorname{rank} \Pi \leq r$. This includes cases where the rank is $r$ and where it is less than $r$. Thus, the parameter space for the model where $\operatorname{rank} \Pi \leq r$ has a lower dimensional subset where the rank is deficient. Inferential procedures for rank determination are consistent but do leave a positive probability of deciding for a deficient rank in finite samples. In practice, it is therefore possible to end up in a situation of rank deficiency or near deficiency. When proceeding to testing restrictions on the cointegrating vectors, the model is therefore mis-specified or nearly mis-specified.
The asymptotic analysis of the test distributions gives the following results. When testing for cointegration rank, the distribution shifts to the left when the rank is deficient. When testing for restrictions on the cointegrating vector, the distribution shifts to the right when the rank is deficient. When the rank is nearly deficient the distribution will tend to shift in similar directions. As a consequence, a test for cointegration restrictions using conventional critical values has a size control problem previously observed by Johansen (2000). One can instead apply identification robust tests as suggested by Khalaf and Urga (2014), but our impression is that while these tests are better behaved in terms of size, they have modest power to reject incorrect restrictions.
Our recommendation is to test for rank before testing restrictions on cointegrating vectors in line with Johansen's framework. If the conclusion from the rank determination is ambiguous it is best to proceed with caution and possibly explore different choices for rank. This is a common theme in the applied work of Juselius.

Table 1. Quantiles, mean, and variance of $LR\{H_z(r) | H_z(p)\}$, where the data generating process has rank $s = \operatorname{rank} \Pi \leq r$.

Table 2. Quantiles, mean, and variance of $LR\{H_{z,\beta}(r) | H_z(r)\}$, where the data generating process has rank $s = \operatorname{rank} \Pi \leq r$.

Table 3. Quantiles, mean, and variance of $LR\{H_{z,\beta}(r) | H_z(p)\}$, where the data generating process has rank $s = \operatorname{rank} \Pi \leq r$.

Table 4. Quantiles, mean, and variance of $LR\{H_c(r) | H_c(p)\}$, where the data generating process satisfies $H^{\bullet}_c(s) = H_c(s) \setminus H_c(s - 1)$ with $s \leq r$.
Table 5 reports the simulated asymptotic distribution of the rank test reported in Theorem 7. The first panel where s = r and corresponds to Table A.2 of

Table 5. Quantiles, mean, and variance of $LR\{H_c(r) | H_c(p)\}$, where the data generating process satisfies $H^{\bullet}_c(s) = H_c(s) \setminus H_c(s - 1)$ with $s \leq r$.

Table 6. Quantiles, mean, and variance of $LR\{H_c(r) | H_c(p)\}$, where the data generating process satisfies $H^{\bullet}_c(s) = H_c(s) \setminus H_c(s - 1)$ with $s \leq r$.

Table 9. Specification tests for the unrestricted vector autoregression.
Table 10 reports cointegration rank tests. The fifth column shows conventional p-values based on Tables