Partial Cointegrated Vector Autoregressive Models with Structural Breaks in Deterministic Terms

: This paper proposes a class of partial cointegrated models allowing for structural breaks in the deterministic terms. Moving-average representations of the models are given. It is then shown that, under the assumption of martingale difference innovations, the limit distributions of partial quasi-likelihood ratio tests for cointegrating rank have a close connection to those for standard full models. This connection facilitates a response surface analysis that is required to extract critical information about moments from large-scale simulation studies. An empirical illustration of the proposed methodology is also provided.


Introduction
Partial cointegration models with structural shifts in level or linear trends are quite common in practice; however, no formal analysis is available for these models. The likelihood analysis of the partial models with such breaks is based on reduced rank regression, just like standard full cointegrated vector autoregressive models introduced by Johansen (1988Johansen ( , 1995. The main difference lies in the fact that likelihood-based tests for cointegrating rank in the partial models involve a set of new asymptotic distributions which reflect the combination of weakly exogenous regressors and broken deterministic terms. We generalise the standard assumption of normal innovations (Johansen 1995) to a flexible class of heterogeneous martingale difference innovations. We then derive the asymptotic distributions of the test statistics in question and provide a simulated responsed surface of the asymptotic distribution.
The presented models combine two widely used extensions of Johansen's original model. The first extension was a partial cointegrated system investigated by Harbo et al. (1998), referred to as HJNR henceforth, see also Pesaran et al. (2000). This partial system is a conditional vector autoregressive model for a vector of variables, Y t , given another vector of variables, Z t , as well as lags of both variables. They also presented simulated tables for asymptotic rank test distributions based on the partial system. Boswijk (1995) and Ericsson and MacKinnon (2002) explored the use of conditional autoregressive models. Recently, Cavaliere et al. (2018) considered information criteria based on the HNJR test statistics. The second extension was a full cointegrated system with structural breaks in a constant level or linear trend, a model explored by Johansen et al. (2000), referred to as JMN hereafter. This full model is a multivariate extension of model C of Perron (1989), where both level and linear trend slope change at the time of the break, as opposed to his models A and B, in which only one of the two is changing. Deterministic breaks in cointegrated systems have also been explored by Inoue (1999) and Hendry and Massmann (2007).
Each of the two extensions above has proved to be useful in empirical applications; furthermore, subsequent practical work has shown that we frequently require both of the two extensions simultaneously. As an example, Bårdsen et al. (2005) built a large scale model of the Norwegian economy by combining a number of smaller partial cointegration models. Each of these sub-systems is regarded as a partial model subject to structural shifts, and these types of models are useful in a practical sense for empirical macroeconomic research. As it stands, however, the exact asymptotic properties of likelihood-based test statistics derived from the partial models with structural breaks are unknown, so that a formal econometric study based on these models is unfeasible. This paper, therefore, conducts both analytical and simulation-based investigations into the unknown asymptotic properties so that researchers can perform a formal analysis using the partial models with structural breaks. Another example of these partial models is a trade model for the UK by Schreiber (2015), which we are going to use as an empirical illustration later in this paper.
This paper shows that the asymptotic distributions of the proposed likelihood-based test statistics are dependent on information about the dimension of the variables Y t and Z t , cointegrating rank, the number of breaks and their locations, but the distributions themselves are free of any unknown parameters. Hence, the limit distributions can be simulated given the above information, as in a manner similar to Johansen (1995, §15), HJNR or MacKinnon et al. (1999. The Granger-Johansen representation for the full model in JMN is also reexamined as a basis for the required asymptotic study, and this reexamination can be viewed as a useful clarification of roles of a set of starting values in the workings of the system. It should be noted that a condition for weak exogeneity reviewed in Section 2 is assumed to be satisfied when exploring the properties of the test statistics; the violation of this condition can give rise to a class of limit results that are unfavourable in applications, as discussed by Johansen (1992a). This assumption is testable by following an ex-post testing procedure suggested by Johansen (1992a) and others. We demonstrate this procedure in the empirical illustration in Section 5.
In deriving the asymptotic distributions of the test statistics, the assumption of normal innovations in Johansen (1995), HJNR and JMN, is relaxed to the assumption of martingale difference innovations, with a view to widening the scope of applications of the proposed models. This means we have to be careful in developing asymptotic arguments required for the quasi-likelihood ratio test statistics. We use martingale limit results of Anderson and Kunitomo (1992) and Brown (1971) for approximately stationary components and for non-stationary components, respectively.
Furthermore, it is shown that the derived asymptotic distributions can be approximated by gamma distributions, a class of common statistical distributions identifiable only by the first two moments; the validity of this gamma-distribution approximation method in various other existing models was documented by Nielsen (1997), Doornik (1998) and JMN. The study utilises the fact that mean and variance of the limit distributions for the proposed partial models are expressible in terms of the mean and variance for full models and certain covariance terms. As a result, it is feasible to apply the gamma approximation method to simulation results based on the fullmodels, in order to obtain precise limit quantiles of the test statistics for the proposed partial models. Hence, we are justified in conducting comprehensive simulations in the full-model framework, the results of which are applied in a response surface analysis combined with the gamma approximation method. The outcomes of the response surface analysis are tabulated in two tables, the accuracy of which is verified by moving back to the partial-model framework. The tables allow researchers to conduct formal applied studies with the proposed partial models. A brief empirical study is also provided.
Overall, this paper adds to the literature on time series econometrics and applied macroeconomics. As a result, the partial cointegrated models will be recognised as more flexible and practical devices for modelling and analysing non-stationary time series data containing structural breaks. For I(2) models, Paruolo and Rahbek (1999) proposed partial analysis while Kurita et al. (2011) introduced a model with deterministic shifts. In future work, it may be of interest to combine those ideas as well.
The rest of this paper consists of five sections. Section 2 introduces partial cointegrated models subject to deterministic breaks and their moving-average representations. Section 3 derives partial quasi likelihood-based tests for cointegrating rank allowing for the breaks, and explores the limit distributions of the test statistics. In this section, a response surface analysis is performed by using simulated distributions and then the results of the analysis are summarised as a set of statistical tables. An empirical illustration of the proposed methodology is provided in Section 5. Finally, Section 6 gives concluding remarks. This study used Ox (Doornik 2013) and PcGive (Doornik and Hendry 2013) to conduct the simulations and the empirical study, respectively.

Models and Representations
We introduce partial cointegrated vector autoregressive models with deterministic breaks. Section 2.1 reviews the existing models known, while Sections 2.2-2.4 provide details of the proposed models.

Previous Models
The cointegrated vector autoregressive model was proposed by Johansen (1988Johansen ( , 1995. Suppose that we observe a p-variate vector time series X t integrated of order 1, denoted as I(1) hereafter. In the presence of two lags, a constant and restricted linear trend, the model equation for X t with index for the linear trend model and the associated cointegrating rank hypothesis, for r ≤ p, Here, the initial values X 1 and X 2 are fixed while p-vector innovations ε 3 , . . . , ε T are distributed as independent normal, denoted by N p (0, Ω). The parameters in Equation (1) are all variation free, defined as α, β ∈ R p×r , γ ∈ R r , µ ∈ R p and Γ, Ω ∈ R p×p and with Ω being positive definite. This model is interpreted in terms of its Granger-Johansen representation. The likelihood function is maximised through reduced rank regression of ∆X t on the vector of X t−1 , 1 corrected for ∆X t−1 . The cointegrating rank r can be determined through a sequence of rank test statistics, which have Dickey-Fuller type limit distributions depending on the number of common trends, p − r in this case, and with a linear trend adjustment. Once the rank is determined, asymptotic inference for the cointegrating vectors β and the adjustment vectors α can be based on χ 2 distributions.
The partial model is derived from the model given by Equation (1), which is referred to as the full model henceforth. This allows exogenous regressors that are not necessarily analysed in the model equation. With a view to setting up the partial model, let us introduce an integer m satisfying 0 ≤ r ≤ m < p, so that we can decompose X t into an m-vector Y t and a vector Z t of dimension p − m. Decompose the parameters and error terms of Equation (1) conformably so that, for instance, and Ω = Ω yy Ω yz Ω zy Ω zz .
We also define the population regression coefficient ω = Ω yz Ω −1 zz , which leads to a class of conditional coefficients Π y·z = Π y − ωΠ z , Γ y·z = Γ y − ωΓ z and µ y·z = µ y − ωµ z . The partial or conditional model for Y t given Z t is then presented as where the conditional innovation sequence ε y·z,t = ε y,t − ωε z,t is N m (0, Ω yy·z ) distributed, so ε y·z,t is independent of Z t and the overall past series, while its variance is The cointegration rank hypothesis is, for r ≤ m, where α y·z = α y − ωα z . The marginal model for Z t is simply given as Due to the conditioning of Y t on Z t , the innovations ε y·z,t and ε z,t are independent. Even so, the cointegrating relationships β X t−1 + γt form cross equation restrictions, so that maximum likelihood estimation involves a joint analysis of (3) and (6). The rank can be determined from a partial analysis using information criteria albeit without size control as argued by Cavaliere et al. (2018).
Weak exogeneity arises when α z = 0. In this case, the partial model and the marginal model are unrelated and Z t is weakly exogenous for a class of parameters of interest, α y , β and γ, in the sense of Engle et al. (1983). See also Johansen (1992aJohansen ( , 1992bJohansen ( , 1995 and HJNR. Maximum likelihood estimation can be performed by analysing the two models separately, i.e., the partial model is estimated by reduced rank regression while the marginal model is by least squares regression. The maintained assumption is that the joint vector X t has r cointegrating relations and hence p − r common trends, with the cointegrating relations being in the partial model for Y t . A notable feature of the setup is that it is left unspecified whether or not Z t is cointegrated. In a one-lag model Z t will not be cointegrated, but with further lags Z t could be cointegrated since the short-run dynamics are determined by both α and Γ; see HJNR (p. 390) for an example of these models. HJNR explored an asymptotic theory for likelihood-based rank testing in the partial model (3). The asymptotic distribution of HJNR's rank test statistic is of the Dickey-Fuller type, now depending on both m − r and p − r, which are the dimensions of common trends for Y t and X t , respectively. Seo (1998) suggested a class of cointegrated models where a stationary regressor, ∆Z t is included in a cointegration model. This corresponds to a models of the type (3), but where X t−1 is replaced by only Y t−1 . In general, this results in an inference that depends on nuisance parameters. Rahbek and Mosconi (1999) noticed that, if the stationary regressor ∆Z t is cumulated and entered in the cointegrating vector X t−1 as in (3), then the asymptotic distributions of HJNR would apply.
Structural breaks in deterministic terms were included in the full system model by JMN. The idea is to consider, say, two sub-samples starting at time T 0 and T 1 , respectively, for 0 = T 0 < T 1 < T 2 = T. The dynamic parameters in the model are the same for both sub-samples, while the parameters for deterministic terms can differ. In the model with lag-length k = 2, the observations T j−1 + 1, T j−2 + 2 for j = 1, 2 are held back as initial observations. Thus, the transition from one regime to the next is not modelled. Recently, Harvey and Thiele (2017) used a similar idea in a structural time series model.

The Partial Model with Structural Breaks
We are in a position to introduce a new model, a partial cointegrated model allowing for structural breaks in its deterministic terms.
We start by defining the timing of the sub-samples. Suppose we have T observations. We extend the partial model to the one with a pre-specified number of sub-sample periods, q say, and k lags. Following JMN, we introduce the sub-sample structure 0 = T 0 < T 1 < · · · < T q = T. The model will have k lags. Thus, for each sub-sample j, the effective range is T j−1 + k < t ≤ T j . In summary, we have data for 0 < t ≤ T, while the effective sample is the collection of effective sub-samples, that is, The model has dynamic parameters that are common across the sub-sample periods, whereas the parameters for deterministic terms vary. This gives, for each effective sub-sample j as defined above: where γ j ∈ R r , µ j = (µ y,j , µ z,j ) ∈ R p and µ y·z,j = µ y,j − ωµ z,j for j = 1, . . . , q, along with Γ i = (Γ y,i , Γ z,i ) ∈ R p×p and Γ y·z,i = Γ y,i − ωΓ z,i for i = 1, . . . , k − 1, and all the other parameters were defined in the previous sub-section. Note that the parameters for deterministic terms depend on j, indicating the presence of parameter shifts according to regime changes. A class of initial observations X T j−1 +1 , . . . , X T j−1 +k plays the dual role of capturing the transition from the previous regime, j − 1 and of serving as the initial observations for the regime j. In some applications, the transition between the regimes may be longer than k observations, in which case more observations could be classified as initial observations. The marginal model for Z t under α z = 0 is We can form a full model equation as in Equation (1) for each sub-sample period. This is the model of JMN with weak exogeneity imposed. This model will be presented in the next sub-section.
The partial model can be formulated as a single equation for the full sample period in terms of the following notation. Following JMN, we define impulse dummy variables as D j,t = 1 for t = T j−1 , 0 otherwise, for j = 1, . . . , q and t = 1, . . . , T, so that D j,t−i = 1 if t = T j−1 + i, and also define indicators for the effective samples as The whole-sample model equation then has the form, with X t−1 = (X t−1 , tE t ) , where the index indicates the model with a linear trend, for t = k + 1, . . . , T, with cointegration rank hypothesis, for r ≤ m, Here, ϕ j,i ∈ R m represents a class of parameters for D j,t−i for i = 1, . . . , k and j = 2, . . . , q, while the parameters γ and µ y·z are now redefined in a manner allowing for breaks as γ = (γ 1 , . . . , γ q ) ∈ R r×q and µ y·z = (µ y·z,1 , . . . , µ y·z,q ) ∈ R m×q , which are used in the rest of this study. Equation (8), or its whole-sample form (10), is referred to as the partial model with a broken linear trend term.

Representations
Various properties of the proposed partial model (8) will be analysed using the Granger-Johansen representation of an I(1) process, which is formulated based on the full model for X t ; thus, the representation is the same as that in JMN (Theorem 2.1). In JMN, each sub-sample period is analysed conditionally on its initial observations. As a result, the representation for each sub-sample period is the same as that in Johansen (1995, Theorem 4.2). The initial values for each sub-sample can be large and thus be influential even in the asymptotic context, but, when following the underlying argument of JMN, one can see that such initial values do not play critical roles in the required asymptotic analysis. Following Kurita and Nielsen (2009), we show this in two steps: first, we analyse a homogeneous equation, and then consider the roles of deterministic terms by moving to a non-homogeneous equation. For further details, see the proof of Theorem 1 below.
For each sub-sample, the full model for X t is defined as a joint system of (8) and (9) through: while the corresponding homogeneous equation is whereX t denotes a p-variate mean-zero vector time series. We then set up a companion vector based on (13) and analyse a companion form of this equation. Several choices are conceivable with respect to a companion form for (13) and we use the choice that appears, for instance, in Hansen (2005). For the purpose of studying details of the representation, the parameters need to satisfy Assumption 1 below. This is applicable to both (12) and (13). Some additional notation is required. When β has full column rank r, let β ⊥ denote a p × (p − r) dimensional orthogonal complement, so that (β, β ⊥ ) is invertible and β ⊥ β = 0 and introduce the normalization β = β(β β) −1 . The same notation applies to α.
Assumption 1. Assume that the roots of the characteristic polynomial, are outside the complex unit circle or at unity; furthermore, assume that the matrices α and β have full column rank r and that the square matrix α ⊥ Ψβ ⊥ has full rank p − r, where Given Assumption 1, we can define C = β ⊥ (α ⊥ Ψβ ⊥ ) −1 α ⊥ This is often referred to as the impact matrix in cointegration literature; see Paruolo (1997) for inference on this matrix.
We are approaching the stage where the Granger-Johansen representation for each sub-sample period is presented. For the homogeneous Equation (13), let us define as well as and ι = (I p , 0, . . . , 0) together with r = r + p (k − 1) . The representation is then given in the theorem below, the proof of which is provided in Appendix B.
Theorem 1. Suppose that Assumption 1 is fulfilled. Then, an r-variate process β X t derived from ation (13) satisfies, on the effective sample T j−1 + k < t ≤ T j for 1 ≤ j ≤ q, which is a stable first-order vector autoregression. The solution to (13) is given as where forX t = X t − τ c,j − τ ,j t with the parameters τ c,j and τ ,j satisfying Ψτ l,j = αβ (τ c,j − τ ,j ) + µ j and β τ ,j + γ j = 0.
Note that the initial observations for the j-th sub-sample in (18) are expressed in terms of linear combinations of the mean-zero valuesX T j−1 +1 , . . . ,X T j−1 +k , so that we can in general argue that the the starting values for each sub-sample period do not play critical roles in asymptotic analysis. This property was not explicitly examined in JMN. Thus, Theorem 1 can be seen as a useful clarification of roles of the initial values in the full cointegrated model subject to deterministic breaks. The Granger-Johansen representation is utilised in proofs of asymptotic theorems in Section 3.
As an alternative to the above sub-sample representation, one can derive a joint representation for the whole sample. For this purpose, we need a full system equation for X t over the entire sample period. This equation is derived from a combination of (12) over j = 1, . . . , q augmented with dummies D j,t−i and E j,t , as in (10); that is, where κ j,i ∈ R p for i = 1, . . . , k and j = 2, . . . , q, and µ = (µ 1 , . . . , µ q ) ∈ R p×q ; see Equation (2.6) in JMN. We then replace the innovations ε t with ε D t = ε t + αγtE t + µE t + ∑ k i=1 ∑ q j=2 κ j,i D j,t−i to reach a whole-sample representation such as where C 1 (L)ε D t denotes a moving-average process whose coefficients decrease exponentially fast, and A depends on initial observations X 1 , . . . , X k , satisfying β A = 0. This is an approximation, since the precise formulation of the moving-average component requires introduction of an infinite past, while the model is formulated as conditional on the initial observations. As before the deterministic parts of the common trends C ∑ T s=k+1 ε D s will be piecewise constant since Cα = 0, so that each constant fails to cumulate to a linear trend. A similar approach was adopted in I(2) cointegration analysis by Kurita et al. (2011). The representation (20) is clear and concise, but the transition from one regime to another is considered to be explicitly autoregressive, which may leave less flexibility to represent regime transitions of some persistent and messy nature. For the asymptotic study conducted below, we follow JMN by using the sub-sample representation (18).

The Partial Model with Shifts in The Level
In some applications, it suffices to exclude the broken linear trends and just include shifts in the constant term. By restricting the broken constant term within the cointegrating space, Equation (10) is reduced to, with X c t−1 = (X t−1 , E t ) and index c for model with breaks in the constant level, and cointegration rank hypothesis, for r ≤ m, The Granger-Johansen representation has the same form as (18) but is subject to β τ c,j + γ j = 0 and τ ,j = 0.

Testing for Cointegrating Rank in the Partial Models
This section addresses the issue of testing for cointegrating rank in the suggested partial models with deterministic shifts. Section 3.1 introduces a partial likelihood ratio test for the choice of rank based on the broken linear-trend model and Section 3.2 derives its limit distribution. Section 3.3 then turns to the broken constant model and examines the test statistic based upon it. Finally, Section 4 derives a class of approximations to the limit distributions by means of computer simulations and response surface regression.

Rank Test Statistic
For each sub-sample period, the partial model (10) is seen as equivalent to that in HJNR, given the presence of structural breaks in its deterministic terms. We derive the log partial likelihood ratio test statistic for the cointegration rank hypothesis H (r) defined in (11). This likelihood is analysed by reduced rank regression in a manner similar to the original cointegration model in Johansen (1988Johansen ( , 1995. We show that the reduced rank regression can be done in three, numerically equivalent ways. The first approach is based on a full-sample reduced rank regression. Regress each of the vectors ∆Y t and X t−1 = (X t−1 , tE t ) on a vector H t consisting of the variables ∆Z t , the lagged differences ∆X t−1 , . . . , ∆X t−k+1 , the intercepts E t and the impulse dummies D j,t−i for i = 1, . . . , k and j = 2, . . . , q, so that H t has dimension pk − m + q + k(q − 1). This gives residuals R 0,t and R 1,t : The second approach is viewed as a sub-sample approach. We note that the impulse dummies result in a perfect fit for each transitional period in between two connecting regimes; thus, R 0,t and R 1,t are zero for all transitional periods. We can, therefore, compute the residuals R 0,t and R 1,t by analysing the effective sub-sample periods only; see also Doornik et al. (1998, §12.2). For this purpose, let us form regressors P t from the variables ∆Z t , the lagged differences ∆X t−1 , . . . , ∆X t−k+1 and the intercepts E t , so that P t is a vector of dimension pk − m + q. The residuals R 0,t and R 1,t then satisfy The third approach is recognised as a two-step approach, in which we first demean the observed time series and then partial out influences from the lagged differences. In the first step, we analyse two vectors ∆Y t and X t−1 , along with a vector V t consisting of the variables ∆Z t and the lagged differences ∆X t−1 , . . . , ∆X t−k+1 . These three vectors are demeaned within each sub-sample period, yielding Z 0,t , Z 1,t and Z 2,t defined as for 1 ≤ j ≤ q and T j−1 + k < t ≤ T j and zero otherwise. In the second step, we compute Since Z 0,t , Z 1,t are zero within the transitional periods, so are the residuals R 0,t , R 1,t .
Hence, the log partial likelihood ratio (PLR) test statistic for the null hypothesis of cointegrating rank r, H (r), against the hypothesis H (m) is

Asymptotic Distribution of the Test Statistic
We derive the asymptotic distribution of the rank test statistic in a setting where the relative break points satisfy T j /T → v j for j = 0, . . . , q while T goes to infinity. The relative break points satisfy 0 = v 0 < v 1 < · · · < v q = 1. Indeed, with int(x) denoting the integer part of x, then the q-vector E int(Tu) on u ∈ [0, 1] has the limit In the standard framework developed by Johansen (1995), the innovation sequence ε t is assumed to be independent and identically Gaussian distributed, and this assumption was adopted by HJNR and JMN as reviewed in Section 2.1 above. We relax this normality assumption to a martingale difference assumption. If the innovations ε t are not normal, the model equations lead to a quasi-likelihood function rather than a likelihood function. Weak exogeneity is preserved as it is a property of the likelihood rather than the distribution of the innovations as such. The partial innovation ε y·z,t = ε y,t − ωε z,t and the marginal innovation ε z,t are uncorrelated, but they will not be independent in general when moving away from the normality assumption. We can no longer appeal to the conditional-distribution argument, as implied in Equation (3). Thus, the conditional-distribution argument is replaced with a regression argument, see Appendix C.2. The martingale difference assumption is summarised as Assumption 2 below.
Assumption 2. Assume that ε t is a martingale difference sequence with respect to a filtration F t such that E(ε t |F t−1 ) = 0 almost surely (a.s.). Let Ω be a positive definite matrix. Suppose that The boundedness conditions in part (iii) are not nested. Part (a) can be satisfied without the existence of fourth moments as in part (b). Conversely, bounded fourth moments in part (b) do not necessarily imply part (a); see Remark A1 in Appendix C.1.
Under Assumption 2, we are able to apply the results of Brown (1971) to analyse the random walk components of the process. For this, we require a Lindeberg condition, which is established in Lemma A1 in Appendix C.1 under Assumption 2. Brown's result is for univariate martingale difference sequences and requires that the ratio of the sum of conditional variances to that of unconditional variances should converge to unity. For the multivariate case, we can apply the Cramér-Wold device and form linear combinations of the present multivariate martingale differences. Using parts (i), (ii) we can then show that Brown's ratios converge to unity.
Under Assumption 2, we can also analyse the (approximately) stationary components of the process. Under part (iii.a), we can apply the results of Anderson and Kunitomo (1992), which exploit a truncation argument. Under part (iii.b) we can apply the same ideas as in Anderson and Kunitomo (1992) but without the truncation argument.
Cointegration models with heteroscedasticity have previously been analysed by for instance Cavaliere et al. (2010) and Boswijk et al. (2016). The former paper is concerned with rank testing in a full system. For the analysis of the (approximately) stationary components, it relies on Hannan and Heyde (1972) who require an almost sure version of Assumption 2(ii). The latter paper is concerned with testing on the cointegrating vectors in a full system with an elaborate, deterministic structure for the variances of the innovations.
Before proceeding to the main results, we present a set of stronger assumptions requiring constant conditional variance, see Assumption 3 and Lemma 1 below. These assumptions were used by Wei (1982, 1985) as well as Chan and Wei (1988). They have two advantages. First, they are easier to check for practitioners than the convergence results in Assumption 2. Second, the assumptions could be used to derive a variety of almost sure convergence results, as explored by Wei (1982, 1985) and Nielsen (2005), although we will not exploit those properties here.
Assumption 3. Assume that ε t is a martingale difference sequence with respect to a filtration F t such that E(ε t |F t−1 ) = 0 a.s. and

Lemma 1. Assumption 3 implies Assumption 2.
We can now present the limit distribution of the PLR statistic (25), noting that, formally, it is a a log partial quasi-likelihood ratio test statistic under Assumption 3. For this purpose, let D → signify weak convergence, while let B u represent a (p − r)-dimensional standard Brownian motion process on u ∈ [0, 1] and let B (m−r) u be the first m − r coordinates of B u . The limit distribution is given in the next theorem, which is proved in Appendix C.3.
Theorem 2. Suppose that Assumptions 1 and 2 are satisfied along with α z = 0, so that Z t is weakly exogenous with respect to α y , β and γ. As T → ∞, with relative break points satisfying where, with e u defined in (26), Note that, when p = m, the result in Theorem 2 corresponds to Theorem 3.1 in JMN. A direct simulation of (27) is rather laborious. By exploiting some analytic properties of the distributions, we are able to simplify this simulation task. The next Theorem 3 describes these properties by linking the moments of the limit distribution in Theorem 2 to those for the full model. Theorem 3 provides a basis for simulation in Section 4. The proof of this theorem, given in Appendix C.3, is based on a slight modification of results in Doornik (1998, §9); see also Boswijk and Doornik (2005).
Theorem 3. Let B i,u be the i-th coordinate of the Brownian motion B u . Let Then, T 1 , . . . , T p−r are identically distributed and any pairs T j , T k are also identically distributed. Moreover, the limiting statistic (26) T i with expectation and variance given by Since the above Theorem 3 links the moments of the statistics for partial systems and for a full system, we can now proceed by simulating distributions for full systems only. JMN simulated response surfaces for the mean and variance of ∑ p−r i=1 T i . As we will also need a response surface for Cov(T 1 , T 2 ), we have to redo their simulation exercise. For this purpose, we quote a result from JMN.
Theorem 4 (JMN, Theorem 3.2). Let B [1] , . . . , B [q] be independent (p − r)-dimensional standard Brownian motions and define Then, the limiting variable (26) for a full sample with p = m satisfies Here, the two summands are independent and ∑ q j=1 J j J j is distributed as j denote the ith coordinate of J j and K j so that As in JMN, we note that Theorem 4 implies a simple relation between the limiting statistics for models with q and with q − 1 sub-sample periods that is where DF (p − r, p − r; v 1 , . . . v q−1 ) and J q J q are independent and J q J q is χ 2 (p − r).

Asymptotic Distribution for the Broken Constant Case
The model investigated previously has a broken linear trend. A variant of this model is free of such a linear trend but with a broken constant; see Equation (21). Equation (8) then reduces to As before, the partial quasi-likelihood is maximized by reduced rank regression. We follow the third approach (23) in Section 3.1, in which the broken linear trend is now replaced with the broken constant, so that we consider the vectors ∆Y t and X t−1 = (X t−1 , E t ) , together with the vector V t composed of the variables ∆Z t and the lagged differences ∆X t−1 , . . . , ∆X t−k+1 . Equation (23) then reduces to The limit distribution of the log partial quasi-likelihood ratio test statistic for cointegrating rank r, denoted by PLR{H c (r)|H c (m)}, is given in Theorem 5 below. Its proof is based on a set of modifications of the proofs for the limit theorems in the previous sub-sections.
Theorem 5. Suppose that Assumptions 1 and 2 are satisfied along with α z = 0, so that Z t is weakly exogenous with respect to α y , β and γ. As T → ∞, with relative break points satisfying where DF c is defined as in Theorem 2 with the difference that The results in Theorems 3 and 4 also apply with the present choice of G u .

Approximations of the Asymptotic Distributions
The limit distributions of the cointegrating rank test statistics are non-standard, as shown in the previous sub-sections; however, given the existing results in the literature, the distributions can be closely approximated by a gamma distribution identified by the first two moments. We first derive this approximation and then show how to implement the approximation.

Derivation of Response Surface
The literature shows that the asymptotic distributions for cointegration rank testing are nearly gamma distributed. The approximating gamma distribution can be captured either through the mean and variance of the asymptotic distribution or through the associated shape and scale parameters. The quality of the gamma-distribution approximation method has been documented in several papers. Using analytic methodology, Nielsen (1997) showed a very good agreement between limit distributions and approximate gamma distributions in tests for unit roots. Doornik (1998) then conducted detailed simulation studies to demonstrate a similar agreement for standard full-system cointegration rank test statistics; see also Doornik (2003) for various tables of asymptotic quantiles produced by the gamma-distribution approximations. JMN also employed this method.
In order to apply the gamma approximation method, we first define parameters for shape and scale. By Theorem 3, the partial system statistic satisfies DF (m − r, p − r; v) = ∑ m−r i=1 T i , where the statistics T i are identically distributed and also the pairs T i , T j are identically distributed. Thus, we get Solve for E(T 1 ) and Var(T 1 ) when m = p and insert above to get Thus, it suffices to approximate the moments of the full sample distributions through simulation. Numerically, it appears that better approximations arise when approximating shape and scale parameters instead of mean and variance. We therefore write From this, we get the shape and scale parameters as Hence, we simulated λ p−r , δ p−r and Cov(T 1 , T 2 ) and constructed response surfaces to approximate the distribution of DF (m − r, p − r; v). Following JMN and Doornik (1998), we applied a variety of data generating processes and present the results using response surface analysis.
The quantities λ p−r , δ p−r and Cov(T 1 , T 2 ) were simulated for a set of given p − r, T and relative break points. Following JMN, we chose q = 3 as the maximum number of sub-samples, with a and b representing the smallest and the second smallest of relative sample lengths, respectively. For example, if q = 2 along with v 1 ≤ 1 − v 1 , we then have a = 0 and b = v 1 . The grid points a and b were selected in the same way as those for Figure 1 in JMN e.g., (a, b) = (0, 0), (0, 0.05), (0, 0.1), · · · , so that they were subject to the constraints of a ≤ b and b ≤ (1 − a − b) and the total number of their combinations was 20, along with the selection of non-stationary components p − r = 1, . . . , 8. For the overall sample sizes or Ts, JMN used 10 integers derived from 500/i for i = 1, . . . , 10, but we quadrupled them in order to improve approximations to the underlying limit distributions of the response variables. Thus, we obtained a new set of 10 sample sizes, Ts, ranging from 200 to 2000. For log λ p−r and log δ p−r , this simulation design led to 1600 (= 20 × 8 × 10) cases, while the number of cases was reduced to 1400 for Cov(T 1 , T 2 ) as a result of missing values corresponding to p − r = 1.
The computational algorithm used in our study was based on Theorem 4. These asymptotic results justify simulating three sets of T-step random walks for broken linear-trend and constant cases and scaling them according to the pre-specified relative sample lengths. The number of simulation replications N was set at 100,000.
For the response surface analysis, we used log λ p−r , log δ p−r and Cov(T 1 , T 2 ) as the response variables, instead of the logged means and variance as in JMN. It turns out that the use of these response variables (log λ p−r , in particular) mitigates the residual heteroscedasticity problem, hence resulting in a reduction of the number of indicator variables required for p − r = 2 and p − r = 1. Note that Cov(T 1 , T 2 ) needs to be included in the set of response variables in any response surface study, in order to make use of Equation (30). In addition, note that taking the log of Cov(T 1 , T 2 ) is not permissible, since covariance is not always positive.
Compared to JMN, we increased the maximum number of observations from T = 500 to T = 2000. It was found that the large-sample (T ≥ 1000) approximates of the mean and variance in small dimensions (p − r ≤ 3) tend to be rather different from those when T is small. This finding is consistent with Doornik (1998), who introduced a set of indicator variables being assigned 1 for p − r = 2 and p − r = 1 and assigned 0 otherwise; these indicators put residual heteroscedasticity under control even in the presence of influential values for p − r = 2 and p − r = 1.
We regressed each of the three response variables, log λ p−r , log δ p−r and Cov(T 1 , T 2 ) on a set of regressors formed from a, b, p − r and T. Our baseline function form was a modified version of Equation (3.11) in JMN. In the context of the present paper, the equation in JMN is expressed as where y is either log λ p−r , log δ p−r or Cov(T 1 , T 2 ), while z 1 = p − r, z 2 = a, z 3 = b, z 4 = T −1 and d m = (p − r) −m . Following Doornik (1998), we also added to this equation a set of indicator variables as explanatory variables, each of which is 1 for a selected value of dimension p − r and is 0 otherwise. Performing a series of regression analyses and carefully removing insignificant explanatory variables by utilising the Autometrics option available in PcGive (Doornik and Hendry 2013), we arrived at parsimonious response-surface functions for log λ p−r , log δ p−r and Cov(T 1 , T 2 ); these functions are henceforth denoted f z p−r (p − r, a, b, T) with z taking values λ, δ and cov, respectively. Tables A1 and A2 in Appendix A record the rounded coefficients for a, b, p − r and their variants in the response surface regression for the broken linear trend case and the broken constant case, respectively. The inverse of the observation number, T −1 , and its variants such as T −2 , also play critical roles in the response surface regression, but all of them are irrelevant asymptotically and thus disregarded when calculating the limit approximates based on these tables.
It should also be noted that a response-surface regression analysis of Cov(T 1 , T 2 ) was technically difficult in terms of residual diagnostic tests. Doornik (1998) used the average of estimates for Cov(T 1 , T 2 ) when performing a response surface analysis for partial systems with no break. We adhered to the regression approach, rather than simply taking the average of the covariance estimates, by assigning importance to various significant influences of a, b and p − r on the behaviour of Cov(T 1 , T 2 ). This regression analysis indeed bore fruit and clarified the highly complex structure of the dependence of Cov(T 1 , T 2 ) on a, b, p − r and its variants, as shown in the third column of each of Tables A1 and A2. These findings about Cov(T 1 , T 2 ) are not known in the literature, thus giving added value to the response surface study conducted in this paper, although the impact of variation in Cov(T 1 , T 2 ) on the approximate shape and scale parameters may not always be large.
Tables 1 and 2 display a set of examples demonstrating the accuracy of the response surface regression results. A class of approximately 95% limit quantiles is presented in each of the tables for various combinations of a, b, p − r and m − r, when either broken-linear-trend or broken-constant specifications are adopted in analysis. Approximate quantiles in the fifth column (q 95 ) in Tables 1 and 2  are derived from Tables A1 and A2, respectively; that is, they are from the full-system-based response surface analysis, combined with the mappings (29), (30) and (31). By contrast, approximate quantiles recorded in the sixth column (q * 95 ) of each table, except those for a = b = 0, were obtained directly from auxiliary response surface regressions based on partial-system simulations with the same Ts and N as above. Each of these auxiliary regression equations employed a simulated 95% quantile as a response variable and involved a constant, T −1 and its powers if necessary, as explanatory variables. The regression equations vary in specification for the purpose of capturing the underlying smooth response surfaces of various simulated quantiles; the graph of each regression's actual and fitted values was checked to ensure the capturing of the underlying smoothness. Estimated constants in these regression equations are recorded in the columns for q * 95 as approximate 95% limit quantiles. The limit quantiles in q * 95 for a = b = 0 (that is, no break cases) were taken from Doornik (2003). Tables 1 and 2 show that the quantiles in q 95 almost coincide with those in q * 95 regardless of specifications of the deterministic terms; see the seventh column of each table for q 95 /q * 95 − 1 , a series of absolute relative errors, all of which are very small. This correspondence can be seen as strong evidence supporting the validity of the proposed approximation method based on the full model. Furthermore, the eighth column of each table records a class of discrepancies in approximate p-values, defined as ∆p app = g q 95 q 95 − q * 95 , in which g (·) represents a gamma density function calculated from simulated mean and variance. Most of the discrepancies are very small, and even the largest one is around 0.02 when p − r is relatively large, for which we should recall that a large value of p − r could give rise to various other distortion issues in practice. The overall evidence allows us to argue that the approximate quantiles work as useful critical values in applications from a practical viewpoint. The Supplementary Materials includes an Ox code for simulating asymptic distribution. This can be used if further precision is needed.
As a caveat in relation to large values for p − r, let us recall that our response surface regression was conducted by using a class of realistic number of non-stationary variables, p − r = 1, . . . , 8, which suffice in most applied research. Thus, an empirical study using a partial system of large dimension may require careful examinations of the underlying cointegrating rank, in addition to the application of the proposed PLR tests to the data under study, as discussed by Juselius (2006, §8). Notes. q 95 denotes 95% limit quantiles approximated from the full systems, while q * 95 denotes those calculated directly from the partial systems. ∆p app represents discrepancies in approximate p-values. Notes. q 95 denotes 95% limit quantiles approximated from the full systems, while q * 95 denotes those calculated directly from the partial systems. ∆p app represents discrepancies in approximate p-values.

Implementation of Response Surface
The response surface in Tables A1 and A2 are used as follows. The response surface is aimed at the situation with two breaks. However, Theorem 4 shows that with a simple correction the response surface can also be used with a single break or no break.
In the case of q = 3 sample periods and thus 2 breaks at T 1 , T 2 , where 0 < T 1 < T 2 < T, we let a, b be the smallest and second-smallest relative sub-sample length. Thus, if v In the case of q = 2 sample periods and thus 1 break at T 1 , where 0 < T 1 <, then v 1 = T 1 /T, v 2 = (T − T 1 )/T, so that v 1 + v 2 = 1. We let a = 0 and b = min(v 1 , v 2 ).
In the case of q = 1 sample period and thus no break, let a = b = 0. Theorem 4 and (28) show that the mean and variance for the cases where q < 3 can be found from those for q = 3 by choosing a, b as indicated and subtracting (3 − q)(p − r) and 2(3 − q)(p − r), respectively.
Given the choices of p − r, m − r, a and b, compute the approximations to Table A1 is used for the case with a broken linear trend while Table A2 is used for the case with a broken constant. This is then inserted in (31), which in turn is inserted into (29), (30), while correcting for the number of breaks, that is, Finally, we approximate the quantile of interest or the p-value of the observed PLR statistic using a gamma distribution with mean and variance matching (33) and (34). Equivalently, one can specify the shape and scale of the gamma distribution as mean 2 /variance and variance/mean.
A spreadsheet for implementing the response surfaces in Tables A1 and A2 is available in the Supplementary Materials. This also includes an Ox program for simulating the asymptotic distributions and calculating p-values of observed test statistics for specifications outside the range covered by Tables A1 and A2, for instance when the number of structural breaks is greater than 2 or q > 3.

Empirical Illustration
As empirical illustration, we analyse a set of quarterly time series data from Schreiber (2015), who attained an econometric system for the exchange rate and bilateral trade between the UK and Germany. She decomposed the UK-Germany economic system into two blocks, a foreign exchange block and bilateral trade block, in order to obtain a data-congruent representation useful for forecasting and policy analysis. Various econometric studies were conducted by Schreiber (2015), and one of them was the analysis of a partial model for the bilateral trade block with a structural break. The methodology developed in the above sections enables us to conduct formal tests for cointegrating rank that underlies such a partial system subject to a break. This partial system analysis may also be encouraged in terms of local power advantage of partial-system-based tests over those based on a full system under weak exogeneity, as demonstrated by Doornik et al. (1998) as well as Kurita (2011). Figure 1 presents an overview of the quarterly data spanning the sample period of the first quarter in 1991-the second quarter in 2014, denoted as 1991.1-2014.2 hereafter. The variable tb t is the trade balance between the UK and Germany, i.e., the difference between the log of exports of UK goods to Germany and the log of imports of German goods to the UK; dulc t represents the unit labour cost differential between the two countries; y t and y * t denote the logs of the UK and German gross domestic products, respectively; ppp t represents the terms of trade in logarithm. See Schreiber (2015) for further details of the data. The figure indicates the presence of a structural break around 2008-2009 attributable to a global economic recession over this period. (a) tb t is the trade balance between the UK and Germany; (b) dulc t is the unit labour cost differential between the UK and Germany; (c) y t and y * t are the logs of the UK and German gross domestic products, respectively; (d) ppp t is the terms of trade.
In this empirical illustration, we analyse the data using a bivariate partial autoregressive model for tb t and dulc t , with y t , y * t and ppp t assumed to be weakly exogenous for the class of parameters of interest such as cointegrating vectors; that is, p = 5 and m = 2. This assumption is based on Schreiber's study, suggesting that modelling the bilateral trade block centering on tb t and dulc t appears to be conformable to the underlying data structure. The lag-length k = 2 is selected for our bivariate partial autoregressive model.
With regard to the issue on a structural break, we adopt a broken trend specification; that is, the presence of a shift in the restricted trend as well as the unrestricted constant. The second sub-sample period starts in 2008.3, corresponding to the observation point in T q−1 for q = 2, which results in the selection of relative break points a = 0 and b = 0.255. According to (10), our bivariate model with k = 2 requires a set of two impulse dummy variables for the initial values of the second sub-sample periods. In addition, a pair of impulse dummy variables, Dp1998(1) and Dp2006 (2), is employed in our model to capture outliers in the data, as in Schreiber (2015); the former variable is 1 in 1998.1 and zero otherwise for an outlier due to the Asian financial crisis, while the latter is 1 in 2006.2 and zero otherwise, corresponding to an outlier attributable to an increase in oil prices.
A set of residual diagnostic tests for the partial system is reported in Table 3. Most of the test statistics are given in the form F j (d f 1, d f 2), which denotes an approximate F test (with relevant degrees of freedom d f 1 and d f 2) against the alternative hypothesis j. The alternative hypotheses are specified as: 5th-order serial correlation (F AR5 , Godfrey 1978), 4th-order autoregressive conditional heteroscedasticity (F ARCH4 , Engle 1982), heteroscedasticity (F HET , White 1980). Chi-squared tests for normality (χ 2 ND , Doornik and Hansen 2008) are also recorded in the table. We also note the following caveats based on recent advances in the field of mis-specification tests: Nielsen (2006) demonstrated that F AR5 is a valid test in the presence of unit roots; Berenguer-Rico and Wilms (2018) showed that F HET is valid after eliminating outliers from the observations, while χ 2 ND is not necessarily valid after the removal of outliers, which was demonstrated by Berenguer-Rico and Nielsen (2017). In any case, no evidence is found in Table 3, suggesting significant mis-specification problems. We can thus judge this partial system is formulated sufficiently well to be subjected to PLR tests for cointegrating rank.  Table 4 presents a class of PLR test statistics for the determination of cointegrating rank, along with the corresponding p-values and approximate 95% limit quantiles calculated from the response surface outcomes in the previous section. We used Table A1 in Appendix A to calculate approximates to log δ p−r , log λ p−r and Cov(T 1 , T 2 ), and then applied them to the mappings (29) and (30) adjusted for extraχ 2 terms, so that the gamma-distribution approximation method yielded the p-values. Table 4 shows that, at the 5% level, the null hypothesis r = 0 is rejected while the hypothesis r ≤ 1 fails to be rejected. Hence, this formal analysis enables us to reach the conclusion of r = 1, which supports the informal analysis of Schreiber (2015).
where a figure in brackets under each coefficient is a standard error and υ t is a stationary error. The signs of the coefficients in (35) are the same as those in Schreiber (2015)'s cointegrating equation except for y * t . The German income y * t was insignificant in her cointegrating relationship and thus removed from it, while, in (35), y * t plays a significant role, along with y t . As a result of checking a set of unrestricted estimates for the cointegrating vector, we have arrived at Equation (35), where the coefficients of y t and y * t are restricted to add to zero, while a zero-restriction is placed on the coefficient for t1 (≤2008:2) ; that is, a linear trend is present only in the second sub-sample period. The PLR test statistic for these restrictions is 3.571[0.168], in which the figure in square brackets is a p-value according to χ 2 (2). Thus, the hypothesis of the overall restrictions cannot be rejected at the 5% level.
There are several interesting aspects of Equation (35) that are worth discussing here. The real income difference between Germany and the UK, y * t − y t , has a positive coefficient, implying that a spread in the income difference leads to an improvement in the UK trade balance with Germany. This finding is interpretable in the context of an income effect from each of the two countries. The coefficient for the terms of trade, ppp t , should also be noted. It is negative, thus indicating a relative price effect on the trade balance in a theory-consistent manner; that is, a decrease in exports prices relative to import prices leads to trade balance improvement, so that the well-known elasticity approach to trade balance appears to be empirically valid for the two countries. Furthermore, the linear trend t is significant solely in the second sub-sample period, suggesting long-lasting influences of the global recession on the two countries' trade balance and other economic variables.
Finally, we will check that the three variables, y t , y * t and ppp t , are indeed weakly exogenous for the class of parameters of interest. We follow the testing procedure suggested by Johansen (1992a), Boswijk (1992) and HJNR. First, the restricted cointegrating combination is added as a regressor to a marginal system (9) for Z t = (y t , y * t , ppp t ) . Second, a standard regression analysis is performed to test for the significance of the cointegrating combination in each equation. Table 5 reports a class of LR test statistics for the exclusion of the empirical cointegrating linkage from each equation in the marginal system. Judging from the reported p-values according to χ 2 (1), none of the test statistics indicate evidence against the assumption of weak exogeneity; thus, the preceding partial-system analysis of cointegrating rank has been justified.

Conclusions
This study has explored partial cointegrated vector autoregressive models subject to structural breaks in deterministic terms, a linear trend and constant. The Granger-Johansen representation of the full model in JMN has been reexamined, leading to a useful clarification of roles of the initial values in asymptotic analysis. A class of log likelihood ratio test statistics for cointegrating rank has then been introduced in the proposed partial-model framework. We have investigated asymptotic theory under a general class of innovation distributions allowing martingale difference sequences with conditional heteroscedasticity. The derived limit distributions of the statistics are closely related to those for the full models investigated by JMN. This relationship allows us to perform a response surface analysis in a simplified full-system framework, instead of relying on laborious partial-system-based simulations. The outcomes of the analysis are summarised as a set of two statistical tables providing valuable information for inference on the underlying cointegrating rank. Lastly, an empirical analysis of real-life data from the UK and Germany has demonstrated the practicality of these tables in applied economic research. As a result of this study, the partial cointegrated models have become more flexible and reliable devices for modelling time series data subject to various structural breaks.
Recently, bootstrap methods have been proposed for cointegration rank testing in full systems (Cavaliere et al. 2012). It would be interesting to extend those to partial systems with or without breaks.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2225-1146/7/4/42/s1: A preadsheet for implementing the response surface in Tables A1 and A2, as well as an Ox program for simulating the asymptotic distributions.

Author Contributions: The authors made equal contributions.
Funding: T. Kurita gratefully acknowledges financial support for this work from JSPS KAKENHI (Grant Nos: 15KK0141 and 26380349). Surfaces   Table A1. Response surfaces for broken trend models. log λ p−r log δ p−r Cov(T 1 , T 2 ) const. 4.14 const. 0.5987 const. −1.298

15.93
Note: 1 (x) is 1 when p − r = x and zero otherwise.
Note: 1 (x) is 1 when p − r = x and zero otherwise.

Appendix B. Proof of the Granger-Johansen Representation
This section provides a proof of Theorem 1, in which the Granger-Johansen representation of the full model with deterministic breaks is presented.
We then follow Kurita and Nielsen (2009) in the analysis of non-stationary components. Start by the homogenous Equation (13) for 1 ≤ j ≤ q and T j−1 + k < t ≤ T j : Pre-multiplying the above equation by α ⊥ and replacing ∆X t−i = ∆X t − ∑ i−1 =0 ∆ 2X t− , we collect repeated terms ∆X t on the left-hand side to find Apply the orthogonal projection identity α ⊥ ΨX t = α ⊥ Ψβ ⊥β ⊥X t + α ⊥ Ψββ X t to the left-hand side and then pre-multiply both sides by β ⊥ (α ⊥ Ψβ ⊥ ) −1 to find the C matrix. Shifting CΨββ X t to the right hand side, we arrive at Addingββ X t on both sides results iñ Using the notation Υ i = −Γ i − · · · − Γ k−1 for 1 ≤ i ≤ k − 1 as well as the matrix Λ defined in (14) leads to the first desired result (17).
Limit Theorem and a convergence to a stochastic integral for the non-stationary components of the full model. We formulate these as a the following high level assumption and then prove that it is satisfied under Assumptions 1 and 2.
Assumption A1. Let ε t be a p-dimensional random variables and suppose that Assumption 1 is satisfied. LetX t satisfy the homogenous Equation (13) and define U t−1 as Suppose that where Ω and Σ u are positive definite matrices. Furthermore, let W u be a p-dimensional Brownian motion with variance Ω. Suppose that, for 0 ≤ u ≤ 1, as a process on (D[0, 1]) p endowed with the Skorokhod metric with common distortion. Finally, The next result explores the conditions of Brown (1971). Subsequently, we use this to show that Assumptions 1 and 2 imply Assumption A1.
Second, suppose Assumption 2(iii.b), so that sup t∈N E|ε t | 4 < ∞. Chebychev's inequality gives Next, by the Cauchy-Schwarz inequality and Assumption 2(iii.b), we get T is symmetric, the spectral norm equals the spectral radius, thus it suffices to show that v (U 2 T − V 2 T )v vanishes for any linear combination v. In turn, it suffices to consider univariate martingale difference sequences ε t .
First, suppose Assumption 2(iii.a) holds, so that sup t∈N E{ε 2 t 1 (ε 2 t >a) |F t−1 } = o P (1) as a → ∞. We follow an argument inspired by Anderson and Kunitomo (1992, Theorem 2). Hall and Heyde (1980, Theorem 2.23 To prove the tightness condition, note that Proof of Lemma A2. Note that the process U t equals β X t , which is studied in Theorem 1. It satisfies Equation (16), which is of the form U t = ΦU t−1 + Fε t with Φ = (I r + β α) for r = r + p (k − 1) and F = β ι, where Φ has spectral radius less than unity as verified in Theorem 1. For (A3), we apply Anderson and Kunitomo (1992, Lemma 1). This requires that max 1≤t≤T |ε t | 2 = o P (T), which is proved in Lemma A1 using Assumption 2.
For (A4), use Assumption 2(iii.a). We apply Lemma 2 in Anderson and Kunitomo (1992) to show the convergence of the product moment matrix, which requires Assumption 2(ii, iii.a). Assumption 2 states that Ω is a positive definite matrix, which results in the positive definiteness of Σ u by Anderson and Kunitomo (1992, Lemma 3).
For (A4) using Assumption 2(iii.b). We follow Anderson and Kunitomo (1992, Lemma 2) but avoid their truncation argument. We first argue that ∑ T t=1 U t−1 Fε t = o P (T). Since U t−1 Fε t is a martingale difference sequence with second moments due to Assumption 2(ii) and the spectral norm is bounded by the trace, we obtain Applying iterated expectations and using that max 1≤t≤T E|ε t | 2 is bounded by Assumption 2(iii.b) gives E ≤ C ∑ T t=1 E|U t−1 | 2 . Noting that U t = ∑ t−1 j=0 Φ j Fε t−j + Φ t U 0 and using that ε t is a martingale difference array, we arrive at Using that max 1≤t≤T E|ε t | 2 is bounded and Φ has spectral radius less than unity, As a consequence E = O(T) and, by the Markov inequality, ∑ T t=1 U t−1 Fε t = o P (T). Next, we Here, the second term converges to FΩF by Assumption 2(i) while the last two terms vanish. Since max 1≤t≤T |U t | = o P (T 1/2 ), we therefore get Anderson and Kunitomo (1992, Lemma 3) show invertibility of Σ U .

This is a linear equation in
For (A5), the Functional Central Limit Theorem follows from the univariate result of Brown (1971, Theorem 3), equipped with Cramér-Wold device, see Billingsley (1968, Theorem 7.7). Brown's result applies under Assumption 2(i, ii) and either of the Lindeberg condition established in Lemma A1(b, c) under Assumption 2. When using Brown's result, it is convenient to define the univariate variables S t = ∑ t s=1 ε s and s 2 t = ∑ t s=1 Eε 2 s . Brown is concerned with the continuously embedded random walk through the points (s 2 t /s 2 T ), (S t /s T ), while we are concerned with the right continuous random walk that is constant (S t /T 1/2 ) on the half-open intervals [t/T 1/2 , (t + 1)/T 1/2 ). The two embeddings reconcile since s 2 t /s 2 T = tσ 2 for some constant σ 2 by Assumption 2(i) and since max 1≤t≤T |ε|/T 1/2 vanishes by Lemma A1(d) under Assumption 2.
For (A6), the convergence to a stochastic integral for the univariate case is based on the results of Jakubowski et al. (1989), which was referred to by Kurtz and Protter (1996), while the convergence to a stochastic integral for the multivariate case is based on the results of Kurtz and Protter (1991). For the univariate case, Kurtz and Protter (1996, Theorem 7.1) show that we need to check that the martingale array M T t = T −1/2 ∑ t s=1 ε s is uniformly tight, as required by Jakubowski et al. (1989) or, equivalently, it has uniformly controlled variations. We use Kurtz and Protter (1991, Theorem 2.2), which applies to the multivariate case; see also Hansen (1992, Theorem 2.1). Choose δ = ∞ so that M T,δ t = M T t in Kurtz and Protter's notation. For each α > 0, T ≥ 1, choose stopping times τ T,α = ∞ so that P(τ T,α ≤ α) = 0 ≤ 1/α. Then, we obtain a quadratic variation processes so that M T t has uniformly controlled variations.

Appendix C.2. Several Lemmas for the Partial Systems
The asymptotic properties of the product moment matrices,S ij for i, j = 0, 1 defined in (24), are investigated so as to adapt Lemmas 10.1 and 10.3 of Johansen (1995) to the present model. We do this by combining various ideas and techniques from HJNR, JMN and Kurita and Nielsen (2009). These papers assume normal innovations, which we have generalised as Assumptions 2 and 3 in our study. This means that we have to be careful when defining the limits of product moments of various non-integrated components. This issue is addressed in the following lemma: Lemma A3. Suppose that Assumptions 1 and A1 are satisfied. Let Let v j = T j /T be relative break points for j = 0, . . . , q and define the sample product moment matrix of V t corrected for Q t and a constant as Then, as T → ∞ with fixed relative break points v j , we get that M V V ·Q,1 converges in probability to a positive definite matrix with the structure where Σ xβ = αΣ ββ and Σ xx = αΣ βx + Ω hold.
Proof of Lemma A3. We start with the homogenous Equation (16). For 1 ≤ j ≤ q and T j−1 + k < t ≤ T j , this equation can always be solved as and, in the first sub-sample period or j = 1, the initial value β X T 0 +k = β X k can be treated as fixed, so that the process β X t for T 0 + k < t ≤ T 1 becomes uniformly bounded in probability by noting that it equals U t in Assumption A1. Similarly, by iterating over all the other start-up values, the process β X t for T j−1 + k < t ≤ T j and j = 2, . . . , q is also uniformly bounded in probability. Since the number of breaks is finite, β X t is uniformly bounded in probability jointly for 1 ≤ j ≤ q. Next, the Granger-Johansen representation (18) implies that, for 1 ≤ j ≤ q and T j−1 + k < t ≤ T j , Since β X t is uniformly bounded in probability, it follows that U t is also uniformly bounded in probability. Note that U t is identical throughout all sub-sample periods. The intercepts τ ,j and β τ c,j are eliminated from ∆X t and β X t + γ j t respectively, when demeaning them within each sub-sample period. Consequently, we can apply the Law of Large Numbers (A4) in Assumption A1 to which, for 1 ≤ j ≤ q, converges in probability to a positive definite matrix denoted as Since both X t−1 and Q t consist of the past values of ε t , it follows from (A4) that N (j) for Γ = (Γ 1 , . . . , Γ k−1 ). We can derive three properties from this equation. First, we post-multiply (A9) by ε t and then exploit N (j) xε , which is the first property. The next one is where the left-hand side is the limit of the sample regression coefficient for ∆X t regressed on β X t−1 + γ j t, Q t and an intercept. This property is demonstrated by substituting ε t + α(β X t−1 + γ j t) + ΓQ t − µ j from (A9) into ∆X t ; we then arrive at the limit result from which (A10) follows by noting that N The left-hand side of (A11) is the limit of the sample product moment of ε t regressed on β X t−1 + γ j t, Q t and an intercept, due to N (j) εβ = 0 and N (j) εq = 0. The right-hand side of (A11) is the limit of the sample product moment of ∆X t regressed on β X t−1 + γ j t, Q t and an intercept, where we have exploited the identity (A10). Now, we return to (A8) and partial out Q t to obtain Furthermore, noting ∑ q j=1 ∆v j = 1 and ∑ q j=1 (∆v j )Ω = Ω, we define for i, k = x, β and l, m = q, x, β. The use of Slutsky's theorem then leads to (A7). It is left to show that Σ xβ = αΣ ββ and Σ xx = αΣ βx + Ω. For the first expression, we apply the identities in (A13) to (A10) so as to obtain (N xβ , N xq ) = (α, Γ) N ββ N βq N qβ N qq .
Recalling the decomposition ∆X t = (∆Y t , ∆Z t ) , we find the following equivalence in the lower-right submatrix of (A7) in Lemma A3: Under the normality assumption for ε t as in HJNR, we could form the conditional variance of the two elements ∆Y t and β X t−1 + tγ j given the element ∆Z t . Moving away from normality under Assumptions 2 and 3, we need to consider instead the limit of a product moment matrix consisting of linear combinations of these elements, defined in the following manner: which appears in (A17) in Lemma A6, and where Let us recall the weak exogeneity condition α z = 0, which implies α = α y 0 and α ⊥ = α y⊥ 0 0 I p−m .
Finally, recall from (4) that the limit variance of innovations in the partial equation equals Ω yy·z = Ω yy − Ω yz Ω −1 zz Ω zy under Assumption A1. Within each sub-sample period, the setup here is identical to that of HJNR. We therefore obtain the following equation, which adapts Equation (10).6 in Johansen (1995, Lemma 10.1).
We now explore the limit of the common trends within each sub-sample period. Define T −1/2 I p −τ ,j 0 1 and X * t = X t−1 t , and the next lemma is a combination of Lemma A.1 in JMN and Lemma 5 in HJNR.
Lemma A5. Suppose that Assumptions 1 and A1 are satisfied. Consider the (p − r + 1)-dimensional process T −1/2 B * T X * int(Tu) on D[0, 1] endowed with the Skorokhod metric with common distortion across the dimensions. Let W u be a (p − r)-dimensional Brownian motion with variance Ω for 0 ≤ u ≤ 1. For υ j−1 ≤ u < υ j and 1 ≤ j ≤ q, the process X * int(Tu) satisfies The convergence holds jointly for 1 ≤ j ≤ q and 0 ≤ u ≤ 1.
Since α ⊥ ΓC = α ⊥ has full row rank and U t is bounded in probability as shown in the proof of Lemma A3, the random walk component α ⊥ ∑ t s=T j−1 +k+1 ε s dominates α ⊥ ΓU t . The initial value τ c,j could be large when j > 1, but it is eliminated when taking differences X * int(Tu) − X * int(Tυ j−1 ) . Thus, the first element of T −1/2 B * T (X * int(Tu) − X * int(Tυ j−1 ) ) converges to α ⊥ (W u − W υ j−1 ) by the Functional Central Limit Theorem (A5) in Assumption A1. The second element also converges as desired since T −1 int(Tu) converges to u. As the number of breaks is finite, the convergence holds jointly for 1 ≤ j ≤ q.
Decompose W u = W 1u , W 2u , in which the dimensions of W 1u and W 2u are m and p − m, respectively. Let us recall the notation X t−1 = (X t−1 , tE t ) , which was used in reduced rank regression in Section 3.1. The next lemma establishes the asymptotic theory for the product moment matrices S ij for i, j = 0, 1 defined in (24).
Lemma A6. Suppose that Assumptions 1 and A1 are satisfied under α z = 0. Define β = (β , γ) , τ = (τ ,1 , . . . , τ ,q ) ∈ R p×q and It then follows that S 00 S 01 β β S 10 β S 11 β Proof of Lemma A6. Recall the decomposition ∆X t = (∆Y t , ∆Z t ) . For (A17), we start with the left-hand side of (A12) and further partial out ∆Z t from the process, to which we then apply the Law of Large Numbers (A4) in Assumption A1. Follow the proof of Lemma A3 afterwards, supplemented with the definition of the limit expression (A16), in order to verify (A17).
For (A19), we note B T S 10 − S 11 β α y = B T S 1ε . The Law of Large Numbers (A4) in Assumption A1 implies B T S 1ε = B T (T − k) −1 ∑ T t=k+1 Z 1,t−1 ε y·z,t + o P (1), in which Z 1,t is the demeaned version of X t as defined in (23). By (A3) and the Granger-Johansen representation in Theorem 1, we can replace B T Z 1,t with the demeaned version of (∑ t s=k+1 ε s α ⊥ , tE t ). The stochastic integral (A6) in Assumption A1 then gives (A19).
Proof of Theorem 2. Follow the proof of Theorem 11.1 in Johansen (1995) by using Lemmas A4 and A6 given above instead of his Lemmas 10.1 and 10.3, and also utilise invariance properties with respect to non-singular linear transformations as in the proof of Theorem 1 in HJNR.
Proof of Theorem 3. The proof presented here is based on Doornik (1998, §9). The asymptotic distribution of the LR test statistic in the partial model in Theorem 2 is rewritten as The process T i for i = 1, . . . , m − r is a function of B i,u and G u , both of which are functions of the (p − r)-dimensional standard Brownian motion B u . Inspection of these functions shows that they are invariant to the relabelling of the coordinates of B u , so that T 1 , . . . , T m−r are identically distributed and any pairs T j , T k are also identically distributed. Hence, In order to relate the moments of the limit distributions of the LR test statistics in the partial and full models, we evaluate the above expressions for m − r in general and for m − r = p − r. For the means of the limit distributions, we find Solving both equations for E (T 1 ) and equating the resulting expressions yield For their variances, we obtain a set of equations similarly, which are solved for Var (T 1 ) to find the desired expression.