## 1. Introduction

#### Notation

## 2. The I(2) Model

#### 2.1. The I(2) Model with a Linear Trend

#### 2.2. The Triangular Representation

**Theorem 1.**

**Proof.**

#### 2.3. Obtaining the Triangular Representation

#### 2.4. Restoring Orthogonality

#### 2.5. Identification in the Triangular Representation

- Orthogonalize to obtain ${A}_{0}^{\prime}{A}_{1}=0,{A}_{0}^{\prime}{A}_{2}=0,{A}_{1}^{\prime}{A}_{2}=0$.
- Choose s full rank rows from ${B}_{1}$, denoted ${M}_{{B}_{1}}$, and set ${B}_{1}\leftarrow {B}_{1}{M}_{{B}_{1}}^{-1}$. Adjust V accordingly.
- Do the same for ${B}_{2}\leftarrow {B}_{2}{M}_{{B}_{2}}^{-1}$.
- Set ${A}_{1}\leftarrow {A}_{1}{V}_{22}$, ${V}_{21}\leftarrow {V}_{22}^{-1}{V}_{21}$ and ${V}_{22}\leftarrow I$.
- ${A}_{2}\leftarrow {A}_{2}{M}_{{A}_{2}}^{-1}$.
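The normalization ${B}_{1}\leftarrow {B}_{1}{M}_{{B}_{1}}^{-1}$ used in these steps can be illustrated with a minimal NumPy sketch. The function name `normalize_on_rows`, the example matrix, and the choice of rows are hypothetical and only serve the illustration; the sketch assumes that the selected rows form a square, non-singular block.

```python
import numpy as np

def normalize_on_rows(B, rows):
    """Return B @ inv(M), where M = B[rows, :] is assumed square and non-singular."""
    B = np.asarray(B, dtype=float)
    M = B[rows, :]
    if M.shape[0] != M.shape[1]:
        raise ValueError("need as many selected rows as columns")
    # Solve X M = B for X (equivalently M' X' = B') instead of forming inv(M).
    return np.linalg.solve(M.T, B.T).T

# Hypothetical 4 x 2 block B1; rows 0 and 1 are taken as the full-rank rows.
B1 = np.array([[2.0, 1.0],
               [0.0, 3.0],
               [1.0, 1.0],
               [4.0, 0.5]])
B1_norm = normalize_on_rows(B1, rows=[0, 1])
```

After this step the selected rows of `B1_norm` form an identity block, while the column space of `B1` is unchanged; any matrix multiplying `B1` would have to be adjusted by the same `M` to leave the product unaffected.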

## 3. Relation to Other Representations

#### 3.1. $\tau $ Representation

**Corollary 1.**

**Proof.**

#### 3.2. $\delta $ Representation

**Corollary 2.**

**Proof.**

## 4. Algorithms for Gaussian Maximum Likelihood Estimation

#### 4.1. $\delta $-Switching Algorithm

- To estimate $\tau =[\beta :{\beta}_{1}]$, rewrite (15) as:$${z}_{0t}=\alpha {\beta}^{\prime}{z}_{2t}+{\zeta}_{1}{\beta}^{\prime}{z}_{1t}+{\zeta}_{2}{\beta}_{1}^{\prime}{z}_{1t}+\alpha d{z}_{1t}+{\epsilon}_{t},$$ $${z}_{0t}=(\alpha \otimes {z}_{2t}^{\prime}+{\zeta}_{1}\otimes {z}_{1t}^{\prime})\mathrm{vec}\beta +({\zeta}_{2}\otimes {z}_{1t}^{\prime})\mathrm{vec}{\beta}_{1}+\left(\alpha \otimes {z}_{1t}^{\prime}\right)\mathrm{vec}\left({d}^{\prime}\right)+{\epsilon}_{t}.$$ We can treat d as a free parameter in (16). First, when $r\ge {s}_{2}^{*}$, $\delta $ has more parameters than ${\tau}_{\perp}$. Second, when $r<{s}_{2}^{*}$, then $\Gamma $ is reduced rank, and ${s}_{2}^{*}-r$ columns of ${\tau}_{\perp}$ are redundant. Orthogonality is recovered in the next step.
- Given $\tau $ and derived ${\tau}_{\perp}$, we can estimate $\alpha $ and $\delta $ by RRR after concentrating out ${\tau}^{\prime}{z}_{1t}$. Introducing $\rho $ with dimension $(r+{s}_{2}^{*})\times r$ allows us to write (15) as:$${z}_{0t}={\alpha}^{*}{\rho}^{\prime}\left(\begin{array}{c}{\beta}^{\prime}{z}_{2t}\\ {\tau}_{\perp}^{\prime}{z}_{1t}\end{array}\right)+\zeta {\tau}^{\prime}{z}_{1t}+{\epsilon}_{t}.$$
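The vectorization used to make the model linear in $\mathrm{vec}\beta $ rests on the identity $\alpha {\beta}^{\prime}z=(\alpha \otimes {z}^{\prime})\mathrm{vec}\beta $. The following NumPy sketch checks it numerically for the first two terms of the equation; the dimensions and random matrices are arbitrary illustrative choices, and `vec` is a small helper defined here to stack columns.

```python
import numpy as np

rng = np.random.default_rng(0)
p, p1, r = 4, 5, 2                      # illustrative dimensions only
alpha = rng.standard_normal((p, r))
zeta1 = rng.standard_normal((p, r))
beta = rng.standard_normal((p1, r))
z1 = rng.standard_normal(p1)
z2 = rng.standard_normal(p1)

def vec(m):
    """Stack the columns of m into one vector (column-major order)."""
    return m.reshape(-1, order="F")

# Left-hand side: alpha beta' z2 + zeta1 beta' z1, computed directly.
lhs = alpha @ beta.T @ z2 + zeta1 @ beta.T @ z1

# Right-hand side: (alpha kron z2' + zeta1 kron z1') vec(beta).
design = np.kron(alpha, z2[None, :]) + np.kron(zeta1, z1[None, :])
rhs = design @ vec(beta)
```

Stacking the `design` matrices over $t$ gives the regressor matrix for a GLS estimate of $\mathrm{vec}\beta $; the column-major `vec` matters, because a row-major flattening would pair the Kronecker blocks with the wrong elements of $\beta $.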

- 1.
- 2. Compute ${f}_{c}^{\left(k\right)}=-\log \left|{\Omega}_{c}^{\left(k\right)}\right|.$
- 3. Enter a line search for $\tau $. The change in $\tau $ is $\nabla ={\tau}_{c}^{\left(k\right)}-{\tau}^{(k-1)}$, and the line search finds a step length $\lambda $ with ${\tau}^{\left(k\right)}={\tau}^{(k-1)}+\lambda \nabla $. Because only $\tau $ is varied, a GLS step is needed to evaluate the log-likelihood for each trial $\tau $. The line search gives new parameters with corresponding ${f}^{\left(k\right)}$.
- T. Compute the relative change from the previous iteration:$${c}^{\left(k\right)}=\frac{{f}^{\left(k\right)}-{f}^{(k-1)}}{1+\left|{f}^{(k-1)}\right|}.$$ Terminate when:$$|{c}^{\left(k\right)}|\le {\epsilon}_{1}\ \mathrm{and}\ \underset{i,j}{max}\frac{\left|{\Pi}_{ij}^{\left(k\right)}-{\Pi}_{ij}^{(k-1)}\right|}{1+\left|{\Pi}_{ij}^{(k-1)}\right|}\le {\epsilon}_{1}^{1/2}.$$
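A minimal sketch of the two quantities in step T, assuming that the second display is the stopping rule and that both conditions must hold together. The function names are hypothetical; the default `eps1=1e-14` is the tolerance quoted in the captions of Tables 3 and 4.

```python
import numpy as np

def relative_change(f_new, f_old):
    """Relative change c = (f_new - f_old) / (1 + |f_old|) in the objective."""
    return (f_new - f_old) / (1.0 + abs(f_old))

def has_converged(f_new, f_old, Pi_new, Pi_old, eps1=1e-14):
    """True when the objective and every element of Pi have both settled."""
    Pi_new = np.asarray(Pi_new, dtype=float)
    Pi_old = np.asarray(Pi_old, dtype=float)
    c = relative_change(f_new, f_old)
    # Largest element-wise relative change in Pi between two iterations.
    param_change = np.max(np.abs(Pi_new - Pi_old) / (1.0 + np.abs(Pi_old)))
    return bool(abs(c) <= eps1 and param_change <= np.sqrt(eps1))

Pi = np.eye(2)
same = has_converged(1.0, 1.0, Pi, Pi)            # nothing moved
moved = has_converged(1.0, 1.0, Pi + 1e-3, Pi)    # parameters still changing
```

The looser tolerance ${\epsilon}_{1}^{1/2}$ on the parameters reflects that, near an optimum, a change of order $h$ in the parameters moves the objective only by order ${h}^{2}$.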

#### 4.2. MLE with the Triangular Representation

#### 4.2.1. Estimation Steps

- B-step: estimate B, and fix $A,V,W,\Omega $ at ${A}_{c},{V}_{c},{W}_{c},{\Omega}_{c}$. The resulting model is linear in B:$${z}_{0t}={A}_{c}{W}_{c}{B}^{\prime}{z}_{2t}-{A}_{c}{V}_{c}{B}^{\prime}{z}_{1t}+{\epsilon}_{t}.$$ $${H}^{\prime}{P}^{-1}{z}_{0t}=L{W}_{c}{B}^{\prime}{z}_{2t}-L{V}_{c}{B}^{\prime}{z}_{1t}+{u}_{t}={\tilde{W}}_{c}{B}^{\prime}{z}_{2t}-{\tilde{V}}_{c}{B}^{\prime}{z}_{1t}+{u}_{t},$$
- V-step: estimate $W,V$, and fix $A,B,\Omega $. This is a linear model in $(W,V)$, which can be solved by GLS as in the B-step.
- A-step: estimate $A,\Omega $ and fix $W,V,B$ at ${W}_{c},{V}_{c},{B}_{c}$:$${z}_{0t}=A\left({W}_{c}{B}_{c}^{\prime}{z}_{2t}-{V}_{c}{B}_{c}^{\prime}{z}_{1t}\right)+{\epsilon}_{t}$$
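With $W,V,B$ held fixed, the A-step equation is an ordinary multivariate regression of ${z}_{0t}$ on the single constructed regressor ${W}_{c}{B}_{c}^{\prime}{z}_{2t}-{V}_{c}{B}_{c}^{\prime}{z}_{1t}$. The sketch below assumes unrestricted $A$, data stored with one observation per row, and the residual second-moment matrix (divided by $T$) as the estimate of $\Omega $; the function name `a_step` and the simulated dimensions are hypothetical.

```python
import numpy as np

def a_step(Z0, Z1, Z2, B, V, W):
    """OLS estimate of A and residual covariance Omega with B, V, W fixed.

    Rows of Z0, Z1, Z2 are the observations z_0t', z_1t', z_2t'.
    """
    # x_t = W B' z_2t - V B' z_1t, stacked row-wise as x_t'.
    X = Z2 @ B @ W.T - Z1 @ B @ V.T
    # z_0t = A x_t + e_t  <=>  Z0 = X A' + E, so A' is a least-squares solution.
    A_T, *_ = np.linalg.lstsq(X, Z0, rcond=None)
    A = A_T.T
    resid = Z0 - X @ A.T
    Omega = resid.T @ resid / Z0.shape[0]
    return A, Omega

# Noise-free toy data (illustrative dimensions) so that A is recovered exactly.
rng = np.random.default_rng(1)
T, p, p1 = 200, 3, 4
B = rng.standard_normal((p1, p1))
W = rng.standard_normal((p, p1))
V = rng.standard_normal((p, p1))
A_true = rng.standard_normal((p, p))
Z1 = rng.standard_normal((T, p1))
Z2 = rng.standard_normal((T, p1))
Z0 = (Z2 @ B @ W.T - Z1 @ B @ V.T) @ A_true.T
A_hat, Omega_hat = a_step(Z0, Z1, Z2, B, V, W)
```

Using `lstsq` rather than inverting the moment matrix keeps the step usable when the constructed regressor is close to collinear.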

- 1a. B-step: Remove the last ${s}_{2}^{*}-\min \{r,{s}_{2}^{*}\}$ columns from B, V and W, as they do not affect the log-likelihood. When iteration is finished, we can add columns of zeros back to W and V and the orthogonal complement of the reduced B to get a rectangular B.
- 3a. A-step: we wish to keep A invertible, and hence square, during iteration. The missing part of ${A}_{2}$ is filled in with the orthogonal complement of the remainder of A after each regression. This requires re-estimation of ${V}_{.1}$ by OLS.

#### 4.2.2. Triangular-Switching Algorithm

- 1.1 B-step: obtain ${B}^{\left(k\right)}$ from ${A}^{(k-1)},{V}^{(k-1)},{W}^{(k-1)},{\Omega}^{(k-1)}$.
- 1.2 V-step: obtain ${W}^{\left(k\right)},{V}^{\left(k\right)}$ from ${A}^{(k-1)},{B}^{\left(k\right)},{\Omega}^{(k-1)}$.
- 1.3 A-step: obtain ${A}^{\left(k\right)},{\Omega}^{\left(k\right)}$ from ${B}^{\left(k\right)},{V}^{\left(k\right)},{W}^{\left(k\right)}$.
- 1.4 ${V}_{.1}$-step: if necessary, obtain new ${V}_{.1}^{\left(k\right)}$ from ${A}^{\left(k\right)},{B}^{\left(k\right)},{V}_{.2}^{\left(k\right)},{V}_{.3}^{\left(k\right)},{W}^{\left(k\right)}$.
- 2... As steps 2, 3, T from the $\delta $-switching algorithm. In this case, the line search is over all of the parameters in $A,B,V$. ☐

#### 4.3. Linear Restrictions

#### 4.3.1. Delta Switching

#### 4.3.2. Triangular Switching

## 5. Comparing Algorithms

- The $\delta $-switching algorithm, §4.1, which can handle linear restrictions on $\beta $ or $\tau $.
- The triangular-switching algorithm proposed in §4.2.2. This can optionally have linear restrictions on the columns of A or B.
- The improved $\tau $-switching algorithm, Appendix B, implemented to allow for common restrictions on $\tau $.

## 6. A More Detailed Comparison

#### 6.1. Hybrid Estimation

- standard starting values, as well as twenty randomized starting values, then
- triangular switching, followed by
- BFGS optimization (the Broyden, Fletcher, Goldfarb, and Shanno quasi-Newton method) for a maximum of 200 iterations, followed by
- triangular switching.

## 7. Conclusions

## Acknowledgments

## Conflicts of Interest

## Appendix A. Estimation Using the QR Decomposition

#### Reduced Rank Regression

## Appendix B. Tau-Switching Algorithm

- The estimate of $\tau $ is obtained by GLS given all other parameters except $\psi $. Johansen (1997, p. 451) shows the GLS expressions using second moment matrices. Define the orthogonal matrix $\mathrm{A}=({\alpha}_{\perp}:\overline{\overline{\alpha}})$, then using ${\kappa}^{\prime}{\tau}^{\prime}{z}_{1t}=\mathrm{vec}\left({z}_{1t}^{\prime}\tau \kappa \right)=({\kappa}^{\prime}\otimes {z}_{1t}^{\prime})\mathrm{vec}\tau $:$$\begin{array}{cc}\hfill {\mathrm{A}}^{\prime}{z}_{0t}& =\left(\begin{array}{c}{\kappa}^{\prime}\otimes {z}_{1t}^{\prime}\\ {\varrho}^{\prime}\otimes {z}_{2t}^{\prime}\end{array}\right)\mathrm{vec}\tau +\left(\begin{array}{c}0\\ {I}_{r}\otimes {z}_{1t}^{\prime}\end{array}\right)\mathrm{vec}\psi +\left(\begin{array}{c}{\epsilon}_{1t}\\ {\epsilon}_{2t}\end{array}\right)\hfill \\ & =\left\{\left(\begin{array}{c}{\kappa}^{\prime}\\ 0\end{array}\right)\otimes {z}_{1t}^{\prime}+\left(\begin{array}{c}0\\ {\varrho}^{\prime}\end{array}\right)\otimes {z}_{2t}^{\prime}\right\}\mathrm{vec}\tau +\left\{\left(\begin{array}{c}0\\ {I}_{r}\end{array}\right)\otimes {z}_{1t}^{\prime}\right\}\mathrm{vec}\psi +{u}_{t}.\hfill \end{array}$$
- Given just $\tau $, reduced-rank regression of ${z}_{0t}$ corrected for ${\tau}^{\prime}{z}_{1t}$ on ${z}_{0t}$ corrected for ${z}_{1t},{\tau}^{\prime}{z}_{2t}$ is used to estimate $\alpha $. Details are in Johansen (1997, p. 450).
- Given $\tau $ and $\alpha $, the remaining parameters can be obtained by GLS. The equivalence ${\overline{\overline{\alpha}}}^{\prime}={\overline{\alpha}}^{\prime}-{\overline{\alpha}}^{\prime}w{\alpha}_{\perp}^{\prime}$ is used to write the conditional equation as:$${\overline{\alpha}}^{\prime}{z}_{0t}={\gamma}^{\prime}{\alpha}_{\perp}^{\prime}{z}_{0t}+{\varrho}^{\prime}{\tau}^{\prime}{z}_{2t}+{\psi}^{\prime}{z}_{1t}+{\epsilon}_{2t},$$$${\alpha}_{\perp}^{\prime}{z}_{0t}={\kappa}^{\prime}{\tau}^{\prime}{z}_{1t}+{\epsilon}_{1t}.$$

- 1. Get ${\tau}_{c}^{\left(k\right)}$ from (A2). Identify this as follows. Select the non-singular $(r+s)\times (r+s)$ submatrix from $\tau $ with the largest volume, say M. We find M by using the first $r+s$ column pivots that are chosen by the QR decomposition of $\tau $ (Golub and Van Loan 2013, Algorithm 5.4.1). Set ${\tau}_{c}^{\left(k\right)}\leftarrow {\tau}_{c}^{\left(k\right)}M$. Get ${\alpha}_{c}^{\left(k\right)}$ by RRR; finally, get the remaining parameters from (A3) and (A4).
- 2... As steps 2, 3, T from the $\delta $-switching algorithm. ☐
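The pivot-based choice of M in step 1 can be sketched in plain NumPy. This is not Algorithm 5.4.1 itself: it is a greedy Gram-Schmidt scheme that follows the same pivoting idea (repeatedly take the candidate with the largest remaining norm), applied here to the rows of $\tau $ on the assumption that M is built from rows of $\tau $. It is a heuristic for a well-conditioned submatrix and is not guaranteed to find the one with the largest volume. The function name `pivot_rows` and the example matrix are hypothetical.

```python
import numpy as np

def pivot_rows(tau, k):
    """Indices of k rows of tau chosen by greedy pivoting on the columns of tau'."""
    R = np.array(tau, dtype=float).T        # columns of R are the rows of tau
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        if norms[j] <= 1e-12:
            raise ValueError("tau appears to have rank below k")
        chosen.append(j)
        q = R[:, j] / norms[j]
        # Remove the direction q from every remaining candidate.
        R = R - np.outer(q, q @ R)
        R[:, j] = 0.0                        # never pick the same row twice
    return chosen

tau = np.array([[1.0, 0.0],
                [0.0, 0.1],
                [10.0, 0.0],
                [0.0, 5.0]])
rows = pivot_rows(tau, 2)
M = tau[rows, :]                            # selected (r+s) x (r+s) block
```

In this example the two rows with the largest mutually independent contributions are selected, so `M` is diagonal with entries 10 and 5.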

## Appendix C. Starting Values

- Set ${\alpha}^{(-1)},{\beta}^{(-1)}$ to their I(1) values (i.e., with full rank $\Gamma $).
- Take two iterations with the relevant switching algorithm subject to restrictions.

- Get ${\alpha}^{(-2)},{\beta}^{(-2)}$ by RRR from the $\tau $-representation using $\kappa =0$:$${z}_{0t}=\alpha ({\beta}^{\prime}{z}_{2t}+{\psi}^{\prime}{z}_{1t})+{\epsilon}_{t}.$$
- Get ${\alpha}^{(-1)},{\beta}^{(-1)}$ by RRR from the $\tau $-representation:$${z}_{0t}-w{\kappa}^{\prime}{\beta}^{\prime}{z}_{1t}=\alpha ({\beta}^{\prime}{z}_{2t}+{\psi}^{\prime}{z}_{1t})+{\epsilon}_{t}.$$
- Take two iterations with the relevant switching algorithm subject to restrictions.
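The RRR steps above can be illustrated with a basic reduced-rank regression of ${z}_{0t}$ on a single block of regressors, ${z}_{0t}=\alpha {\beta}^{\prime}{z}_{1t}+{\epsilon}_{t}$. The sketch uses the textbook moment-matrix eigenproblem rather than a QR-based implementation, ignores any terms that would first be concentrated out, and normalizes ${\beta}^{\prime}{S}_{11}\beta ={I}_{r}$; the function name `rrr` and the simulated data are assumptions for illustration only.

```python
import numpy as np

def rrr(Z0, Z1, r):
    """Rank-r regression z_0t = alpha beta' z_1t + e_t via moment matrices."""
    T = Z0.shape[0]
    S00 = Z0.T @ Z0 / T
    S01 = Z0.T @ Z1 / T
    S11 = Z1.T @ Z1 / T
    L = np.linalg.cholesky(S11)              # S11 = L L'
    G = np.linalg.solve(L, S01.T)            # L^{-1} S10
    M = G @ np.linalg.solve(S00, G.T)        # L^{-1} S10 S00^{-1} S01 L^{-T}
    _, eigvec = np.linalg.eigh(M)            # eigenvalues in ascending order
    top = eigvec[:, ::-1][:, :r]             # eigenvectors of the r largest
    beta = np.linalg.solve(L.T, top)         # so that beta' S11 beta = I_r
    alpha = S01 @ beta                       # OLS of z_0t on beta' z_1t
    return alpha, beta

rng = np.random.default_rng(2)
T, p = 500, 3
Z1 = rng.standard_normal((T, p))
Pi = rng.standard_normal((p, p))
Z0 = Z1 @ Pi.T + 0.1 * rng.standard_normal((T, p))

alpha1, beta1 = rrr(Z0, Z1, 1)               # rank-one fit
alpha3, beta3 = rrr(Z0, Z1, 3)               # full rank: equals OLS
ols = np.linalg.lstsq(Z1, Z0, rcond=None)[0].T
```

With `r` equal to the full dimension the product `alpha3 @ beta3.T` coincides with the unrestricted OLS coefficient matrix, which gives a quick sanity check on an implementation.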

## References

- Anderson, Theodore W. 1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics 22: 327–51, (Erratum in Annals of Statistics 8, 1980).
- Anderson, Theodore W. 2002. Reduced rank regression in cointegrated models. Journal of Econometrics 106: 203–16.
- Boswijk, H. Peter. 2010. Mixed Normal Inference on Multicointegration. Econometric Theory 26: 1565–76.
- Boswijk, H. Peter, and Jurgen A. Doornik. 2004. Identifying, Estimating and Testing Restricted Cointegrated Systems: An Overview. Statistica Neerlandica 58: 440–65.
- Cavaliere, Giuseppe, Anders Rahbek, and R.A.M. Taylor. 2012. Bootstrap Determination of the Co-Integration Rank in Vector Autoregressive Models. Econometrica 80: 1721–40.
- Dennis, Jonathan G., and Katarina Juselius. 2004. CATS in RATS: Cointegration Analysis of Time Series Version 2. Technical Report. Evanston: Estima.
- Doornik, Jurgen A. 2013. Object-Oriented Matrix Programming using Ox, 7th ed. London: Timberlake Consultants Press.
- Doornik, Jurgen A. 2017. Accelerated Estimation of Switching Algorithms: The Cointegrated VAR Model and Other Applications. Oxford: Department of Economics, University of Oxford.
- Doornik, Jurgen A., and R. J. O’Brien. 2002. Numerically Stable Cointegration Analysis. Computational Statistics & Data Analysis 41: 185–93.
- Golub, Gene H., and Charles F. Van Loan. 2013. Matrix Computations, 4th ed. Baltimore: The Johns Hopkins University Press.
- Johansen, Søren. 1988. Statistical Analysis of Cointegration Vectors. Journal of Economic Dynamics and Control 12: 231–54, Reprinted in R. F. Engle, and C. W. J. Granger, eds. 1991. Long-Run Economic Relationships. Oxford: Oxford University Press, pp. 131–52.
- Johansen, Søren. 1991. Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models. Econometrica 59: 1551–80.
- Johansen, Søren. 1992. A Representation of Vector Autoregressive Processes Integrated of Order 2. Econometric Theory 8: 188–202.
- Johansen, Søren. 1995a. Likelihood-based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
- Johansen, Søren. 1995b. Identifying Restrictions of Linear Equations with Applications to Simultaneous Equations and Cointegration. Journal of Econometrics 69: 111–32.
- Johansen, Søren. 1995c. A Statistical Analysis of Cointegration for I(2) Variables. Econometric Theory 11: 25–59.
- Johansen, Søren. 1997. Likelihood Analysis of the I(2) Model. Scandinavian Journal of Statistics 24: 433–62.
- Johansen, Søren, and Katarina Juselius. 1990. Maximum Likelihood Estimation and Inference on Cointegration—With Application to the Demand for Money. Oxford Bulletin of Economics and Statistics 52: 169–210.
- Johansen, Søren, and Katarina Juselius. 1992. Testing Structural Hypotheses in a Multivariate Cointegration Analysis of the PPP and the UIP for UK. Journal of Econometrics 53: 211–44.
- Johansen, Søren, and Katarina Juselius. 1994. Identification of the Long-run and the Short-run Structure. An Application to the ISLM Model. Journal of Econometrics 63: 7–36.
- Juselius, Katarina. 2006. The Cointegrated VAR Model: Methodology and Applications. Oxford: Oxford University Press.
- Paruolo, Paolo. 2000a. Asymptotic Efficiency of the Two Stage Estimator in I(2) Systems. Econometric Theory 16: 524–50.
- Paruolo, Paolo. 2000b. On likelihood-maximizing algorithms for I(2) VAR models. Mimeo. Varese: Università dell’Insubria.
- Paruolo, Paolo, and Anders Rahbek. 1999. Weak exogeneity in I(2) VAR Systems. Journal of Econometrics 93: 281–308.

**Figure 1.** Comparison of algorithms: $\delta $-switching (top row) and triangular-switching (bottom row). Simulating a range of $r,s$. Number of iterations on the horizontal axis, count (out of 1000) on the vertical.

**Figure 2.** Comparison of algorithms: $\delta $-switching (left two) and triangular-switching (right two). Simulating a range of $r,s$. Number of iterations on the horizontal axis, count (out of 1000) on the vertical.

**Figure 3.** $\delta $-switching function value minus the triangular switching function value (vertical axis) for each replication (horizontal axis). Both starting from their default starting values. The labels are the cointegration indices $(r,s,{s}_{2})$.

**Figure 4.** $\delta $-switching function value minus the hybrid triangular-switching function value (vertical axis) for each replication (horizontal axis).

**Table 1.** Definitions of the symbols used in the $\tau $ and $\delta $ representations of the I(2) model.

Symbol | | Definition | Dimension |
---|---|---|---|
$\tau $ | = | $(\beta :{\beta}_{\perp}\eta )\ \mathrm{when}\ {\varrho}^{\prime}=(I:0)$ | ${p}_{1}\times (r+s)$ |
${\tau}_{\perp}$ | = | ${\beta}_{\perp}{\eta}_{\perp}$ | ${p}_{1}\times {s}_{2}^{*}$ |
$\psi $ | = | $-{({\overline{\overline{\alpha}}}^{\prime}\Gamma )}^{\prime}$ | ${p}_{1}\times r$ |
${\kappa}^{\prime}$ | = | $-{\alpha}_{\perp}^{\prime}\Gamma \overline{\tau}=-({\alpha}_{\perp}^{\prime}\Gamma \overline{\beta}:\xi )={({\kappa}_{1}:{\kappa}_{2})}^{\prime}$ | $(p-r)\times (r+s)$ |
$\delta $ | = | $-{\overline{\alpha}}^{\prime}\Gamma {\tau}_{\perp}$ | $r\times {s}_{2}^{*}$ |
$\zeta $ | = | $-\Gamma \overline{\tau}=({\zeta}_{1}:{\zeta}_{2})$ | $p\times (r+s)$ |
$w$ | = | ${\alpha}_{\perp}-\alpha {\overline{\overline{\alpha}}}^{\prime}{\alpha}_{\perp}=\Omega {\alpha}_{\perp}{\left({\alpha}_{\perp}^{\prime}\Omega {\alpha}_{\perp}\right)}^{-1}=\overline{\overline{{\alpha}_{\perp}}}$ | $p\times (p-r)$ |
$d$ | = | ${\tau}_{\perp}{\delta}^{\prime}$ | ${p}_{1}\times r$ |
$e$ | = | $\tau {\zeta}^{\prime}$ | ${p}_{1}\times p$ |

**Table 2.** Links between symbols used in the representations of the I(2) model, assuming ${W}_{11}={I}_{r}$ and ${a}_{\perp}^{\prime}{a}_{\perp}=I$.

Symbol | | Expression | Source |
---|---|---|---|
$-\Gamma $ | = | $\alpha {\psi}^{\prime}+w{\kappa}^{\prime}{\tau}^{\prime}=\alpha \delta {\tau}_{\perp}^{\prime}+\zeta {\tau}^{\prime}=\alpha {d}^{\prime}+{e}^{\prime}$ | |
$\zeta $ | = | $\alpha {\psi}^{\prime}\overline{\tau}+w{\kappa}^{\prime}$ | $(\mathrm{from}\ \Gamma \overline{\tau})$ |
${d}^{\prime}$ | = | ${\psi}^{\prime}{\tau}_{\perp}{\tau}_{\perp}^{\prime}$ | $(\mathrm{from}\ \Gamma {\tau}_{\perp})$ |
${\kappa}^{\prime}$ | = | ${\alpha}_{\perp}^{\prime}\zeta $ | $(\mathrm{from}\ {\alpha}_{\perp}^{\prime}\Gamma )$ |
${\psi}^{\prime}$ | = | ${d}^{\prime}+{\overline{\overline{\alpha}}}^{\prime}\zeta {\tau}^{\prime}$ | $(\mathrm{from}\ {\overline{\overline{\alpha}}}^{\prime}\Gamma )$ |
$\alpha $ | = | ${A}_{0}$ | |
$\beta $ | = | ${B}_{0}$ | |
${d}^{\prime}$ | = | $-{V}_{13}{B}_{2}^{\prime}$ | |
${e}^{\prime}$ | = | $-A({V}_{.1}:{V}_{.2}){({B}_{0}:{B}_{1})}^{\prime}$ | |
$\tau $ | = | $({B}_{0}:{B}_{1})$ | |

**Table 3.** Estimation of all I(2) models by $\tau ,\delta $ and triangular switching; all using the same starting value procedure. Number of iterations to convergence for ${\epsilon}_{1}={10}^{-14}$.

$r\backslash {s}_{2}$ | $\tau $ Switching | | | | $\delta $ Switching | | | | Triangular Switching | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 |
1 | 19 | 25 | 36 | 34 | 15 | 24 | 37 | 30 | 31 | 31 | 39 | 32 |
2 | | 18 | 32 | 25 | | 18 | 32 | 34 | | 22 | 27 | 50 |
3 | | | 37 | 23 | | | 42 | 38 | | | 50 | 59 |
4 | | | | 29 | | | | 28 | | | | 85 |

**Table 4.** Estimation of all I(2) models by old versions of $\tau ,\delta $ switching. Number of iterations to convergence for ${\epsilon}_{1}={10}^{-14}$.

$r\backslash {s}_{2}$ | Old $\tau $ Switching | | | | CATS2 Switching | | | |
---|---|---|---|---|---|---|---|---|
 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 |
1 | 126 | 198 | 338 | 201 | 5229 | 8329 | 8516 | 5371 |
2 | | 79 | 211 | 229 | | 7234 | 709 | 861 |
3 | | | 483 | 237 | | | 550 | 432 |
4 | | | | 4851 | | | | 5771 |

© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).