1. Introduction
The vector error correction model (VECM) of Engle and Granger (1987) is one of the most widely used time-series models in empirical practice. The predominant estimation method for the VECM is the reduced-rank regression method introduced by Johansen (1988, 1991, 1995). Johansen's estimation method is widely used because it is straightforward, it is a natural extension of the VAR model of Sims (1980), and it is computationally tractable.
Johansen motivated his estimator as the maximum likelihood estimator (MLE) of the VECM under the assumption that the errors are i.i.d. normal. For many users, it is unclear whether the estimator has a broader justification. In contrast, it is well known that least-squares estimation is both maximum likelihood under normality and method of moments under uncorrelatedness.
This paper provides the missing link. It is shown that Johansen’s reduced-rank estimator is algebraically identical to the generalized method of moments (GMM) estimator of the VECM, under the imposition of conditional homoscedasticity. This GMM estimator only uses uncorrelatedness and homoscedasticity. Thus Johansen’s reduced-rank estimator can be motivated under much broader conditions than normality.
The asymptotic efficiency of the estimator in the GMM class relies on the assumption of homoscedasticity (but not normality). When homoscedasticity fails, the reduced-rank estimator loses asymptotic efficiency but retains its interpretation as a GMM estimator.
It is also shown that the GMM tests for reduced (cointegration) rank are nearly identical to Johansen’s likelihood ratio tests. Thus the standard likelihood ratio tests for cointegration can be interpreted more broadly as GMM tests.
This paper does not introduce new estimation or inference methods. It merely points out that the methods currently in use have a broader interpretation than may have been understood. The results leave open the possibility that new GMM methods that do not impose homoscedasticity could be developed.
This connection is not new. In a different context, Adrian et al. (2015) derived the equivalence of the likelihood and minimum-distance estimators of the reduced-rank model. The equivalence between the Limited Information Maximum Likelihood (LIML) estimator (which has a dual relation with reduced-rank regression) and a minimum distance estimator was discovered by Goldberger and Olkin (1971). Recently, Kolesár (2018) drew out connections between likelihood-based and minimum-distance estimation of endogenous linear regression models.
This paper is organized as follows. Section 2 introduces reduced-rank regression models and Johansen's estimator. Section 3 presents the GMM estimator and states the main theorems demonstrating the equivalence of the GMM and ML estimators. Section 4 presents the derivation of the GMM estimator. Section 5 contains two technical results relating generalized eigenvalue problems and the extrema of quadratic forms.
2. Reduced-Rank Regression Models
The VECM for $p$ variables of cointegrating rank $r$ with $k$ lags is
$$\Delta y_t = \alpha\beta' y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{k-1} \Delta y_{t-k+1} + \Phi d_t + e_t, \tag{1}$$
where $d_t$ are the deterministic components. Observations are $t = 1, \ldots, n$. The matrices $\alpha$ and $\beta$ are $p \times r$ with $r \le p$. This is a famous workhorse model in applied time series, largely because of the seminal work of Engle and Granger (1987).
The primary estimation method for the VECM is known as reduced-rank regression and was developed by Johansen (1988, 1991, 1995). Algebraically, the VECM (1) is a special case of the reduced-rank regression model:
$$Y_t = \alpha\beta' X_t + \Psi Z_t + e_t, \tag{2}$$
where $Y_t$ is $p \times 1$, $X_t$ is $m \times 1$, and $Z_t$ is $\ell \times 1$. The coefficient matrix $\alpha$ is $p \times r$ and $\beta$ is $m \times r$ with $r \le \min(p, m)$. Johansen derived the MLE for model (2) under the assumption that $e_t$ is i.i.d. $\mathrm{N}(0, \Sigma)$. This immediately applies to the VECM (1), with $Y_t = \Delta y_t$, $X_t = y_{t-1}$, and $Z_t$ collecting the lagged differences and deterministic components, and is the primary application of reduced-rank regression in econometrics.
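To make this mapping concrete, the following sketch (in Python, with illustrative names; the function vecm_design is not from the paper) constructs the stacked matrices $Y$, $X$, and $Z$ of the reduced-rank regression from a matrix of levels $y_t$.

```python
import numpy as np

def vecm_design(y, k, det=None):
    """Build reduced-rank regression variables for a VECM with k lags.

    y   : (T, p) array of levels y_t
    k   : number of lags in the VECM (k >= 1)
    det : optional (T, q) array of deterministic terms d_t

    Returns stacked matrices Y (rows Delta y_t'), X (rows y_{t-1}'), and
    Z (rows of lagged differences and deterministic terms), n = T - k rows each.
    """
    dy = np.diff(y, axis=0)                            # Delta y_t for t = 2, ..., T
    Y = dy[k - 1:]                                     # Delta y_t for t = k+1, ..., T
    X = y[k - 1:-1]                                    # y_{t-1} for t = k+1, ..., T
    lags = [dy[k - 1 - j:-j] for j in range(1, k)]     # Delta y_{t-j}, j = 1, ..., k-1
    parts = lags + ([det[k:]] if det is not None else [])
    Z = np.hstack(parts) if parts else np.empty((len(Y), 0))
    return Y, X, Z
```

The rows of $Y$, $X$, and $Z$ stack the observations, matching the matrix notation used in Section 4.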
Canonical correlations were introduced by Hotelling (1936), and reduced-rank regression was introduced by Bartlett (1938). A complete theory was developed by Anderson and Rubin (1949, 1950) and Anderson (1951). These authors developed the MLE for the model:
$$Y_t = \Theta X_t + e_t, \tag{3}$$
$$\Gamma'\Theta = 0, \tag{4}$$
where $\Gamma$ is $p \times (p - r)$ and is unknown. This is an alternative parameterization of (2) without the covariates $Z_t$. Anderson and Rubin (1949, 1950) considered the case $r = p - 1$ and primarily focused on estimation of the vector $\Gamma$. Anderson (1951) considered the general case.
While the models (2) and (3)–(4) are equivalent and thus have the same MLE, the different parameterizations led the authors to different derivations. Anderson and Rubin derived the estimator of (3) and (4) by a tedious application of constrained optimization. (Specifically, they maximized the likelihood of (3) imposing the constraint (4) using Lagrange multiplier methods. The solution turned out to be tedious because (4) is a nonlinear function of the parameters $\Theta$ and $\Gamma$.) The derivation is so cumbersome that it is excluded from nearly all statistics and econometrics textbooks, despite the fact that it is the source of the famous LIML estimator.
The elegant derivation used by Johansen (1988) is algebraically unrelated to that of Anderson and Rubin and is based on applying a concentration argument to the product structure in (2). It is similar to the derivation in Tso (1981), although the latter did not include the covariates $Z_t$. Johansen's derivation is algebraically straightforward and thus is widely taught to students.
It is useful to briefly describe the likelihood problem. The log-likelihood for model (2) under the assumption that $e_t$ is i.i.d. $\mathrm{N}(0, \Sigma)$ is
$$\ell_n(\alpha, \beta, \Psi, \Sigma) = -\frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{t=1}^{n}(Y_t - \alpha\beta'X_t - \Psi Z_t)'\Sigma^{-1}(Y_t - \alpha\beta'X_t - \Psi Z_t). \tag{5}$$
The MLE maximizes $\ell_n(\alpha, \beta, \Psi, \Sigma)$. Johansen's solution is as follows. Define the projection matrix $M = I_n - Z(Z'Z)^{-1}Z'$, where $Y$, $X$, and $Z$ are the matrices that stack the observations $Y_t'$, $X_t'$, and $Z_t'$ as rows, and the residual matrices $\tilde Y = MY$ and $\tilde X = MX$. Consider the generalized eigenvalue problem:
$$\det\left(\lambda\,\tilde X'\tilde X - \tilde X'\tilde Y(\tilde Y'\tilde Y)^{-1}\tilde Y'\tilde X\right) = 0. \tag{6}$$
The solutions $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_m$ satisfy
$$\left(\hat\lambda_j\,\tilde X'\tilde X - \tilde X'\tilde Y(\tilde Y'\tilde Y)^{-1}\tilde Y'\tilde X\right)\hat v_j = 0,$$
where $(\hat\lambda_j, \hat v_j)$ are known as the generalized eigenvalues and eigenvectors of $\tilde X'\tilde Y(\tilde Y'\tilde Y)^{-1}\tilde Y'\tilde X$ with respect to $\tilde X'\tilde X$. The normalization $\hat V'\tilde X'\tilde X\hat V = I_m$, where $\hat V = [\hat v_1, \ldots, \hat v_m]$, is imposed.
Given the normalization, Johansen's reduced-rank estimator for $\beta$ is
$$\hat\beta_{\mathrm{mle}} = [\hat v_1, \ldots, \hat v_r],$$
the eigenvectors associated with the $r$ largest eigenvalues. The MLE $\hat\alpha_{\mathrm{mle}}$ and $\hat\Psi_{\mathrm{mle}}$ are found by least-squares regression of $Y_t$ on $\hat\beta_{\mathrm{mle}}'X_t$ and $Z_t$.
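As a concrete illustration, the sketch below computes this estimator numerically by solving the generalized eigenvalue problem (6) with SciPy; the function name and interface are illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def johansen_rrr(Y, X, Z, r):
    """Reduced-rank estimator for Y_t = alpha beta' X_t + Psi Z_t + e_t.

    Y : (n, p), X : (n, m), Z : (n, l) stacked observations; r is the assumed rank.
    Returns (alpha, beta, Psi, eigvals), with beta normalized so that
    beta' X~'X~ beta = I_r, where X~ = M X and M = I - Z (Z'Z)^{-1} Z'.
    """
    n = Y.shape[0]
    if Z.shape[1]:
        M = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    else:
        M = np.eye(n)
    Yt, Xt = M @ Y, M @ X                                   # residual matrices
    A = Xt.T @ Yt @ np.linalg.solve(Yt.T @ Yt, Yt.T @ Xt)   # X~'Y~ (Y~'Y~)^{-1} Y~'X~
    B = Xt.T @ Xt
    lam, V = eigh(A, B)                     # generalized eigenproblem, V' B V = I
    order = np.argsort(lam)[::-1]           # eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    beta = V[:, :r]                         # eigenvectors for the r largest eigenvalues
    # alpha and Psi by least-squares regression of Y_t on beta' X_t and Z_t
    coef = np.linalg.lstsq(np.hstack([X @ beta, Z]), Y, rcond=None)[0]
    alpha, Psi = coef[:r].T, coef[r:].T
    return alpha, beta, Psi, lam
```

For the VECM, the inputs $Y$, $X$, and $Z$ are the stacked matrices constructed as in the sketch following model (2).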
3. Generalized Method of Moments
Define $W_t = (X_t', Z_t')'$. The GMM estimator of the reduced-rank regression model (2) is derived under the standard orthogonality restriction:
$$\mathrm{E}\left[W_t e_t'\right] = 0, \tag{7}$$
plus the homoscedasticity condition:
$$\mathrm{E}\left[e_t e_t' \otimes W_t W_t'\right] = \Sigma \otimes Q, \tag{8}$$
where $\Sigma = \mathrm{E}[e_t e_t']$ and $Q = \mathrm{E}[W_t W_t']$. These moment conditions are implied by the normal regression model. (Equations (7) and (8) can be deduced from the first-order conditions for maximization of (5).) Because (7) and (8) can be deduced from (5) but not vice versa, the moment condition model (7)–(8) is considerably more general than the normal regression model (5).
The efficient GMM criterion (see Hansen 1982) takes the form
$$J_n(\alpha, \beta, \Psi) = \left(\sum_{t=1}^{n} e_t(\alpha,\beta,\Psi) \otimes W_t\right)'\left(\hat\Sigma \otimes \sum_{t=1}^{n} W_t W_t'\right)^{-1}\left(\sum_{t=1}^{n} e_t(\alpha,\beta,\Psi) \otimes W_t\right), \tag{9}$$
where $e_t(\alpha,\beta,\Psi) = Y_t - \alpha\beta'X_t - \Psi Z_t$,
$$\hat\Sigma = \frac{1}{n}\sum_{t=1}^{n}\hat e_t\hat e_t', \tag{10}$$
and $\hat e_t$ are the least-squares residuals of the unconstrained model:
$$Y_t = \Theta X_t + \Psi Z_t + e_t.$$
The GMM estimator $(\hat\alpha_{\mathrm{gmm}}, \hat\beta_{\mathrm{gmm}}, \hat\Psi_{\mathrm{gmm}})$ is the set of parameters that jointly minimize the criterion, subject to the normalization $\beta'\tilde X'\tilde X\beta = I_r$:
$$(\hat\alpha_{\mathrm{gmm}}, \hat\beta_{\mathrm{gmm}}, \hat\Psi_{\mathrm{gmm}}) = \operatorname*{argmin}_{\alpha,\ \Psi,\ \beta'\tilde X'\tilde X\beta = I_r} J_n(\alpha, \beta, \Psi).$$
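The criterion (9) can be evaluated directly in this vectorized form. The sketch below is one way to do so (the function name and interface are illustrative); Section 4 shows that the same quantity can be written as a trace.

```python
import numpy as np

def gmm_criterion(alpha, beta, Psi, Y, X, Z):
    """Efficient GMM criterion (9) for the reduced-rank regression model (2)."""
    n = Y.shape[0]
    W = np.hstack([X, Z])                                  # W_t = (X_t', Z_t')'
    e = Y - X @ beta @ alpha.T - Z @ Psi.T                 # residuals e_t(alpha, beta, Psi)
    # unconstrained least-squares residuals and the variance estimator (10)
    ehat = Y - W @ np.linalg.lstsq(W, Y, rcond=None)[0]
    Sigma_hat = ehat.T @ ehat / n
    g = (W.T @ e).ravel(order="F")                         # sum_t e_t kron W_t = vec(W'e)
    Omega = np.kron(Sigma_hat, W.T @ W)                    # weight matrix Sigma_hat kron W'W
    return g @ np.linalg.solve(Omega, g)
```

Minimizing this criterion over $(\alpha, \Psi)$ for fixed $\beta$ is the concentration step carried out in Section 4.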
The main contribution of the paper is the following surprising result.
Theorem 1. $\hat\beta_{\mathrm{gmm}} = \hat\beta_{\mathrm{mle}}$, $\hat\alpha_{\mathrm{gmm}} = \hat\alpha_{\mathrm{mle}}$, and $\hat\Psi_{\mathrm{gmm}} = \hat\Psi_{\mathrm{mle}}$.
Theorem 2. $J_n(\hat\alpha_{\mathrm{gmm}}, \hat\beta_{\mathrm{gmm}}, \hat\Psi_{\mathrm{gmm}}) = n\sum_{j=r+1}^{m}\hat\lambda_j/(1-\hat\lambda_j)$, where $\hat\lambda_j$ are the eigenvalues from (6).
Theorem 1 states that the GMM estimator is algebraically identical to the Gaussian maximum likelihood estimator.
This shows that Johansen’s reduced-rank regression estimator is not tied to the normality assumption. This is similar to the equivalence of least-squares as a method of moments estimator and the Gaussian MLE in the regression context.
The key is the use of the homoscedastic weight matrix. This shows that the Johansen reduced-rank estimator is an efficient GMM estimator under conditional homoscedasticity. When homoscedasticity fails, the Johansen reduced-rank estimator continues to be a GMM estimator but is no longer the efficient GMM estimator.
It is important to understand that Theorem 1 is different from the trivial statement that the MLE is GMM applied to the first-order conditions of the likelihood (e.g., Hall (2005), Section 3.8.1). Specifically, if the derivatives of the Gaussian log-likelihood function (5) are treated as moment conditions and solved, the result is a GMM estimator, and thus the MLE can be interpreted as GMM. That is not what Theorem 1 states. Theorem 1 instead shows that GMM based only on the moment conditions (7) and (8), which do not require normality, produces the identical estimator.
GMM hypothesis tests can be constructed from the difference in the GMM criteria; tests for reduced rank are considered, which in the context of the VECM are tests for cointegration rank. The model (2) with $m = p$ is taken and the following hypotheses on the reduced rank are considered:
$$H_r : \operatorname{rank}(\alpha\beta') \le r, \qquad r = 0, 1, \ldots, p.$$
Let $J_n(H_r)$ denote the GMM criterion minimized subject to rank $r$. The GMM test statistic for $H_r$ against $H_p$ is
$$J_n(H_r) - J_n(H_p).$$
The GMM test statistic for $H_r$ against $H_{r+1}$ is
$$J_n(H_r) - J_n(H_{r+1}).$$
Theorem 3. The GMM test statistics for reduced rank are
$$J_n(H_r) - J_n(H_p) = n\sum_{j=r+1}^{p}\frac{\hat\lambda_j}{1-\hat\lambda_j}, \qquad J_n(H_r) - J_n(H_{r+1}) = n\,\frac{\hat\lambda_{r+1}}{1-\hat\lambda_{r+1}},$$
where $\hat\lambda_j$ are the eigenvalues from (6). Recall, in contrast, that the likelihood ratio test statistics derived by Johansen are
$$\mathrm{LR}(H_r \mid H_p) = -n\sum_{j=r+1}^{p}\log(1-\hat\lambda_j), \qquad \mathrm{LR}(H_r \mid H_{r+1}) = -n\log(1-\hat\lambda_{r+1}).$$
For the test of $H_r$ against $H_{r+1}$, the GMM test statistic and the likelihood ratio (LR) statistic yield equivalent tests, as they are monotonic functions of one another. (If the bootstrap is used to assess significance, the two statistics will yield numerically identical p-values.) They are asymptotically identical under standard approximations and in practice will be nearly identical, because the eigenvalues tend to be quite small in value (at least under the null hypothesis), so that $\hat\lambda_{r+1}/(1-\hat\lambda_{r+1}) \approx -\log(1-\hat\lambda_{r+1})$. For the test of $H_r$ against $H_p$, the GMM test statistic and the LR statistic do not provide equivalent tests (they cannot be written as monotonic functions of one another), but they are also asymptotically equivalent and will be nearly identical in practice.
An interesting connection noted by a referee is that a statistic of this form was proposed by Pillai (1955) and is discussed by Muirhead (1982, Section 11.2.8).
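To see how close the two families of statistics are, the short sketch below computes them from a vector of estimated eigenvalues, using the expressions displayed above; the numerical eigenvalues are made up purely for illustration.

```python
import numpy as np

def rank_test_stats(eigvals, n, r):
    """GMM and LR statistics for H_r, from eigenvalues sorted in decreasing order."""
    lam = np.asarray(eigvals, dtype=float)[r:]          # lambda_{r+1}, ..., lambda_p
    gmm_trace = n * np.sum(lam / (1 - lam))             # GMM: H_r against H_p
    lr_trace = -n * np.sum(np.log(1 - lam))             # Johansen trace statistic
    gmm_max = n * lam[0] / (1 - lam[0])                 # GMM: H_r against H_{r+1}
    lr_max = -n * np.log(1 - lam[0])                    # Johansen maximum-eigenvalue statistic
    return gmm_trace, lr_trace, gmm_max, lr_max

# hypothetical eigenvalues; small values are typical under the null hypothesis
print(rank_test_stats([0.15, 0.04, 0.01], n=200, r=1))
```

With small eigenvalues the GMM and LR values are close, as discussed above.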
4. Derivation of the GMM Estimator
It is convenient to rewrite the criterion in standard matrix notation, defining the matrices $Y$, $X$, $Z$, and $W$ by stacking the observations. Model (2) is
$$Y = X\beta\alpha' + Z\Psi' + e.$$
Using the relation $\operatorname{vec}(ABC) = (C' \otimes A)\operatorname{vec}(B)$, the following is obtained:
$$J_n(\alpha, \beta, \Psi) = \operatorname{tr}\left(\hat\Sigma^{-1}(Y - X\beta\alpha' - Z\Psi')'W(W'W)^{-1}W'(Y - X\beta\alpha' - Z\Psi')\right).$$
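As a quick check of this step, the sketch below verifies numerically, on randomly generated arrays used purely for illustration, that the vectorized form (9) and the trace form above coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m, l, r = 200, 3, 3, 4, 1
Y, X, Z = rng.normal(size=(n, p)), rng.normal(size=(n, m)), rng.normal(size=(n, l))
alpha, beta, Psi = rng.normal(size=(p, r)), rng.normal(size=(m, r)), rng.normal(size=(p, l))

W = np.hstack([X, Z])
e = Y - X @ beta @ alpha.T - Z @ Psi.T
ehat = Y - W @ np.linalg.lstsq(W, Y, rcond=None)[0]        # unconstrained residuals
Sigma_hat = ehat.T @ ehat / n                              # variance estimator (10)

g = (W.T @ e).ravel(order="F")                             # vec(W'e)
J_vec = g @ np.linalg.solve(np.kron(Sigma_hat, W.T @ W), g)          # form (9)
P_W = W @ np.linalg.solve(W.T @ W, W.T)
J_tr = np.trace(np.linalg.solve(Sigma_hat, e.T @ P_W @ e))           # trace form
assert np.isclose(J_vec, J_tr)
```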
Following the concentration strategy used by Johansen, $\beta$ is fixed and $\alpha$ and $\Psi$ are concentrated out, producing a concentrated criterion that is a function of $\beta$ only. The system is linear in the regressors $X\beta$ and $Z$. Given the homoscedastic weight matrix, the GMM estimator of $(\alpha, \Psi)$ given $\beta$ is multivariate least-squares. Using the partialling out (residual regression) approach, the least-squares residual can be written as the residual from the regression of $\tilde Y$ on $\tilde X\beta$, where $\tilde Y = MY$ and $\tilde X = MX$ are the residuals from regressions on $Z$. That is, the least-squares residual is
$$\hat e(\beta) = \tilde Y - \tilde X\beta(\beta'\tilde X'\tilde X\beta)^{-1}\beta'\tilde X'\tilde Y = \tilde Y - \tilde X\beta\beta'\tilde X'\tilde Y,$$
where the second equality uses the normalization $\beta'\tilde X'\tilde X\beta = I_r$. Because the space spanned by the columns of $W = (X, Z)$ equals that spanned by $(\tilde X, Z)$, the following can be written:
$$W(W'W)^{-1}W' = Z(Z'Z)^{-1}Z' + \tilde X(\tilde X'\tilde X)^{-1}\tilde X'.$$
Because $Z'\tilde X = 0$ and $Z'\tilde Y = 0$, then $Z'\hat e(\beta) = 0$ and
$$J_n(\beta) = \operatorname{tr}\left(\hat\Sigma^{-1}\hat e(\beta)'\tilde X(\tilde X'\tilde X)^{-1}\tilde X'\hat e(\beta)\right),$$
where $J_n(\beta)$ denotes the criterion with $\alpha$ and $\Psi$ concentrated out. Using the partialling out (residual regression) approach, the variance estimator (10) can be written as
$$\hat\Sigma = \frac{1}{n}\left(\tilde Y'\tilde Y - \tilde Y'\tilde X(\tilde X'\tilde X)^{-1}\tilde X'\tilde Y\right).$$
Thus the concentrated GMM criterion is
$$J_n(\beta) = \operatorname{tr}\left(\hat\Sigma^{-1}\tilde Y'\tilde X(\tilde X'\tilde X)^{-1}\tilde X'\tilde Y\right) - \operatorname{tr}\left(\beta'\tilde X'\tilde Y\hat\Sigma^{-1}\tilde Y'\tilde X\beta\right). \tag{11}$$
The GMM estimator minimizes $J_n(\beta)$ or, equivalently, maximizes the final term in (11), $\operatorname{tr}\left(\beta'\tilde X'\tilde Y\hat\Sigma^{-1}\tilde Y'\tilde X\beta\right)$, subject to the normalization $\beta'\tilde X'\tilde X\beta = I_r$. This is a generalized eigenvalue problem. Lemma 2 (in the next section) shows that the solution is $\hat\beta_{\mathrm{gmm}} = [\hat v_1, \ldots, \hat v_r]$, the eigenvectors from (6) corresponding to the $r$ largest eigenvalues, as claimed.
Because the estimates $\hat\alpha_{\mathrm{gmm}}$ and $\hat\Psi_{\mathrm{gmm}}$ are found by least-squares regression given $\hat\beta_{\mathrm{gmm}}$, and because this is equivalent to the MLE, it is also concluded that $\hat\alpha_{\mathrm{gmm}} = \hat\alpha_{\mathrm{mle}}$ and $\hat\Psi_{\mathrm{gmm}} = \hat\Psi_{\mathrm{mle}}$. This completes the proof of Theorem 1.
Lemma 2 also shows that the minimized value of the criterion is
$$J_n(\hat\alpha_{\mathrm{gmm}}, \hat\beta_{\mathrm{gmm}}, \hat\Psi_{\mathrm{gmm}}) = n\sum_{j=r+1}^{m}\frac{\hat\lambda_j}{1-\hat\lambda_j}.$$
This establishes Theorem 2.
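The algebra above can be checked numerically. The sketch below, using randomly generated arrays and illustrative names, compares the eigenvector solution from (6) with the eigen-solution of the concentrated GMM problem and confirms that the minimized criterion matches the expression in Theorem 2; since the equivalences are algebraic, any data set will do.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n, p, m, l, r = 500, 3, 3, 2, 1
Y, X, Z = rng.normal(size=(n, p)), rng.normal(size=(n, m)), rng.normal(size=(n, l))

M = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
Yt, Xt = M @ Y, M @ X
B = Xt.T @ Xt

# Johansen / MLE: generalized eigenvalue problem (6)
lam, V = eigh(Xt.T @ Yt @ np.linalg.solve(Yt.T @ Yt, Yt.T @ Xt), B)
lam, V = lam[::-1], V[:, ::-1]                        # decreasing order
beta_mle = V[:, :r]

# GMM: the same problem with Sigma_hat^{-1} in place of (Y~'Y~)^{-1}
Sigma_hat = (Yt.T @ Yt - Yt.T @ Xt @ np.linalg.solve(B, Xt.T @ Yt)) / n
mu, U = eigh(Xt.T @ Yt @ np.linalg.solve(Sigma_hat, Yt.T @ Xt) / n, B)
mu, U = mu[::-1], U[:, ::-1]
beta_gmm = U[:, :r]

# Theorem 1: identical estimators (up to column sign), with mu_j = lam_j / (1 - lam_j)
assert np.allclose(beta_mle @ beta_mle.T, beta_gmm @ beta_gmm.T)
assert np.allclose(mu, lam / (1 - lam))

# Theorem 2: the minimized criterion, evaluated in the trace form of Section 4
R = np.hstack([X @ beta_gmm, Z])
e = Y - R @ np.linalg.lstsq(R, Y, rcond=None)[0]      # alpha and Psi concentrated out
W = np.hstack([X, Z])
J_min = np.trace(np.linalg.solve(Sigma_hat, e.T @ W @ np.linalg.solve(W.T @ W, W.T @ e)))
assert np.isclose(J_min, n * np.sum(lam[r:] / (1 - lam[r:])))
```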