The generalized method of moments (GMM) estimator of the reduced-rank regression model is derived under the assumption of conditional homoscedasticity. It is shown that this GMM estimator is algebraically identical to the maximum likelihood estimator under normality developed by Johansen (1988). This includes the vector error correction model (VECM) of Engle and Granger. It is also shown that GMM tests for reduced rank (cointegration) are algebraically similar to the Gaussian likelihood ratio tests. This shows that normality is not necessary to motivate these estimators and tests.
The vector error correction model (VECM) of Engle and Granger (1987) is one of the most widely used time-series models in empirical practice. The predominant estimation method for the VECM is the reduced-rank regression method introduced by Johansen (1988, 1991, 1995). Johansen’s estimation method is widely used because it is straightforward, it is a natural extension of the VAR model of Sims (1980), and it is computationally tractable.
Johansen motivated his estimator as the maximum likelihood estimator (MLE) of the VECM under the assumption that the errors are i.i.d. normal. For many users, it is unclear whether the estimator has a broader justification. In contrast, it is well known that least-squares estimation is both maximum likelihood under normality and method of moments under uncorrelatedness.
This paper provides the missing link. It is shown that Johansen’s reduced-rank estimator is algebraically identical to the generalized method of moments (GMM) estimator of the VECM, under the imposition of conditional homoscedasticity. This GMM estimator only uses uncorrelatedness and homoscedasticity. Thus Johansen’s reduced-rank estimator can be motivated under much broader conditions than normality.
The asymptotic efficiency of the estimator in the GMM class relies on the assumption of homoscedasticity (but not normality). When homoscedasticity fails, the reduced-rank estimator loses asymptotic efficiency but retains its interpretation as a GMM estimator.
It is also shown that the GMM tests for reduced (cointegration) rank are nearly identical to Johansen’s likelihood ratio tests. Thus the standard likelihood ratio tests for cointegration can be interpreted more broadly as GMM tests.
This paper does not introduce new estimation or inference methods. It merely points out that the currently used methods have a broader interpretation than may have been understood. The results leave open the possibility that new GMM methods that do not impose homoscedasticity could be developed.
This connection is not new. In a different context, Adrian et al. (2015) derived the equivalence of the likelihood and minimum-distance estimators of the reduced-rank model. The equivalence between the Limited Information Maximum Likelihood (LIML) estimator (which has a dual relation with reduced-rank regression) and a minimum distance estimator was discovered by Goldberger and Olkin (1971). Recently, Kolesár (2018) drew out connections between likelihood-based and minimum-distance estimation of endogenous linear regression models.
This paper is organized as follows. Section 2 introduces reduced-rank regression models and Johansen’s estimator. Section 3 presents the GMM estimator and states the main theorems demonstrating the equivalence of the GMM and the MLE. Section 4 presents the derivation of the GMM estimator. Section 5 contains two technical results relating generalized eigenvalue problems and the extrema of quadratic forms.
2. Reduced-Rank Regression Models
The VECM for $p$ variables of cointegrating rank $r$ with $k$ lags is

$$\Delta Y_t = \alpha\beta'Y_{t-1} + \Gamma_1\Delta Y_{t-1} + \cdots + \Gamma_{k-1}\Delta Y_{t-k+1} + \Phi D_t + e_t, \qquad (1)$$

where $D_t$ are the deterministic components. Observations are $t = 1, \ldots, n$. The matrices $\alpha$ and $\beta$ are $p \times r$ with $r < p$. This is a famous workhorse model in applied time series, largely because of the seminal work of Engle and Granger (1987).
The primary estimation method for the VECM is known as reduced-rank regression and was developed by Johansen (1988, 1991, 1995). Algebraically, the VECM (1) is a special case of the reduced-rank regression model:

$$Y_t = \alpha\beta'X_t + \Psi Z_t + e_t, \qquad (2)$$

where $Y_t$ is $p \times 1$, $X_t$ is $m \times 1$, and $Z_t$ is $q \times 1$. The coefficient matrix $\alpha$ is $p \times r$ and $\beta$ is $m \times r$ with $r \le \min(p, m)$. Johansen derived the MLE for model (2) under the assumption that $e_t$ is i.i.d. $\mathrm{N}(0, \Sigma)$. This immediately applies to the VECM (1) and is the primary application of reduced-rank regression in econometrics.
An alternative parameterization, studied in the early literature, is

$$Y_t = \Theta X_t + e_t, \qquad (3)$$

$$\Theta = \alpha\beta', \qquad (4)$$

where $\Theta$ is $p \times m$ and is unknown. This is an alternative parameterization of (2) without the covariates $Z_t$. Anderson and Rubin (1949, 1950) considered the case $r = 1$ and primarily focused on estimation of the vector $\beta$. Anderson (1951) considered the general case.
While the models (2) and (3)–(4) are equivalent and thus have the same MLE, the different parameterizations led the authors to different derivations. Anderson and Rubin derived the estimator of (3) and (4) by a tedious application of constrained optimization. (Specifically, they maximized the likelihood of (3) imposing the constraint (4) using Lagrange multiplier methods. The solution turned out to be tedious because (4) is a nonlinear function of the parameters $\alpha$ and $\beta$.) The derivation is so cumbersome that it is excluded from nearly all statistics and econometrics textbooks, despite the fact that it is the source of the famous LIML estimator.
The elegant derivation used by Johansen (1988) is algebraically unrelated to that of Anderson-Rubin and is based on applying a concentration argument to the product structure in (2). It is similar to the derivation in Tso (1981), although the latter did not include the covariates $Z_t$. Johansen’s derivation is algebraically straightforward and thus is widely taught to students.
It is useful to briefly describe the likelihood problem. The log-likelihood for model (2) under the assumption that $e_t$ is i.i.d. $\mathrm{N}(0, \Sigma)$ is

$$\mathcal{L}_n(\alpha, \beta, \Psi, \Sigma) = -\frac{n}{2}\log\det(\Sigma) - \frac{1}{2}\sum_{t=1}^{n}\left(Y_t - \alpha\beta'X_t - \Psi Z_t\right)'\Sigma^{-1}\left(Y_t - \alpha\beta'X_t - \Psi Z_t\right). \qquad (5)$$
The MLE maximizes $\mathcal{L}_n(\alpha, \beta, \Psi, \Sigma)$. Johansen’s solution is as follows. Let $Y$, $X$, and $Z$ denote the matrices obtained by stacking the observations $Y_t'$, $X_t'$, and $Z_t'$. Define the projection matrix $M = I_n - Z(Z'Z)^{-1}Z'$ and the residual matrices $\tilde Y = MY$ and $\tilde X = MX$. Consider the generalized eigenvalue problem:

$$\det\left(\lambda\tilde X'\tilde X - \tilde X'\tilde Y\left(\tilde Y'\tilde Y\right)^{-1}\tilde Y'\tilde X\right) = 0. \qquad (6)$$

The solutions satisfy

$$\hat\lambda_j\tilde X'\tilde X\hat v_j = \tilde X'\tilde Y\left(\tilde Y'\tilde Y\right)^{-1}\tilde Y'\tilde X\hat v_j, \qquad j = 1, \ldots, m,$$

where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_m$ and $\hat v_1, \ldots, \hat v_m$ are known as the generalized eigenvalues and eigenvectors of $\tilde X'\tilde Y(\tilde Y'\tilde Y)^{-1}\tilde Y'\tilde X$ with respect to $\tilde X'\tilde X$. The normalization $\hat v_j'\left(\tilde X'\tilde X/n\right)\hat v_j = 1$ is imposed.

Given the normalization, Johansen’s reduced-rank estimator for $\beta$ is

$$\hat\beta = \left(\hat v_1, \ldots, \hat v_r\right),$$

the eigenvectors associated with the $r$ largest eigenvalues. The MLE $\hat\alpha$ and $\hat\Psi$ are found by least-squares regression of $Y_t$ on $\hat\beta'X_t$ and $Z_t$.
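As a concrete illustration, the eigenvalue computation can be sketched numerically. The following Python code (a minimal sketch using numpy and scipy on simulated data; the variable names and the simulation design are illustrative and not from the paper) computes the reduced-rank estimates via the generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, p, m, q, r = 500, 3, 3, 2, 1

# Simulate the reduced-rank regression Y_t = alpha beta' X_t + Psi Z_t + e_t
alpha0 = rng.normal(size=(p, r))
beta0 = rng.normal(size=(m, r))
Psi0 = rng.normal(size=(p, q))
X = rng.normal(size=(n, m))
Z = rng.normal(size=(n, q))
Y = X @ beta0 @ alpha0.T + Z @ Psi0.T + rng.normal(size=(n, p))

# Partial out Z: Yt = M Y, Xt = M X with M = I - Z(Z'Z)^{-1}Z'
M = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
Yt, Xt = M @ Y, M @ X

# Generalized eigenvalue problem (6):
# lam_j (Xt'Xt) v_j = Xt'Yt (Yt'Yt)^{-1} Yt'Xt v_j
A = Xt.T @ Yt @ np.linalg.solve(Yt.T @ Yt, Yt.T @ Xt)
C = Xt.T @ Xt
lam, V = eigh(A, C)             # ascending order, with V' C V = I
lam, V = lam[::-1], V[:, ::-1]  # re-sort in descending order

# beta_hat: eigenvectors of the r largest eigenvalues, scaled so that
# beta_hat' (Xt'Xt / n) beta_hat = I_r
beta_hat = V[:, :r] * np.sqrt(n)

# alpha_hat, Psi_hat: least squares of Y on (X beta_hat, Z)
R = np.hstack([X @ beta_hat, Z])
coef = np.linalg.solve(R.T @ R, R.T @ Y)
alpha_hat, Psi_hat = coef[:r].T, coef[r:].T
```

Note that `scipy.linalg.eigh(A, C)` returns eigenvectors normalized so that $v'Cv = 1$, so rescaling by $\sqrt{n}$ delivers the normalization $\hat v_j'(\tilde X'\tilde X/n)\hat v_j = 1$.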
3. Generalized Method of Moments
Define $W_t = (X_t', Z_t')'$ and $e_t = Y_t - \alpha\beta'X_t - \Psi Z_t$. The GMM estimator of the reduced-rank regression model (2) is derived under the standard orthogonality restriction:

$$E\left[W_t e_t'\right] = 0, \qquad (7)$$

plus the homoscedasticity condition:

$$E\left[W_tW_t' \otimes e_te_t'\right] = Q \otimes \Sigma, \qquad (8)$$

where $Q = E[W_tW_t']$ and $\Sigma = E[e_te_t']$. These moment conditions are implied by the normal regression model. (Equations (7) and (8) can be deduced from the first-order conditions for maximization of (5).) Because (7) and (8) can be deduced from (5) but not vice versa, the moment condition model (7) and (8) is considerably more general than the normal regression model (5).
The efficient GMM criterion (see Hansen 1982) takes the form

$$J(\alpha, \beta, \Psi) = \left(\sum_{t=1}^{n} W_t \otimes e_t\right)'\left(\left(W'W\right) \otimes \hat\Sigma\right)^{-1}\left(\sum_{t=1}^{n} W_t \otimes e_t\right), \qquad (9)$$

where $W$ is the matrix obtained by stacking the $W_t'$,

$$\hat\Sigma = \frac{1}{n}\sum_{t=1}^{n}\hat e_t\hat e_t', \qquad (10)$$

and $\hat e_t$ are the least-squares residuals of the unconstrained model:

$$Y_t = \Theta X_t + \Psi Z_t + \hat e_t.$$

The GMM estimator $(\hat\alpha, \hat\beta, \hat\Psi)$ are the parameters that jointly minimize the criterion subject to the normalization $\beta'(\tilde X'\tilde X/n)\beta = I_r$:

$$\left(\hat\alpha, \hat\beta, \hat\Psi\right) = \underset{\alpha,\,\beta,\,\Psi \,:\, \beta'(\tilde X'\tilde X/n)\beta = I_r}{\operatorname{argmin}}\; J(\alpha, \beta, \Psi).$$
The main contribution of the paper is the following surprising result.
Theorem 1. The GMM estimator $(\hat\alpha, \hat\beta, \hat\Psi)$ is algebraically identical to the Gaussian maximum likelihood estimator.
This shows that Johansen’s reduced-rank regression estimator is not tied to the normality assumption. This is similar to the equivalence of least-squares as a method of moments estimator and the Gaussian MLE in the regression context.
The key is the use of the homoscedastic weight matrix. This shows that the Johansen reduced-rank estimator is an efficient GMM estimator under conditional homoscedasticity. When homoscedasticity fails, the Johansen reduced-rank estimator continues to be a GMM estimator but is no longer the efficient GMM estimator.
It is important to understand that Theorem 1 is different from the trivial statement that the MLE is GMM applied to the first-order condition of the likelihood (e.g., Hall (2005), Section 3.8.1). Specifically, if the derivatives of the Gaussian log-likelihood function (5) are treated as moment conditions and solved, the result is a GMM estimator, and thus the MLE can be interpreted as GMM. That is not what Theorem 1 states.
GMM hypothesis tests can be constructed by the difference in the GMM criteria; tests for reduced rank are considered, which in the context of the VECM are tests for cointegration rank. The model

$$Y_t = \alpha\beta'X_t + \Psi Z_t + e_t$$

is taken and the following hypotheses on reduced rank are considered:

$$H_0: \operatorname{rank} = r_0 \qquad \text{against} \qquad H_1: \operatorname{rank} = r_1, \qquad r_0 < r_1 \le \min(p, m).$$

Theorem 2 shows that the minimized GMM criterion for rank $r$ is $n\sum_{j=r+1}^{m}\hat\lambda_j/(1-\hat\lambda_j)$, so the GMM test statistic is

$$J = n\sum_{j=r_0+1}^{r_1}\frac{\hat\lambda_j}{1-\hat\lambda_j}.$$

Here it is recalled in contrast that the likelihood ratio test statistics derived by Johansen are

$$LR = -n\sum_{j=r_0+1}^{r_1}\log\left(1-\hat\lambda_j\right).$$
For $r_1 = r_0 + 1$, the GMM test statistic and the likelihood ratio (LR) statistic yield equivalent tests, as they are monotonic functions of one another. (If the bootstrap is used to assess significance, the two statistics will yield numerically identical p-values.) They are asymptotically identical under standard approximations and in practice will be nearly identical, because the eigenvalues tend to be quite small in value (at least under the null hypothesis), so that $\hat\lambda_j/(1-\hat\lambda_j) \approx -\log(1-\hat\lambda_j) \approx \hat\lambda_j$. For $r_1 > r_0 + 1$, the GMM test statistic and the LR statistic do not provide equivalent tests (they cannot be written as monotonic functions of one another), but they are also asymptotically equivalent and will be nearly identical in practice.
An interesting connection, noted by a referee, is that this statistic was proposed as a test criterion by Pillai (1955); see also Muirhead (1982, Section 11.2.8).
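The near-equality of the two statistics is easy to see numerically. Taking the GMM statistic as $n\sum_j\hat\lambda_j/(1-\hat\lambda_j)$ and the LR statistic as $-n\sum_j\log(1-\hat\lambda_j)$, and using made-up eigenvalues of the small magnitude typical under the null (a Python sketch with numpy; the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical eigenvalues, small as is typical under the null hypothesis
lam = np.array([0.08, 0.03, 0.01])
n, r0, r1 = 200, 0, 3

# GMM statistic: n * sum lam_j / (1 - lam_j) over j = r0+1, ..., r1
J = n * np.sum(lam[r0:r1] / (1 - lam[r0:r1]))

# Johansen LR statistic: -n * sum log(1 - lam_j)
LR = -n * np.sum(np.log(1 - lam[r0:r1]))
```

With these values the two statistics differ by only a few percent, and both are close to the first-order approximation $n\sum_j\hat\lambda_j$.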
4. Derivation of the GMM Estimator
It is convenient to rewrite the criterion in standard matrix notation, defining the matrices $Y$, $X$, $Z$, and $W = (X, Z)$ by stacking the observations. Model (2) is

$$Y = X\beta\alpha' + Z\Psi' + e,$$

and the criterion (9) can be written as

$$J(\alpha, \beta, \Psi) = \operatorname{tr}\left(\hat\Sigma^{-1}e'W\left(W'W\right)^{-1}W'e\right).$$
Following the concentration strategy used by Johansen, $\beta$ is fixed and $\alpha$ and $\Psi$ are concentrated out, producing a concentrated criterion that is a function of $\beta$ only. The system is linear in the regressors $X\beta$ and $Z$. Given the homoscedastic weight matrix, the GMM estimator of $(\alpha, \Psi)$ given $\beta$ is multivariate least-squares. Using the partialling out (residual regression) approach, the least-squares residual can be written as the residual from the regression of $\tilde Y$ on $\tilde X\beta$, where $\tilde Y = MY$ and $\tilde X = MX$ are the residuals from regressions on $Z$. That is, the least-squares residual is

$$\hat e(\beta) = \tilde Y - \tilde X\beta\left(\beta'\tilde X'\tilde X\beta\right)^{-1}\beta'\tilde X'\tilde Y = \tilde Y - \frac{1}{n}\tilde X\beta\beta'\tilde X'\tilde Y,$$

where the second equality uses the normalization $\beta'\tilde X'\tilde X\beta = nI_r$. Because the space spanned by $(X\beta, Z)$ equals that spanned by $(\tilde X\beta, Z)$ and is contained in the space spanned by $W$, the following can be written:

$$\hat e(\beta)'W\left(W'W\right)^{-1}W'\hat e(\beta) = \hat e(\beta)'\hat e(\beta) - \hat e'\hat e,$$

where $\hat e$ denotes the residuals from the unconstrained least-squares regression of $Y$ on $W$. Because $\hat e'\hat e = n\hat\Sigma$ and

$$\hat e(\beta)'\hat e(\beta) = \tilde Y'\tilde Y - \frac{1}{n}\tilde Y'\tilde X\beta\beta'\tilde X'\tilde Y,$$

then the concentrated GMM criterion is

$$J(\beta) = \operatorname{tr}\left(\hat\Sigma^{-1}\tilde Y'\tilde Y\right) - np - \frac{1}{n}\operatorname{tr}\left(\beta'\tilde X'\tilde Y\hat\Sigma^{-1}\tilde Y'\tilde X\beta\right). \qquad (11)$$

Using the partialling out (residual regression) approach, the variance estimator (10) can be written as

$$\hat\Sigma = \frac{1}{n}\left(\tilde Y'\tilde Y - \tilde Y'\tilde X\left(\tilde X'\tilde X\right)^{-1}\tilde X'\tilde Y\right),$$

which will be used below.
The GMM estimator minimizes $J(\beta)$ or, equivalently, maximizes the third term in (11). This is a generalized eigenvalue problem. Lemma 2 (in the next section) shows that the solution is $\hat\beta = (\hat v_1, \ldots, \hat v_r)$, as claimed.
Because the estimates $\hat\alpha$ and $\hat\Psi$ are found by regression given $\hat\beta$, and because this is equivalent to the MLE, it is also concluded that $\hat\alpha_{\mathrm{GMM}} = \hat\alpha_{\mathrm{MLE}}$ and $\hat\Psi_{\mathrm{GMM}} = \hat\Psi_{\mathrm{MLE}}$. This completes the proof of Theorem 1.
Lemma 2 also shows that the minimum of the criterion is

$$J\left(\hat\beta\right) = n\sum_{j=r+1}^{m}\frac{\hat\lambda_j}{1-\hat\lambda_j}.$$
This establishes Theorem 2.
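The algebraic identity just derived can be checked numerically. The sketch below (Python with numpy and scipy; simulated data and illustrative names, not from the paper) computes the GMM criterion $\operatorname{tr}(\hat\Sigma^{-1}e'W(W'W)^{-1}W'e)$ at the reduced-rank estimate directly and compares it with the eigenvalue formula $n\sum_{j>r}\hat\lambda_j/(1-\hat\lambda_j)$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n, p, m, q, r = 400, 3, 3, 2, 1
Y = rng.normal(size=(n, p))   # any data will do; the identity is algebraic
X = rng.normal(size=(n, m))
Z = rng.normal(size=(n, q))

# Johansen eigenvalues/eigenvectors as in Section 2
M = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
Yt, Xt = M @ Y, M @ X
lam, V = eigh(Xt.T @ Yt @ np.linalg.solve(Yt.T @ Yt, Yt.T @ Xt), Xt.T @ Xt)
lam, V = lam[::-1], V[:, ::-1]
beta_hat = V[:, :r] * np.sqrt(n)   # beta' (Xt'Xt/n) beta = I_r

# Sigma_hat from the unconstrained regression of Y on W = (X, Z)
W = np.hstack([X, Z])
ehat = Y - W @ np.linalg.solve(W.T @ W, W.T @ Y)
Sig = ehat.T @ ehat / n

# Residual at the reduced-rank estimate: Y on (X beta_hat, Z)
R = np.hstack([X @ beta_hat, Z])
eb = Y - R @ np.linalg.solve(R.T @ R, R.T @ Y)

# GMM criterion J = tr(Sig^{-1} e' W (W'W)^{-1} W' e) at the minimizer
PWe = W @ np.linalg.solve(W.T @ W, W.T @ eb)
J = np.trace(np.linalg.solve(Sig, eb.T @ PWe))

# Eigenvalue formula for the minimized criterion: n * sum_{j>r} lam/(1-lam)
J_eig = n * np.sum(lam[r:] / (1 - lam[r:]))
```

The two quantities agree to machine precision, which is an exact algebraic identity rather than an asymptotic approximation.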
5. Extrema of Quadratic Forms
To establish Theorems 1 and 2, two results on the extrema of quadratic forms are needed. The first relates the maximization of quadratic forms to generalized eigenvalues and eigenvectors. It is a slight extension of Theorem 11.13 of Magnus and Neudecker (1988).
Lemma 1. Suppose $A$ and $C$ are real symmetric $m \times m$ matrices with $C > 0$. Let $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_m$ be the generalized eigenvalues of $A$ with respect to $C$ and $u_1, \ldots, u_m$ be the associated eigenvectors. Then

$$\max_{B \,:\, B'CB = I_r}\operatorname{tr}\left(B'AB\right) = \rho_1 + \cdots + \rho_r,$$

and the maximum is attained at $B = (u_1, \ldots, u_r)$.
Proof. Define $A^* = C^{-1/2}AC^{-1/2}$ and $B^* = C^{1/2}B$. The eigenvalues of $A^*$ are equal to the generalized eigenvalues of $A$ with respect to $C$. The associated eigenvectors of $A^*$ are $C^{1/2}u_j$. Thus by Theorem 11.13 of Magnus and Neudecker (1988),

$$\max_{B'CB = I_r}\operatorname{tr}\left(B'AB\right) = \max_{B^{*\prime}B^* = I_r}\operatorname{tr}\left(B^{*\prime}A^*B^*\right) = \rho_1 + \cdots + \rho_r,$$
as claimed. ☐
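Lemma 1 can be illustrated numerically: the eigenvector choice attains the bound, while an arbitrary $C$-orthonormalized matrix attains no more. A minimal Python sketch (numpy/scipy, random matrices, illustrative names):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
mdim, r = 5, 2
A = rng.normal(size=(mdim, mdim))
A = (A + A.T) / 2                                  # real symmetric A
C0 = rng.normal(size=(mdim, mdim))
C = C0 @ C0.T + mdim * np.eye(mdim)                # C > 0

rho, U = eigh(A, C)        # ascending order, with U' C U = I
top = rho[-r:].sum()       # rho_1 + ... + rho_r (the r largest)

# The maximizer from Lemma 1: eigenvectors of the r largest eigenvalues
B_star = U[:, -r:]
val_star = np.trace(B_star.T @ A @ B_star)

# A random feasible B, C-orthonormalized so that B' C B = I_r
B0 = rng.normal(size=(mdim, r))
w, Q = np.linalg.eigh(B0.T @ C @ B0)
B = B0 @ (Q @ np.diag(w ** -0.5) @ Q.T)            # (B0'CB0)^{-1/2} rescale
val = np.trace(B.T @ A @ B)
```

Here `val_star` equals the sum of the $r$ largest generalized eigenvalues, and `val` cannot exceed it.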
Lemma 2. Let $G = \tilde Y'\tilde Y - \tilde Y'\tilde X(\tilde X'\tilde X)^{-1}\tilde X'\tilde Y$. If $\tilde X'\tilde X > 0$ and $\tilde Y'\tilde Y > 0$, then

$$\max_{\beta \,:\, \beta'\tilde X'\tilde X\beta = nI_r}\operatorname{tr}\left(\beta'\tilde X'\tilde YG^{-1}\tilde Y'\tilde X\beta\right) = n\sum_{j=1}^{r}\frac{\hat\lambda_j}{1-\hat\lambda_j},$$

where $\hat\lambda_j/(1-\hat\lambda_j)$ are the generalized eigenvalues of $\tilde X'\tilde YG^{-1}\tilde Y'\tilde X$ with respect to $\tilde X'\tilde X$, $\hat v_j$ are the associated eigenvectors, and $(\hat\lambda_j, \hat v_j)$ are the solutions to the eigenvalue problem (6). The maximum is attained at $\beta = (\hat v_1, \ldots, \hat v_r)$.
Proof. By Lemma 1,

$$\max_{\beta'\tilde X'\tilde X\beta = nI_r}\operatorname{tr}\left(\beta'\tilde X'\tilde YG^{-1}\tilde Y'\tilde X\beta\right) = n\left(\rho_1 + \cdots + \rho_r\right),$$

where $\rho_1 \ge \cdots \ge \rho_m$ are the generalized eigenvalues of $\tilde X'\tilde YG^{-1}\tilde Y'\tilde X$ with respect to $\tilde X'\tilde X$ and $w_1, \ldots, w_m$ are the associated eigenvectors. The proof is established by showing that $\rho_j = \hat\lambda_j/(1-\hat\lambda_j)$ and $w_j = \hat v_j$.
Let $(\lambda, v)$ be a generalized eigenvalue/eigenvector pair of $\tilde X'\tilde Y(\tilde Y'\tilde Y)^{-1}\tilde Y'\tilde X$ with respect to $\tilde X'\tilde X$. The pair satisfies

$$\lambda\tilde X'\tilde Xv = \tilde X'\tilde Y\left(\tilde Y'\tilde Y\right)^{-1}\tilde Y'\tilde Xv. \qquad (12)$$

By the Woodbury matrix identity,

$$G^{-1} = \left(\tilde Y'\tilde Y\right)^{-1} + \left(\tilde Y'\tilde Y\right)^{-1}\tilde Y'\tilde XE^{-1}\tilde X'\tilde Y\left(\tilde Y'\tilde Y\right)^{-1},$$

where $E = \tilde X'\tilde X - \tilde X'\tilde Y(\tilde Y'\tilde Y)^{-1}\tilde Y'\tilde X$. Substituting into (12) produces

$$\tilde X'\tilde YG^{-1}\tilde Y'\tilde Xv = \lambda\tilde X'\tilde Xv + \lambda\tilde X'\tilde Y\left(\tilde Y'\tilde Y\right)^{-1}\tilde Y'\tilde XE^{-1}\tilde X'\tilde Xv.$$

Rearranging (12) shows that $Ev = (1-\lambda)\tilde X'\tilde Xv$. Multiplying both sides by $E^{-1}$, this implies

$$E^{-1}\tilde X'\tilde Xv = \frac{1}{1-\lambda}v.$$

By collecting terms,

$$\tilde X'\tilde YG^{-1}\tilde Y'\tilde Xv = \lambda\tilde X'\tilde Xv + \frac{\lambda^2}{1-\lambda}\tilde X'\tilde Xv = \frac{\lambda}{1-\lambda}\tilde X'\tilde Xv.$$

This is an eigenvalue equation. It shows that $\rho = \lambda/(1-\lambda)$ is a generalized eigenvalue and $v$ is the associated eigenvector of $\tilde X'\tilde YG^{-1}\tilde Y'\tilde X$ with respect to $\tilde X'\tilde X$. Because the function $x/(1-x)$ is monotonically increasing on $[0, 1)$ and $0 \le \hat\lambda_j < 1$, it follows that the orderings of $\rho_j$ and $\hat\lambda_j$ are identical. Thus $\rho_j = \hat\lambda_j/(1-\hat\lambda_j)$ and $w_j = \hat v_j$, as claimed. ☐
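The eigenvalue relation $\rho = \lambda/(1-\lambda)$ is easy to verify numerically. A minimal Python sketch (numpy/scipy; random matrices stand in for the residual matrices, and the names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n, p, m = 200, 3, 3
Yt = rng.normal(size=(n, p))   # stand-in for the residual matrix Y-tilde
Xt = rng.normal(size=(n, m))   # stand-in for the residual matrix X-tilde

C = Xt.T @ Xt
H = Xt.T @ Yt @ np.linalg.solve(Yt.T @ Yt, Yt.T @ Xt)
G = Yt.T @ Yt - Yt.T @ Xt @ np.linalg.solve(C, Xt.T @ Yt)
AG = Xt.T @ Yt @ np.linalg.solve(G, Yt.T @ Xt)

lam = eigh(H, C, eigvals_only=True)    # eigenvalues of problem (6), ascending
rho = eigh(AG, C, eigvals_only=True)   # eigenvalues of the second problem
```

Because $x/(1-x)$ is monotone, both eigenvalue sets come out in matching order, and `rho` equals `lam / (1 - lam)` elementwise.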
This research is supported by the National Science Foundation and the Phipps Chair. Thanks to Richard Crump, the co-editors, and two referees for helpful comments on an earlier version. The author gives special thanks to Soren Johansen and Katerina Juselius for many years of stunning research, stimulating conversations, and impeccable scholarship.
Conflicts of Interest
The author declares no conflict of interest.
Adrian, Tobias, Richard K. Crump, and Emanuel Moench. 2015. Regression-based estimation of dynamic asset pricing models. Journal of Financial Economics 118: 211–44. [Google Scholar] [CrossRef]
Anderson, Theodore Wilbur. 1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics 22: 327–50. [Google Scholar] [CrossRef]
Anderson, Theodore Wilbur, and Herman Rubin. 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20: 46–63. [Google Scholar] [CrossRef]
Anderson, Theodore Wilbur, and Herman Rubin. 1950. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 21: 570–82. [Google Scholar] [CrossRef]
Bartlett, Maurice S. 1938. Further aspects of the theory of multiple regression. Proceedings of the Cambridge Philosophical Society 34: 33–40. [Google Scholar] [CrossRef]
Engle, Robert F., and Clive W. J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing. Econometrica 55: 251–76. [Google Scholar] [CrossRef]
Goldberger, Arthur S., and Ingram Olkin. 1971. A minimum-distance interpretation of limited-information estimation. Econometrica 39: 635–49. [Google Scholar] [CrossRef]
Hall, Alastair R. 2005. Generalized Method of Moments. Oxford: Oxford University Press. [Google Scholar]
Hansen, Lars Peter. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–54. [Google Scholar] [CrossRef]
Hotelling, Harold. 1936. Relations between two sets of variates. Biometrika 28: 321–77. [Google Scholar] [CrossRef]
Johansen, Søren. 1988. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12: 231–54. [Google Scholar] [CrossRef]
Johansen, Søren. 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59: 1551–80. [Google Scholar] [CrossRef]
Johansen, Søren. 1995. Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models. Oxford: Oxford University Press. [Google Scholar]
Kolesár, Michal. 2018. Minimum distance approach to inference with many instruments. Journal of Econometrics 204: 86–100. [Google Scholar] [CrossRef]
Magnus, Jan R., and Heinz Neudecker. 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: Wiley. [Google Scholar]
Muirhead, Robb J. 1982. Aspects of Multivariate Statistical Theory. New York: Wiley. [Google Scholar]
Pillai, K. C. S. 1955. Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics 26: 117–21. [Google Scholar] [CrossRef]