1. Introduction
Both simulation and theoretical evidence show that the estimation of a regression with autoregressive errors of order one (AR(1)) via exact (conditional) maximum likelihood (ML) is inferior to exact nonlinear least squares (NLS) estimation (the correctly implemented Prais and Winsten (1954) (PW) method), at least for trending data.
Park and Mitchell (1980) show via simulation that, for trending regressors, the PW method is more efficient than exact/conditional ML (the Beach and MacKinnon (1978a) (BM) method). This is theoretically confirmed by Magee (1985, 1989), who approximates the biases of various two-step and iterative estimators. Correctly implemented, the PW method delivers exact NLS estimators (equivalent to unconditional ML under normality). No normality requirement is needed for the PW method.
Gurland (1954) formalises and resolves the criticism against the estimation method of Cochrane and Orcutt (1949) (CO, hereafter) for losing the first observation,1 which is employed by PW or re-proposed by Kadiyala (1968).
Koopmans (1942) was the first to examine the stationary AR(1) correctly. The PW algorithm is a fast, computationally inexpensive zig-zag algorithm, which retains the first observation and requires no numerical optimiser. However, in order to obtain an (econometrically efficient) exact NLS estimator, the correct (exact) closed-form formula must be employed in the iterations for the update of the AR parameter. This is not the case in all available implementations of the PW method (see Magee (1989)). For example, Judge et al. (1985, chp. 8) propose using the autocorrelation coefficient formula for the update, but this is not correct. Exact/conditional ML fast zig-zag algorithms are provided by Beach and MacKinnon (1978a, 1978b) for AR(1)2 and AR(2)3 errors, respectively; none is available for AR orders higher than two. The PW algorithm for a regression with AR(1) errors is relatively well known, although it is usually implemented incorrectly, because an incorrect closed-form estimator is used for the update.
Park and Mitchell (1980) implement the PW algorithm correctly, although they do not give all the details. Furthermore, there is no zig-zag PW algorithm for a regression with AR(2) or higher-order errors, and this gap in the literature is filled by the present paper. Reliable standard errors are also calculated, which are not affected by the presence of the lagged dependent variable as a regressor. In fact, our proposed method corrects both the inefficient CO method and the poor estimation that results from bad implementations of the Gauss–Newton algorithm.
The paper is organised as follows: Section 2 fully discusses the closed-form exact NLS estimation of pure AR(1), AR(2), and AR(p) coefficients, while Section 3 examines the iterative exact joint estimation of regression and autoregressive coefficients. Finally, Section 4 concludes.
2. Closed Form Exact NLS: AR(1,2,…,p)
The main ingredient for the correct PW algorithm is the correct4 closed-form update(s) for the autoregressive parameter(s), along with exact generalised least squares (GLS) iterations. Let the observed $y_t$, $t = 1, \dots, T$, be generated via the stationary AR(1)

$y_t = \rho y_{t-1} + \varepsilon_t$,  (1)

with $|\rho| < 1$, $\varepsilon_t \sim \mathrm{iid}(0, \sigma_\varepsilon^2)$, and the stationary initialisation

$y_1 = \varepsilon_1 / \sqrt{1 - \rho^2}$.  (2)

This assumption can be relaxed, and $\varepsilon_t$ can be a martingale difference sequence with finite fourth moment (see Stock (1991), for example). Assuming normal $\varepsilon_t$, the exact/conditional ML estimator of (1), conditional on (2), in closed form, is the solution of the cubic equation that BM provide. PW/exact NLS (eventually in closed form) requires the exact sum of squares to be minimised, that is,

$S(\rho) = (1 - \rho^2)\, y_1^2 + \sum_{t=2}^{T} (y_t - \rho y_{t-1})^2$;  (3)

see Kmenta (1986, p. 319) (a full proof is given in the present paper). Here, we make use of $\varepsilon_1 = \sqrt{1 - \rho^2}\, y_1$ from (2).
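To fix ideas, the exact objective in (3) and its truncated ("conditional") counterpart can be coded directly. The following is a minimal Python/NumPy sketch, assuming the form of (3) as written above; the function names are illustrative, and the authors' own programmes (in GAUSS/MATLAB/GNU Octave) are not reproduced here.

```python
import numpy as np

def exact_ssr_ar1(rho, y):
    """Exact (unconditional) AR(1) sum of squares: keeps the first
    observation through its stationary-variance term (1 - rho^2) y_1^2."""
    y = np.asarray(y, dtype=float)
    return (1.0 - rho**2) * y[0]**2 + np.sum((y[1:] - rho * y[:-1])**2)

def conditional_ssr_ar1(rho, y):
    """'LS' sum of squares that drops the first-observation term."""
    y = np.asarray(y, dtype=float)
    return np.sum((y[1:] - rho * y[:-1])**2)
```

The `conditional_ssr_ar1` variant is the objective behind the inferior LS estimator discussed next.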
Phillips (1977, 1978) and Sawa (1978) (among others) employ an inexact version of (3), ignoring the first-observation term $(1 - \rho^2)\, y_1^2$. This results in the so-called LS (also incorrectly called the ML) estimator $\hat{\rho}_{LS}$, which is more biased and less efficient than the exact NLS estimator of (3) (also called the (original) PW estimator by Park and Mitchell (1980); PW2 in Magee (1989, p. 662)). Minimisation is via the first-order condition $\partial S(\rho)/\partial \rho = -2\rho y_1^2 - 2\sum_{t=2}^{T} y_{t-1}(y_t - \rho y_{t-1}) = 0$, which results in the closed form of the (genuine) PW estimator5

$\hat{\rho}_{PW} = \sum_{t=2}^{T} y_t y_{t-1} \big/ \sum_{t=2}^{T-1} y_t^2$,  (5)

which is better than the OLS estimator $\hat{\rho}_{OLS} = \sum_{t=2}^{T} y_t y_{t-1} / \sum_{t=1}^{T-1} y_t^2$.6 An even worse estimator is the autocorrelation coefficient $\hat{\rho}_{AC} = \sum_{t=2}^{T} y_t y_{t-1} / \sum_{t=1}^{T} y_t^2$.7
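The ranking of the three closed-form estimators can be illustrated on simulated data. Below is a minimal Python/NumPy sketch, assuming the formulas exactly as written above; the function names and the simulation design are illustrative only.

```python
import numpy as np

def rho_pw(y):
    """Closed-form exact NLS / PW estimator: both y_1^2 and y_T^2
    are excluded from the denominator."""
    y = np.asarray(y, dtype=float)
    return np.sum(y[1:] * y[:-1]) / np.sum(y[1:-1]**2)

def rho_ols(y):
    """OLS of y_t on y_{t-1} (the 'LS' estimator)."""
    y = np.asarray(y, dtype=float)
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1]**2)

def rho_autocorr(y):
    """First-order sample autocorrelation coefficient
    (largest denominator of the three closed forms)."""
    y = np.asarray(y, dtype=float)
    return np.sum(y[1:] * y[:-1]) / np.sum(y**2)

# Example: simulate a stationary AR(1) and compare the three estimators.
rng = np.random.default_rng(0)
T, rho = 50, 0.8
eps = rng.standard_normal(T)
y = np.empty(T)
y[0] = eps[0] / np.sqrt(1 - rho**2)   # stationary initialisation as in (2)
for t in range(1, T):
    y[t] = rho * y[t - 1] + eps[t]
print(rho_pw(y), rho_ols(y), rho_autocorr(y))
```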
For the stationary AR(2) model

$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t$,

also with $\varepsilon_t \sim \mathrm{iid}(0, \sigma_\varepsilon^2)$ and $(\rho_1, \rho_2)$ inside the stationarity triangle,8 no closed-form exact NLS/PW estimators of the autoregressive parameters are available in the literature. To derive them, the exact sum of squares to be minimised is

$S(\rho_1, \rho_2) = (1 - \rho_2^2)(y_1^2 + y_2^2) - 2\rho_1(1 + \rho_2)\, y_1 y_2 + \sum_{t=3}^{T} (y_t - \rho_1 y_{t-1} - \rho_2 y_{t-2})^2$.

Use has been made of the fact that, in this case, $\mathrm{var}(y_t) = \sigma_\varepsilon^2 (1 - \rho_2)/[(1 + \rho_2)((1 - \rho_2)^2 - \rho_1^2)]$ and $\mathrm{corr}(y_t, y_{t-1}) = \rho_1/(1 - \rho_2)$. The required canonical equations (obtained with the help of MATHEMATICA™ by setting $\partial S/\partial \rho_1 = \partial S/\partial \rho_2 = 0$ and imposing stationarity) become, after manipulations, the system in (10). Re-arranging (10), the system of equations can be solved for the brand new (efficient/genuine) PW estimators of the AR(2) parameters given in (11). In view of (11), we can conjecture the brand new closed-form PW estimator vector for the parameters of any AR(p), along with the required equations for observations $1, \dots, p$ (not stated, as they are cumbersome). The resulting expression is easily programmable in a matrix programming language.
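The closed-form AR(2) estimators in (11) are not reproduced here; purely as an illustrative cross-check, the exact AR(2) sum of squares written above can also be minimised numerically. A minimal Python/NumPy/SciPy sketch, assuming that form of the objective (the optimiser choice is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

def exact_ssr_ar2(params, y):
    """Exact (unconditional) AR(2) sum of squares: quadratic form in the
    first two observations plus the conditional squared innovations."""
    r1, r2 = params
    y = np.asarray(y, dtype=float)
    head = (1 - r2**2) * (y[0]**2 + y[1]**2) - 2 * r1 * (1 + r2) * y[0] * y[1]
    tail = np.sum((y[2:] - r1 * y[1:-1] - r2 * y[:-2])**2)
    return head + tail

def ar2_exact_nls(y):
    """Numerical exact NLS for (rho1, rho2)."""
    res = minimize(exact_ssr_ar2, x0=np.array([0.0, 0.0]), args=(y,),
                   method="Nelder-Mead")
    return res.x
```

Inside the stationarity triangle, this numerical minimiser should agree with the closed-form solution obtained from (11).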
3. Iteration
The closed-form estimators of the previous section are to be used in iterations for updating. Let the observed $y_t$ be generated via

$y_t = x_t' \beta + u_t$,  (12)

with the (unobserved) error $u_t$ following (1) and (2); $x_t$ is a $k \times 1$ vector of regressors (the $t$-th row of the regressor matrix $X$), and $\beta$ is the corresponding coefficient vector. The PW iterative algorithm has the following steps: (I) Apply OLS to (12) and derive the residual $\hat{u}_t$ and $\hat{\rho}$ from (5), replacing $y_t$ with $\hat{u}_t$. (II) Re-estimate $\beta$ from the exact GLS regression

$\sqrt{1 - \hat{\rho}^2}\, y_1 = \sqrt{1 - \hat{\rho}^2}\, x_1' \beta + \sqrt{1 - \hat{\rho}^2}\, u_1$,  (13)
$y_t - \hat{\rho} y_{t-1} = (x_t - \hat{\rho} x_{t-1})' \beta + \varepsilon_t, \quad t = 2, \dots, T$.  (14)

For a PW algorithm that delivers exact NLS estimators, the first observation must be employed (see (13)), and the $\rho$ update must be via (5). According to Magee (1989, p. 663), (5) guarantees convergence. (III) Use the new estimate of $\beta$, obtain a new residual $\hat{u}_t$ from (12) and a new estimate of $\rho$, and proceed to step (II) if the convergence criterion is not met. In addition, estimates of $\rho$ must be forced to be in $(-1, 1)$.
The matrix $P$ is the exact GLS matrix of first order, with main diagonal $(\sqrt{1 - \hat{\rho}^2}, 1, \dots, 1)$, first sub-diagonal elements equal to $-\hat{\rho}$, and zeros elsewhere. Let $\hat{\varepsilon}$ be the converged innovation residual vector from (13) and (14), corresponding to the converged $\hat{\rho}$; from it we define the innovation variance estimate $\hat{\sigma}_\varepsilon^2$, and $\hat{\beta}$ denotes the converged estimate of $\beta$. For the covariance matrix of the estimates, we rely on the fact that exact NLS estimation is identical to unconditional ML under normality and calculate the quasi-ML covariance as the inverse of minus the Hessian of the concentrated unconditional loglikelihood evaluated at the converged $(\hat{\beta}, \hat{\rho})$, together with the MLE innovation variance estimate based on the "residual" implied by a given $(\beta, \rho)$. Alternatively, we may use the asymptotic covariance for $\hat{\beta}$, requiring no normality (see below). A third option could be to calculate the sandwich covariance. Note that none of these covariances is affected by the presence of the lagged dependent variable as a regressor.
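Steps (I)–(III) for the AR(1)-error case can be summarised in a short routine. The Python/NumPy sketch below assumes the closed-form update and the transformed system as written above; the tolerance, iteration cap, and clipping of $\hat{\rho}$ are illustrative choices, and none of the covariance options discussed above is computed.

```python
import numpy as np

def pw_ar1_regression(y, X, tol=1e-8, max_iter=100):
    """Iterative Prais-Winsten (exact NLS) for y = X b + u with AR(1) errors:
    OLS -> closed-form rho from residuals -> exact GLS -> repeat."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    T, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # step (I): OLS
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ beta                                    # regression residuals
        rho_new = np.sum(u[1:] * u[:-1]) / np.sum(u[1:-1]**2)  # closed-form update, cf. (5)
        rho_new = np.clip(rho_new, -0.999, 0.999)           # keep inside (-1, 1)
        # Exact GLS transform: first observation scaled by sqrt(1 - rho^2),
        # remaining observations quasi-differenced, cf. (13)-(14).
        c = np.sqrt(1.0 - rho_new**2)
        ys = np.empty(T); Xs = np.empty((T, k))
        ys[0], Xs[0] = c * y[0], c * X[0]
        ys[1:] = y[1:] - rho_new * y[:-1]
        Xs[1:] = X[1:] - rho_new * X[:-1]
        beta_new = np.linalg.lstsq(Xs, ys, rcond=None)[0]   # step (II): exact GLS
        if abs(rho_new - rho) < tol and np.max(np.abs(beta_new - beta)) < tol:
            beta, rho = beta_new, rho_new
            break
        beta, rho = beta_new, rho_new                       # step (III): iterate
    eps = ys - Xs @ beta                                    # innovation residuals
    return beta, rho, eps
```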
Similarly, when $u_t$ in (12) follows the AR(2) model, the genuine PW iterative algorithm has the following steps: (I) Apply OLS to (12) and derive the residual $\hat{u}_t$, and $\hat{\rho}_1$ and $\hat{\rho}_2$ from (11), replacing $y_t$ with $\hat{u}_t$. (II) Re-estimate $\beta$ from the exact GLS regression in (15)–(17), in which the first two observations receive their own transformed equations. (In addition, the estimates of $\rho_1$ and $\rho_2$ must be forced to be inside the stationarity triangle.) (III) Use the new estimator of $\beta$, obtain a new residual $\hat{u}_t$ from (12) and new estimators of $\rho_1$ and $\rho_2$ from (11), and proceed to step (II) if the convergence criterion is not met. The matrix $P$ is the exact GLS matrix of second order: its first two rows transform the initial observations, while the remaining rows have ones on the main diagonal, $-\hat{\rho}_1$ on the first sub-diagonal, $-\hat{\rho}_2$ on the second sub-diagonal, and zeros elsewhere. Let $\hat{\varepsilon}$ be the converged innovation residual vector from (15)–(17), from which the innovation variance estimate is defined; $\hat{\beta}$, $\hat{\rho}_1$, and $\hat{\rho}_2$ denote the converged estimates of $\beta$, $\rho_1$, and $\rho_2$, respectively. For the covariance matrix, under normality, we calculate the quasi-ML covariance from the inverse of minus the Hessian of the concentrated unconditional loglikelihood evaluated at $\hat{\beta}$ and $(\hat{\rho}_1, \hat{\rho}_2)$, together with the MLE innovation variance estimate based on the "residual" implied by a given parameter vector. Alternatively, we may use the asymptotic covariance for $\hat{\beta}$, requiring no normality (see below). A third option could be to calculate the sandwich covariance. Note that none of these covariances is affected by the presence of the lagged dependent variable as a regressor.
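The second-order GLS matrix just described can be constructed explicitly. In the Python/NumPy sketch below, the first two rows follow from the Cholesky factorisation of the stationary AR(2) covariance of the first two observations (a standard result); they are an assumption of this illustration rather than a transcription of (15)–(17).

```python
import numpy as np

def gls_matrix_ar2(r1, r2, T):
    """Exact second-order GLS transform P (T x T): the first two rows whiten
    (y_1, y_2) using the stationary AR(2) covariance; rows t >= 3 apply the
    quasi-difference with weights 1, -r1, -r2."""
    P = np.zeros((T, T))
    P[0, 0] = np.sqrt((1 + r2) * ((1 - r2)**2 - r1**2) / (1 - r2))
    P[1, 0] = -r1 * np.sqrt(1 - r2**2) / (1 - r2)
    P[1, 1] = np.sqrt(1 - r2**2)
    for t in range(2, T):
        P[t, t] = 1.0
        P[t, t - 1] = -r1
        P[t, t - 2] = -r2
    return P
```

Pre-multiplying $y$ and $X$ by $P$ and running OLS reproduces step (II).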
Finally, for the model in (12), when the error follows a stationary AR(p) process with innovations $\varepsilon_t$ as before, OLS provides the initial $\hat{\beta}$ and residual vector $\hat{u}$, and the first step delivers the vector $\hat{\boldsymbol{\rho}} = (\hat{\rho}_1, \dots, \hat{\rho}_p)'$ using the generic closed-form formula above. Then the OLS (in fact GLS) regression of $Py$ on $PX$ results in a new $\hat{\beta}$, a new $\hat{u}$, and a new $\hat{\boldsymbol{\rho}}$, repeating until convergence and restricting the elements of $\hat{\boldsymbol{\rho}}$ to the stationarity region accordingly. The exact GLS matrix $P$ has the correct Cholesky decomposition9 of the inverse error covariance matrix in its first $p$ rows, ones on the main diagonal and $-\hat{\rho}_1, \dots, -\hat{\rho}_p$ on the first $p$ sub-diagonals for rows $t > p$, and zeros elsewhere. The fast calculation of $Py$ and $PX$ may rely on the convolution procedure, except for the first $p$ rows, where it has to be implemented manually. The typical element of the (scaled) inverse error covariance matrix is given in Hamilton (1994, p. 125) or Galbraith and Galbraith (1974, p. 70). For the converged $\hat{\boldsymbol{\rho}}$, the innovation variance estimate is computed from the converged innovation residual vector $\hat{\varepsilon} = P\hat{u}$. Assuming normality, we can calculate the quasi-ML covariance from the inverse of minus the Hessian of the concentrated unconditional loglikelihood evaluated at the converged $(\hat{\beta}, \hat{\boldsymbol{\rho}})$, together with the MLE innovation variance estimate based on the "residual" implied by a given parameter vector. An alternative covariance for $\hat{\beta}$ is the asymptotic covariance $\hat{\sigma}_\varepsilon^2 (X' \hat{\Omega}^{-1} X)^{-1}$, where $\hat{\Omega}$ is the matrix $\Omega$ defined by $\mathrm{var}(u) = \sigma_\varepsilon^2 \Omega$, evaluated at $\hat{\boldsymbol{\rho}}$; it requires no normality. A third covariance option is the sandwich covariance. Again, none of these covariances is affected by the presence of the lagged dependent variable as a regressor.
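A generic (if slow) construction of $P$ for AR(p) errors can proceed via the stationary autocovariances of the error process. The Python/NumPy sketch below computes them from the companion-form Lyapunov equation and uses the inverse Cholesky factor for the first $p$ rows; it builds the full $T \times T$ matrix rather than using the faster convolution route mentioned above, and the function names are illustrative.

```python
import numpy as np

def ar_p_autocov(phi):
    """p x p autocovariance matrix of a stationary AR(p) with unit innovation
    variance, via the companion-form Lyapunov equation (scale is immaterial
    for the GLS transformation)."""
    phi = np.asarray(phi, float)
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)
    Q = np.zeros((p, p)); Q[0, 0] = 1.0
    vecG = np.linalg.solve(np.eye(p * p) - np.kron(F, F), Q.ravel())
    return vecG.reshape(p, p)            # symmetric Toeplitz covariance

def gls_matrix_arp(phi, T):
    """Exact GLS transform P for AR(p) errors: first p rows from the inverse
    Cholesky factor of the stationary covariance, then quasi-differences."""
    phi = np.asarray(phi, float)
    p = len(phi)
    Gamma = ar_p_autocov(phi)
    C = np.linalg.cholesky(Gamma)        # lower triangular, C C' = Gamma
    P = np.zeros((T, T))
    P[:p, :p] = np.linalg.inv(C)         # whitens the first p observations
    for t in range(p, T):
        P[t, t] = 1.0
        P[t, t - p:t] = -phi[::-1]       # -phi_p, ..., -phi_1
    return P
```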
GAUSS™/MATLAB™/GNU Octave programmes for the method are available upon request.