Prais–Winsten Algorithm for Regression with Second or Higher Order Autoregressive Errors

There is no available Prais–Winsten algorithm for regression with AR(2) or higher order errors, and the existing algorithm for AR(1) errors is either not fully justified or implemented incorrectly (and thus inefficient). This paper addresses both issues, providing an accurate, computationally fast and inexpensive, generic zig-zag algorithm.


1. Introduction
Both simulation and theoretical evidence show that estimation of a regression with autoregressive errors of order one (AR(1)) via exact (conditional) maximum likelihood (ML) is inferior to exact nonlinear least squares (NLS) estimation (the correctly implemented Prais and Winsten (1954) (PW) method), at least for trending data. Park and Mitchell (1980) show via simulation that, for trending regressors, the PW method is more efficient than exact/conditional ML (the Beach and MacKinnon (1978a) (BM) method). This is confirmed theoretically by Magee (1985, 1989), who approximates the biases of various two-step and iterative estimators. Correctly implemented, the PW method delivers exact NLS estimators (equivalent to unconditional ML under normality); no normality requirement is needed for the PW method. Gurland (1954) formalises and resolves the criticism of the estimation method of Cochrane and Orcutt (1949) (CO, hereafter) for losing the first observation;¹ the first observation is retained by PW and, later, by Kadiyala (1968). Koopmans (1942) was the first to examine the stationary AR(1) model correctly.

The PW algorithm is a fast, computationally inexpensive zig-zag algorithm, which retains the first observation and requires no numerical optimiser. However, in order to obtain an (econometrically efficient) exact NLS estimator, the correct (exact) closed-form formula must be employed in the iterations for the update of the AR parameter. This is not the case in the available implementations of the PW method (see Magee (1989)). For example, Judge et al. (1985, chp. 8) propose using the autocorrelation coefficient formula for the update, but this is not correct. Exact/conditional ML fast zig-zag algorithms are provided by Beach and MacKinnon (1978a, 1978b) for AR(1)² and AR(2)³ errors, respectively; none is available for AR orders higher than two.

The PW algorithm for a regression with AR(1) errors is relatively well known, although it is usually implemented incorrectly, because an incorrect closed-form estimator is used for the update. Park and Mitchell (1980) implement the PW algorithm correctly, although they do not give all the details. Furthermore, there is no zig-zag PW algorithm for a regression with AR(2) or higher order errors; this paper fills that gap in the literature. Reliable standard errors are also calculated, which are not affected by the presence of a lagged dependent variable as a regressor. In effect, our proposed method corrects both the inefficient CO method and the poor estimation that results from bad implementations of the Gauss–Newton algorithm.
The paper is organised as follows: Section 2 fully discusses the closed-form exact NLS estimation of pure AR(1), AR(2), and AR(p) coefficients, while Section 3 examines the iterative exact joint estimation of the regression and autoregressive coefficients. Finally, Section 4 concludes.

2. Closed Form
The main ingredient for the correct PW algorithm is the correct⁴ closed-form update(s) for the autoregressive parameter(s), along with exact generalised least squares (GLS) iterations. Let (observed) y_t be generated via the stationary AR(1)

  y_t = θ y_{t−1} + v_t, |θ| < 1, t = 2, …, n,  (1)

with

  v_t ~ iid(0, σ_v²).  (2)

This assumption can be relaxed, and v_t can be a martingale difference sequence with finite fourth moment (see Stock (1991), for example). Assuming normal v_t, the exact/conditional ML estimator of (1), conditional on (2), in closed form, is the solution to the cubic equation that BM provide. PW/exact NLS (eventually in closed form) requires the exact sum of squares to be minimised, that is,

  S(θ) = (1 − θ²) y_1² + Σ_{t=2}^n (y_t − θ y_{t−1})²,  (3)

see Kmenta (1986, p. 319) (a full proof is given in the present paper). Here, we make use of σ_y/σ_v = 1/√(1 − θ²). Phillips (1977, 1978) and Sawa (1978) (among others) employ an inexact version of (3), ignoring y_1²(1 − θ²). This results in the so-called LS estimator θ̂_LS (also, incorrectly, called the ML estimator), which is more biased and less efficient than the exact NLS estimator from (3) (also called the (original) PW estimator by Park and Mitchell (1980); PW2 in Magee (1989, p. 662)). Minimisation is via the first order condition

  θ y_1² + Σ_{t=2}^n (y_t − θ y_{t−1}) y_{t−1} = 0,  (4)

which results in the closed form of the (genuine) PW estimator⁵

  θ̂_PW = Σ_{t=2}^n y_t y_{t−1} / (Σ_{t=2}^n y_{t−1}² − y_1²),  (5)

which is better than the OLS estimator θ̂_LS = Σ_{t=2}^n y_t y_{t−1} / Σ_{t=2}^n y_{t−1}².⁶ An even worse estimator is the autocorrelation coefficient Σ_{t=2}^n y_t y_{t−1} / Σ_{t=1}^n y_t².⁷

For the stationary AR(2) model

  y_t = θ_1 y_{t−1} + θ_2 y_{t−2} + v_t,  (6)

with theoretical autocorrelations satisfying ρ_1 = θ_1/(1 − θ_2) and also

  ρ_2 = θ_1 ρ_1 + θ_2,⁸  (7)

no closed-form exact NLS/PW estimators of the autoregressive parameters are available in the literature. To derive them, the exact sum of squares to be minimised is

  S(θ_1, θ_2) = (1 − θ_2²)(1 − ρ_1²) y_1² + (1 − θ_2²)(y_2 − ρ_1 y_1)² + Σ_{t=3}^n (y_t − θ_1 y_{t−1} − θ_2 y_{t−2})².  (8)

Use has been made of the fact that, in this case, σ_y/σ_v = 1/√((1 − θ_2²)(1 − ρ_1²)). The required canonical equations (obtained with the help of MATHEMATICA™ and imposing ρ_1 = θ_1/(1 − θ_2)) become (after manipulations)

  θ_1 Σ_{t=3}^n y_{t−1}² + θ_2 (Σ_{t=3}^n y_{t−1} y_{t−2} − y_1 y_2) = Σ_{t=3}^n y_t y_{t−1} + y_1 y_2,  (9)
  θ_1 (Σ_{t=3}^n y_{t−1} y_{t−2} − y_1 y_2) + θ_2 (Σ_{t=3}^n y_{t−2}² − y_1² − y_2²) = Σ_{t=3}^n y_t y_{t−2}.  (10)

Re-arranging (9)–(10), the system of equations can be solved for the brand new (efficient/genuine) PW estimators of the AR(2) parameters. That is,

  (θ̂_1,PW, θ̂_2,PW)′ = [ Σ_{t=3}^n y_{t−1}², Σ_{t=3}^n y_{t−1} y_{t−2} − y_1 y_2 ; Σ_{t=3}^n y_{t−1} y_{t−2} − y_1 y_2, Σ_{t=3}^n y_{t−2}² − y_1² − y_2² ]⁻¹ (Σ_{t=3}^n y_t y_{t−1} + y_1 y_2, Σ_{t=3}^n y_t y_{t−2})′.  (11)

In view of (11), we can guess the brand new closed-form PW autoregressive estimator vector of any AR(p), along with the required equations for observations 1, …, p (not stated, as they are difficult): the AR(p) PW estimator solves the analogous p × p linear system, with the Gram matrix of the p lagged regressands over t = p + 1, …, n corrected by terms in the first p observations.⁹ This expression is easily programmable in a matrix programming language.
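Since (5) is a simple ratio and (9)–(10) are linear in (θ_1, θ_2), both estimators are immediately programmable. The following numpy sketch is illustrative only (the distributed programmes are in GAUSS/MATLAB/GNU Octave; the function names and this rendering are ours); it computes the closed-form PW estimates from a pure AR series y_1, …, y_n:

```python
import numpy as np

def pw_ar1(y):
    # theta_hat_PW of (5): sum_{t=2}^n y_t y_{t-1} / (sum_{t=2}^n y_{t-1}^2 - y_1^2)
    num = np.sum(y[1:] * y[:-1])
    den = np.sum(y[:-1] ** 2) - y[0] ** 2
    return num / den

def pw_ar2(y):
    # Solve the 2x2 linear system (9)-(10) for (theta_1, theta_2).
    y1, y2 = y[0], y[1]
    s11 = np.sum(y[1:-1] ** 2)       # sum_{t=3}^n y_{t-1}^2
    s12 = np.sum(y[1:-1] * y[:-2])   # sum_{t=3}^n y_{t-1} y_{t-2}
    s22 = np.sum(y[:-2] ** 2)        # sum_{t=3}^n y_{t-2}^2
    c1 = np.sum(y[2:] * y[1:-1])     # sum_{t=3}^n y_t y_{t-1}
    c2 = np.sum(y[2:] * y[:-2])      # sum_{t=3}^n y_t y_{t-2}
    M = np.array([[s11,           s12 - y1 * y2],
                  [s12 - y1 * y2, s22 - y1 ** 2 - y2 ** 2]])
    c = np.array([c1 + y1 * y2, c2])
    return np.linalg.solve(M, c)
```

Note that the denominator of (5) equals Σ_{t=3}^n y_{t−1}², which is how the factorisation in note 6 arises.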

3. Iteration
The closed-form estimators of the previous section are to be used in iterations for updating. Let observed Y_t be generated via

  Y_t = x_t′ β + y_t,  (12)

with (unobserved) y_t following (1) and (2); x_t is a k × 1 vector of regressors (the t-th row of the regressor matrix X), and β is the corresponding coefficient vector. The PW iterative algorithm has the following steps:

(I) Apply OLS to (12) and derive the residuals ŷ_t, and θ̂_PW from (5) with y_t replaced by ŷ_t.

(II) Re-estimate β from the exact GLS regression

  P_1(θ̂_PW) Y = P_1(θ̂_PW) X β + innovations.  (13)

For a PW algorithm that delivers exact NLS estimators, the first observation must be employed (see (13)), and the θ update must be via (5). According to Magee (1989, p. 663), (5) guarantees convergence.

(III) Use the new estimate of β to obtain new residuals ŷ_t from (12) and a new estimate of θ, and return to step (II) if the convergence criterion is not met. In addition, estimates of θ must be forced to lie in (−1, 1).

Here, P_1(θ) is the exact GLS matrix of first order, with main diagonal (√(1 − θ²), 1, …, 1), first sub-diagonal (−θ, …, −θ), and zeros elsewhere:

  P_1(θ) = [ √(1 − θ²), 0, …, 0 ; −θ, 1, …, 0 ; ⋮ ; 0, …, −θ, 1 ].  (14)

Let ṽ be the converged innovation residual vector from (13) and (14), corresponding to the converged θ̃_PW. We define the innovation variance estimate σ̃_v² = ṽ′ṽ/(n − k) and, for the converged estimate β̃ of β, cov(β̃) = σ̃_v² (X′ P_1(θ̃_PW)′ P_1(θ̃_PW) X)⁻¹. For θ̃_PW, we rely on the fact that exact NLS estimation is identical to unconditional ML under normality and calculate the quasi-ML covariance cov(θ̃_PW) as the inverse of minus the Hessian of the concentrated unconditional log-likelihood

  L(θ) = −(n/2) ln[(Y* − X* β̂_GLS,θ)′ (Y* − X* β̂_GLS,θ)],

evaluated at θ̃_PW, with β̂_GLS,θ = (X*′X*)⁻¹ X*′Y*, Y* = P_1(θ)Y, X* = P_1(θ)X, and the MLE innovation variance estimate σ̂²_vMLE,θ = v̂(θ)′v̂(θ)/n, with "residual" v̂(θ) = Y − X β̂_GLS,θ for a given θ. Alternatively, we may use the asymptotic covariance for θ̃_PW, which requires no normality (see below). A third option is to calculate the sandwich covariance. Note that none of these covariances is affected by the presence of a lagged dependent variable as a regressor.
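To fix ideas, here is an illustrative numpy rendering of steps (I)–(III) for AR(1) errors, with our own (arbitrary) convergence tolerance and iteration cap; it is a sketch, not the distributed code:

```python
import numpy as np

def prais_winsten_ar1(Y, X, tol=1e-10, max_iter=200):
    n, k = X.shape
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]        # step (I): OLS for beta
    theta = 0.0
    for _ in range(max_iter):
        u = Y - X @ beta                               # AR(1) residual series y_hat_t
        num = np.sum(u[1:] * u[:-1])                   # theta update via (5)
        den = np.sum(u[:-1] ** 2) - u[0] ** 2
        theta_new = np.clip(num / den, -0.999, 0.999)  # force theta into (-1, 1)
        s = np.sqrt(1.0 - theta_new ** 2)
        Ys = np.concatenate(([s * Y[0]], Y[1:] - theta_new * Y[:-1]))  # P_1(theta) Y
        Xs = np.vstack((s * X[0], X[1:] - theta_new * X[:-1]))         # P_1(theta) X
        beta_new = np.linalg.lstsq(Xs, Ys, rcond=None)[0]  # step (II): exact GLS, eq. (13)
        done = (abs(theta_new - theta) < tol
                and np.max(np.abs(beta_new - beta)) < tol)
        beta, theta = beta_new, theta_new              # step (III): iterate to convergence
        if done:
            break
    v = Ys - Xs @ beta                                 # converged innovation residuals
    sigma2_v = (v @ v) / (n - k)                       # innovation variance estimate
    cov_beta = sigma2_v * np.linalg.inv(Xs.T @ Xs)     # GLS covariance of beta
    return beta, theta, cov_beta
```

The transform keeps the first observation (scaled by √(1 − θ²)) rather than dropping it, which is precisely the difference between PW and CO.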
GAUSS™/MATLAB™/GNU Octave programmes for the method are available upon request.

4. Conclusions
We have proposed the correct, fast, and generic PW algorithm for exact NLS estimation of a regression with AR errors of any order, requiring no normality assumption. If the innovations are normal, our proposed estimators are identical to the unconditional ML estimators. No numerical optimiser is required either, although we have checked the results of our algorithm against those of numerical optimisation. Quasi-normality may be assumed in constructing a covariance (quasi-ML or sandwich) for the AR parameters; alternatively, the asymptotic covariance can be calculated (with no normality requirement). The covariance of the regression parameters is obtained from the GLS formulae. In practice, an information criterion may be employed to select the unknown AR order, approximating any underlying ARMA process; for a recent paper on AR lag order selection, see Butler and Paolella (2017). The estimation method is suitable and efficient for trending data. It is also applicable to dynamic regression with a lagged dependent variable included as a regressor, and it provides reliable standard errors in that case. In addition, it can be used to derive efficient Wald-type unit root tests.
Funding: This research received no external funding.

Conflicts of Interest:
The author declares no conflict of interest.

Notes

1. The CO method of Oberhofer and Kmenta (1974) and Magnus (1978) is different: it is an ML estimation method, imposing zero initial conditions.
2. Magnus (1978) provides an alternative, cumbersome algorithm which requires an equation solver.
3. There may be convergence issues with the BM algorithm for regression with AR(2) errors, but these are beyond the scope of this paper.
4. The one that obeys the corresponding first order condition for minimisation of the exact NLS sum of squares of the innovations.
5. θ y_1² + Σ_{t=2}^n y_t y_{t−1} − θ Σ_{t=2}^n y_{t−1}² = 0 ⇒ Σ_{t=2}^n y_t y_{t−1} = θ(Σ_{t=2}^n y_{t−1}² − y_1²).
6. Since θ̂_PW = θ̂_LS (1 + y_1²/Σ_{t=3}^n y_{t−1}²) > θ̂_LS, θ̂_PW is less biased than θ̂_LS, as the bias of the LS estimator is negative. Magee (1989) gives approximate formulae for the biases. We have verified the superiority of the PW estimator.
7. Judge et al. (1985) recommend this estimator. However, according to our calculation (based on Magee (1989)), it is the worst estimator amongst the ones examined.
8. ρ_1 and ρ_2 are the theoretical autocorrelation coefficients of order 1 and 2, respectively.
9. For this, one must use the command rev(rev(chol()')') instead of the built-in chol() in matrix programming languages.
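Regarding note 9: in numpy terms, the GAUSS idiom rev(rev(chol()')') amounts to factoring the row- and column-reversed matrix and reversing back, yielding a lower-triangular factor P with P′P = A (rather than the usual A = LL′ orientation of a built-in Cholesky routine). A sketch of this reading (function name ours):

```python
import numpy as np

def lower_factor(A):
    # Lower-triangular P with P.T @ P == A (A symmetric positive definite).
    # np.linalg.cholesky alone returns L with A == L @ L.T, which is not
    # the orientation the exact GLS transform P_p requires.
    J = np.flip(np.eye(A.shape[0]), axis=0)  # exchange (anti-identity) matrix
    L = np.linalg.cholesky(J @ A @ J)        # J A J = L L'
    return J @ L.T @ J                       # P = J L' J, lower triangular

# With A = sigma_v^2 * inv(Cov(y)) for a stationary AR(1),
# lower_factor(A) should reproduce P_1(theta) of (14).
```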