Prepivoted Augmented Dickey-Fuller Test with Bootstrap-Assisted Lag Length Selection

Maitra, Somak; Politis, Dimitris N.

doi:10.3390/stats7040072

Open AccessArticle

Prepivoted Augmented Dickey-Fuller Test with Bootstrap-Assisted Lag Length Selection

by

Somak Maitra

¹ and

Dimitris N. Politis

^2,*

¹

Department of Mathematics, University of California—San Diego, La Jolla, CA 92093, USA

²

Department of Mathematics and Halicioǧlu Data Science Institute, University of California—San Diego, La Jolla, CA 92093, USA

^*

Author to whom correspondence should be addressed.

Stats 2024, 7(4), 1226-1243; https://doi.org/10.3390/stats7040072

Submission received: 25 August 2024 / Revised: 1 October 2024 / Accepted: 9 October 2024 / Published: 17 October 2024

(This article belongs to the Special Issue Modern Time Series Analysis II)

Download

Browse Figures

Versions Notes

Abstract

:

We investigate the application of prepivoting in conjunction with lag length selection to correct the size and power performance of the Augmented Dickey-Fuller test for a unit root. The bootstrap methodology used to perform the prepivoting is a residual based AR bootstrap that ensures that bootstrap replicate time series are created under the null irrespective of whether the originally observed series obeys the null hypothesis or not. Simulation studies wherein we examine the performance of our proposed method are given; we evaluate our method’s performance on ARMA(1,1) models with varying configurations for size and power performance. We also propose a novel data dependent lag selection technique that uses bootstrap data under the null to select an optimal lag length; the performance of our method is compared to existing lag length selection criteria.

Keywords:

ADF test; bootstrap; hypothesis testing; unit root; ARMA models; lag length selection

1. Introduction

Unit root testing is a central problem of time series and econometric analysis. For example, the presence (or not) of a unit root in time series data has huge repercussions in forecasting applications. In econometric analyses, unit roots render the data to be highly sensitive to shocks; while stationary processes tend to revert to the mean in the presence of shocks, random walks tend to diverge.

The primary approaches to unit root testing include the augmented Dickey-Fuller (henceforth referred to as ADF) test [1], and the Phillips-Perron (PP) test [2]. Extensive literature including the work of Schwert (1989) [3], Perron and Ng (1996) [4] as well as Davidson and MacKinnon (2004) [5] point to evidence that PP tests underperform with respect to size in finite samples compared to ADF tests since PP tests are highly reliant on asymptotic results; additionally PP tests suffer from severe size distortions in the presence of negative moving average errors. However, the ADF test may suffer from reduced power; see Paparoditis and Politis (2018) [6] and references therein.

The ADF test examines the presence of a unit root in a stretch of time series data

X_{1}, X_{2}, \dots X_{n}

by focusing on the Ordinary Least Squares (OLS) estimate

\hat{ρ}

of

ρ

in the regression equation

X_{t} = ρ X_{t - 1} + \sum_{i = 1}^{q} a_{j, q} Δ X_{t - i} + e_{t, q}

(1)

fitted to the observed data. In the above equation, we use the notation

Δ X_{t} = X_{t} - X_{t - 1}

. Additionally, note that the number of lagged differences (denoted by q) is allowed to vary with the sample size n, and thus q is an abbreviated notation for

q_{n}

. The null and alternative hypotheses for the ADF testing framework are as follows.

\begin{matrix} H_{0} & : X_{t} = X_{t - 1} + U_{t} \end{matrix}

(2)

\begin{matrix} H_{a} & : X_{t} = U_{t} \end{matrix}

(3)

A typical assumption in the above is that

U_{t} = \sum_{j = 1}^{\infty} a_{j} U_{t - j} + e_{t}

, where the

a_{j}

are autoregressive (AR) coefficients, and

e_{t}

is a sequence of independent, identically distributed (i.i.d.) mean zero random variables with finite (and nonzero) variance, denoted by

σ_{e}^{2}

; however, the i.i.d. assumption on

e_{t}

can be relaxed as discussed in Section 2. To elaborate, the hypotheses are essentially that under the null, the series

X_{t}

is obtained by integrating an infinite order autoregressive process, while the alternative is that the series is a de facto infinite order autoregressive process. The process

U_{t}

can be guaranteed to be stationary and causal by imposing conditions on its coefficients, namely

\sum_{j = 1}^{\infty} {| j |}^{s} | a_{j} | < \infty

for some

s \geq 1

and that

\sum_{j = 1}^{\infty} a_{j} z^{j} \neq 0

for all

| z | \leq 1

.

In their seminal work on the topic, Dickey and Fuller (1979) [1] suggested the studentized statistic

t_{n, q} = \frac{{\hat{ρ}}_{n} - 1}{\hat{Std} ({\hat{ρ}}_{n})}

with

\hat{Std} ({\hat{ρ}}_{n})

used to denote the estimated standard deviation of the OLS estimator of

ρ

. We explicitly denote the importance of the lag length q by using the notation

t_{n, q}

to denote the ADF test statistic as opposed to

t_{n}

. The distribution of this studentized statistic under the null is non-standard/non-normal; however, it is free of unknown parameters, and has been promulgated in econometric and statistics literature. Assumptions under which this distribution is valid have been progressively relaxed since Dickey and Fuller’s original work in 1979; see [6] for a chronological account.

The ADF test has been demonstrated to have less than ideal size and power performance in real world applications. It is well examined in literature [3,7] that the presence of negative MA coefficients in the data generating process causes severe size distortions, while the work of Paparoditis and Politis [6] offers a concrete real world example wherein the power performance is poor. The latter work also provides evidence of the asymptotic collinearity problem in the ADF regression; the collinearity becomes more prominent for large q, in which case loss of power ensues. The issue is then how to work with a moderate value of q while still achieving a (close to) nominal size of the ADF test.

A succinct summary justifying the choice of adopting the prepivoting framework for the ADF test comes from Section 3 of Beran (1988) [8], where the author demonstrates that (under regularity conditions and in the case where the asymptotic null distribution of the test statistic is independent of the unknown parameters) prepivoted tests have smaller order errors in rejection probability than that of the asymptotic theory test. Since the asymptotic null distribution of the OLS estimate

{\hat{ρ}}_{n}

from the ADF regression is independent of

ρ

, the ADF test belongs to the category of hypothesis tests that stand to benefit from the prepivoting framework.

The remainder of this paper is organized as follows. Section 2 reviews the asymptotic properties of the ADF test statistic, along with conditions on

q_{n}

for good size and power. We also demonstrate that the AR-sieve bootstrap of Paparoditis and Politis [9] is a prepivoted test in the sense of Beran [8]. Section 3 highlights the consistency of the prepivoted ADF test against fixed and local alternatives. Section 4 details the results of numerical experiments wherein we study the empirical performance of the prepivoted ADF test for different lag length specifications. Section 5 introduces a novel bootstrap-based tuning parameter selection algorithm and its application to the prepivoted ADF test, along with numerical experiments that demonstrate its efficacy with respect to both size and power.

2. Asymptotic Properties of the ADF Test

The asymptotic properties of the ADF test primarily hinge upon the convergence results of the test statistic

t_{n, q}

under the null and the alternative, as well as the conditions governing the underlying stationary process

U_{t}

. In what follows, we proffer a brief summary of the relaxations to assumptions under which the convergence result of the test statistic

t_{n, q}

under the null. Dickey and Fuller [1] initially derived the non-standard null distribution of the statistic under the assumption that the underlying process is autoregressive with known and finite order. The result was consequently extended in 1984 by Said and Dickey [10] to the setup where the innovation process

U_{t}

is an invertible ARMA process; the latter can be expressed as an AR

(\infty)

process with exponentially decaying coefficients. Further relaxations to conditions have been examined in literature with Chang and Park (2002) [11] considering the case that

{e_{t}}

is a martingale difference sequence and in 2018, Paparoditis and Politis [6] showed the convergence under the sole provision that the innovation process

U_{t}

has a continuous spectral density that is strictly positive; this is can also be stated in the equivalent form that the process

U_{t}

has a Wold-type AR representation with respect to just white noise errors, allowing for a much larger class of time series versus hitherto linear AR

(\infty)

processes with i.i.d. or martingale difference innovations. The following subsections review the two main asymptotic results of the ADF test statistic from the work of Paparoditis and Politis (2018) [6].

2.1. Behaviour of the Test Statistic Under the Null

We now describe the convergence behavior of the test statistic

t_{n, q}

. We adopt the general framework of Paparoditis and Politis [6], and assume that

U_{t}

is a mean zero weakly stationary process with autocovariance function

γ_{U} (h) = Cov (U_{0}, U_{h})

that is absolutely summable; hence, the spectral density

f_{U} (w) = {(2 π)}^{- 1} \sum_{h = - \infty}^{\infty} γ_{U} (h) e^{i w h}

is well-defined. We also assume that the logarithm of the spectral density

f_{U}

is integrable, and therefore

U_{t}

admits the Wold representation

U_{t} = \sum_{j = 1}^{\infty} α_{j} ϵ_{t - j} + ϵ_{t}

where

\sum_{j = 1}^{\infty} α_{j}^{2} < \infty

and

ϵ_{t}

is a white noise, i.e., a weakly stationary process with zero mean, common variance and

Cov (ϵ_{t}, ϵ_{k}) = 0

for

t \neq k

; see Brockwell and Davis [12].

If we further assume that

f_{U} (w) > 0

for all w, then

U_{t}

also admits the Wold-type AR representation:

U_{t} = \sum_{j = 1}^{\infty} b_{j} U_{t - j} + ϵ_{t}

(4)

where

ϵ_{t}

is the same white noise process as above, and the coefficients

b_{j}

are absolutely summable. Additionally,

b (z) = 1 - \sum_{k = 1}^{\infty} b_{j} z^{j} \neq 0

for

| z | \leq 1

; see [6] for details. The assumption of positive spectral density of the underlying innovation process is a sine qua non for an AR(∞) approximation to the spectral density to be consistent; this assumption has been used by Paparoditis and Politis (2018) [6] and Kreiss, Paparoditis and Politis (2011) [13] and has not been found to be a limitation for practical applications and has not been found to be a limitation for practical applications.

Under all the above assumptions, and recalling the ADF regression (1) used to test unit root hypothesis

H_{0}

, the following result is true. To state it, we require an extra assumption on the white noise process appearing in (4), namely:

E (ϵ_{t}^{4}) < \infty and \sum_{n = 1}^{\infty} n ∥ P_{1} (ϵ_{n}) ∥ < \infty

(5)

where

P_{t} (Y) = E [Y | F_{t}] - E [Y | F_{t - 1}]

,

F_{s}

is the

σ

-field generated by

(ϵ_{t}

for

t \leq s)

, and the norm

∥ \cdot ∥

is the

L_{p}

norm with

p = 4

.

Theorem 1

(Paparoditis and Politis (2018) [6]). Assume

f_{U} (w) > 0

for all w, and Equations (4) and (5). If

p_{n} \to \infty

as

n \to \infty

such that

\frac{p_{n}}{\sqrt{n}} \to 0

, then

t_{n, q} \overset{d}{\to} \frac{\int_{0}^{1} W (t) d W (t)}{{(\int_{0}^{1} {(W (t))}^{2} d t)}^{\frac{1}{2}}}

when

H_{0}

is true; here

W (t), t \in [0, 1]

is the standard Wiener process on

[0, 1]

, and

\overset{d}{\to}

denotes convergence in distribution.

The

α

level ADF test therefore rejects the null

H_{0}

whenever the test statistic

t_{n, q}

is smaller than

C_{α}

where

C_{α}

is the lower

α

percentile of the above distribution.

Theorem 1 can be further extended to the cases wherein (1) is modified to include an affine time trend; the same limit result except that the standard Wiener process is replaced by

{\tilde{W}}_{t} = W_{t} + (6 t - 4) \int_{0}^{1} W (s) d s - (12 t - 6) \int_{0}^{1} s W (s) d s

For a case by case analysis of intercept/trend inclusion, the reader is referred to the book of Hamilton [14].

2.2. Behaviour Under the Alternative

The limiting distribution of the test statistic

t_{n, q}

under the alternative is more straightforward; the limiting distribution is normal albeit dependent on the true

ρ

in (1). The work of Paparoditis and Politis [6] establishes this limiting behavior even under their relaxed requirement where the underlying innovation process is just assumed to possess a continuous spectral density which is strictly positive; they also discuss the problem of asymptotic collinearity in the ADF regression where increasing the number of lagged differences leads to a reduction in power. In essence, as the chosen number of lags q increases, the regressors in the ADF regression (1) become asymptotically collinear, leading to slow rate of convergence of the estimator

{\hat{ρ}}_{n}

. This directly leads to a loss of power of the ADF test. We comment on this behaviour of the prepivoted ADF test further in the following sections.

Under the same assumptions on the innovation process

U_{t}

used in Theorem 1 the limiting behaviour of the test statistic under the alternative hypothesis

H_{a}

is as follows.

Theorem 2

(Paparoditis and Politis (2018) [6]). Assume

X_{t} = U_{t}

is true with

U_{t}

such that

f_{U} (w) > 0

for all w, and Equations (4) and (5). Let

q = q_{n} \to \infty

as

n \to \infty

in such a way that

\frac{q_{n}^{4}}{n} \to 0

and

\sqrt{n} \sum_{j = q + 1}^{\infty} | a_{j} | \to 0

. Then, as

n \to \infty

1.: $\frac{n}{q} V a r ({\hat{ρ}}_{n}) \to {(1 - ρ)}^{2}$ in probability
2.: $\sqrt{\frac{n}{q}} ({\hat{ρ}}_{n} - ρ) \overset{d}{\to} N (0, {(1 - ρ)}^{2})$ .

3. Power of the Prepivoted Test

Prepivoting, as introduced by Beran [8], is the mapping of the test statistic

t_{n, q}

to a new test statistic

F^{*} (t_{n, q})

, where

F^{*} (\cdot)

is a consistent estimator of the distribution function of

t_{n, q}

computed from the data

(X_{1}, \dots, X_{n})

; typically,

F^{*} (\cdot)

will be based on some kind of bootstrap procedure. We will adopt the residual AR bootstrap of Paparoditis and Politis [9] to generate the ith bootstrap sample denoted by

(X_{1, i}^{*}, \dots, X_{n, i}^{*})

where

i = 1, \dots, B

. Applying the ADF regression(1) to each of these B samples, we obtain B bootstrap ADF statistics

t_{n, q}^{*}

; this allows us to compute the bootstrap estimate of the CDF, denoted by

F^{*} (\cdot)

. Thus, the prepivoted test statistic is

F^{*} (t_{n, q}) = \frac{1}{B} \sum_{i = 1}^{B} 1_{{t_{n, q}^{*} < t_{n, q}}}

. The prepivoted ADF test rejects the null hypothesis if

F^{*} (t_{n, q}) < α

, the nominal level of the test.

In this section, we briefly discuss the power of the prepivoted ADF test. In particular, we show that it is consistent against the alternative hypothesis

ρ < 1

. We build upon the result of Theorem 2 and obtain the following novel result.

Theorem 3.

If

q = q_{n} \to \infty

as

n \to \infty

such that

\frac{q_{n}^{4}}{n} \to 0

and

\sqrt{n} \sum_{j = q + 1}^{\infty} | a_{j} | \to 0

, then the prepivoted ADF test is consistent where the alternative is that ρ is any fixed value less than 1.

Proof.

From Theorem 2 we have that

\sqrt{\frac{n}{q}} ({\hat{ρ}}_{n} - ρ) \overset{d}{\to} N (0, {(1 - ρ)}^{2})

. Observe that the ADF test statistic can be written as

t_{n, q} = \frac{{\hat{ρ}}_{n} - 1}{\hat{Std} (\hat{ρ})} = \frac{{\hat{ρ}}_{n} - ρ + ρ - 1}{\hat{Std} (\hat{ρ})}

The power of the prepivoted test is given by the probability of correct rejection of the test, i.e.,

P (F^{*} (t_{n, q}) < α | H_{a})

where

F^{*}

is the bootstrap estimate of the CDF of

t_{n, q}

. Let

C_{α}^{*}

be an

α

quantile of

F^{*}

.

Then,

\begin{matrix} P (F^{*} (t_{n, q}) < α | H_{a}) & = P (t_{n, q} < C_{α}^{*} | H_{a}) \\ = P (({\hat{ρ}}_{n} - 1) < C_{α}^{*} \hat{Std} ({\hat{ρ}}_{n}) | H_{a}) \\ = P ({\hat{ρ}}_{n} - ρ < C_{α}^{*} \hat{Std} ({\hat{ρ}}_{n}) + (1 - ρ) | H_{a}) \\ = P (\sqrt{\frac{n}{q}} \frac{{\hat{ρ}}_{n} - ρ}{1 - ρ} < C_{α}^{*} \sqrt{\frac{n}{q}} \frac{\hat{Std} ({\hat{ρ}}_{n})}{1 - ρ} + \sqrt{\frac{n}{q}} | H_{a}) \\ = Φ (C_{α}^{*} \sqrt{\frac{n}{q}} \frac{\hat{Std} ({\hat{ρ}}_{n})}{1 - ρ} + \sqrt{\frac{n}{q}}) \end{matrix}

Now since

\hat{Std} (\hat{ρ}) = \sqrt{\frac{q}{n}} (1 - ρ) + o_{p} (\sqrt{\frac{q}{n}})

and using consistency of bootstrap quantiles (cf. Lemma 1.2.1 of Politis, Romano and Wolf [15]) and the continuous mapping theorem, we have

Φ (C_{α}^{*} \sqrt{\frac{n}{q}} \frac{\hat{Std} ({\hat{ρ}}_{n})}{1 - ρ} + \sqrt{\frac{n}{q}}) \approx Φ (C_{α} + \sqrt{\frac{n}{q}})

(6)

where

C_{α}

is the

α

quantile of the Dickey-Fuller distribution. Observe therefore, that the power of the prepivoted test tends to 1 as

n \to \infty

.

While it is not immediately clear from this expression, the prepivoted ADF test in fact has better power than the asymptotic ADF test. We demonstrate the same through numerical experiments in the next two sections. □

We briefly discuss the behavior of the prepivoted ADF test in the local alternative framework. We build upon the theorem of Aylar, Westerlund and Smeekes [16], where they derive the limiting distribution of the ADF test statistic

t_{n, q}

under the local alternative

ρ = 1 + \frac{c}{n}

for some fixed

c < 0

to demonstrate non-trivial power of the prepivoted ADF test. Their notation and assumptions are briefly reviewed as follows. The DGP under consideration is

X_{t} = ρ X_{t - 1} + U_{t}

with

U_{t} = π (L) ϵ_{t}

. It is assumed that

ϵ_{t}

is a martingale difference sequence with some filtration

F_{t}

with

E (ϵ_{t}^{2}) = σ^{2}

,

\frac{1}{n} \sum_{t = 1}^{n} ϵ_{t}^{2} \to_{p} σ^{2}

and

E (| ϵ_{t}^{4} |) < \infty

. Assume

π (z) \neq 0

for all

| z | \leq 1

, and

\sum_{k = 0}^{\infty} {| k |}^{s} | π_{k} | < \infty

for some

s \geq 1

. They then assume

ρ = 1 + \frac{c}{n}

with

c < 0

, and

\frac{q}{\sqrt{n}} \to 0

as

n \to \infty

. Then, they claim that

t_{n, q} \overset{d}{\to} \frac{\int_{0}^{1} J_{c} (r) d W (r)}{{(\int_{0}^{1} J_{c}^{2} (r) d r)}^{\frac{1}{2}}} + c lim_{q \to \infty} (1 - δ_{q} (1)) π (1) {(\int_{0}^{1} J_{c}^{2} (r) d r)}^{\frac{1}{2}}

(7)

In the above Equation (7),

J_{c} (r) = \int_{v = 0}^{r} exp (c (r - v)) d W (v)

where

W (r)

is the standard Wiener process on

[0, 1]

. Additionally,

δ_{q} (L) = \sum_{k = 1}^{q} θ_{k} L^{k - 1}

, where

θ (L) = π {(L)}^{- 1} = 1 - \sum_{k = 1}^{\infty} θ_{k} L^{k}

. Given this setup, we have the following result.

Theorem 4.

Under the above assumptions, the prepivoted ADF test has non-trivial power against a local alternative of the form

ρ = 1 + \frac{c}{n}

where c is a negative constant.

Proof.

Let

G (\cdot)

be the continuous CDF of the limiting distribution given in (7). Note that

J_{c} (r) \sim N (0, \frac{e^{2 r c} - 1}{2 c})

for

r \in [0, 1]

and is an Ornstein-Uhlenbeck process that solves

d J_{c} (r) = c J_{c} (r) + d W (r)

with

J_{c} (0) = 0

. Let

H_{loc}

denote the local alternative hypothesis. Then,

\begin{matrix} P (F^{*} (t_{n, q}) < α | H_{loc}) & = P (t_{n, q} < C_{α}^{*} | H_{loc}) \\ \approx P (t_{n, q} < C_{α} | H_{loc}) + O_{P} (C_{α}^{*} - C_{α}) \approx G (C_{α}) \neq 0 \end{matrix}

where the last line follows since

G (\cdot)

has infinite support. □

Remark 1.

The above mentioned result of Aylar, Smeekes and Westerlund fails to capture the loss of power with increasing q. The asymptotic collinearity problem is empirically verified in Paparoditis and Politis [6] as well as the later sections of the paper at hand. Nevertheless, the result does allow us to demonstrate that the prepivoted ADF test has non-trivial power against local alternatives—which can be seen in Section 4 and Section 5, in particular Tables 3 and 4.

4. Numerical Experiments

In this section we provide the results of a numerical experiment. The setup is similar to that of Paparoditis and Politis [6], wherein the authors consider the ARMA(1,1) model to be the data generating process i.e.,

X_{t} - ϕ X_{t - 1} = Z_{t} + θ Z_{t - 1}

with

Z_{t} \overset{i . i . d .}{\sim} N (0, 1)

. We consider six different combinations of the ARMA parameters

ϕ

and

θ

; two that yield samples under the unit root hypothesis and four that yield samples under the alternative hypothesis. For each sample size n, we generate 10,000 time series each of length n.

This simulation allowed us to compute the empirical rejection probabilities of the prepivoted ADF test at the nominal level

0.05

. We use a ‘tuning parameter sweep’ approach to select the lag lengths for the experiment in this section: we used the formula

q = n^{a}

with varying values of a. Note that the lag length used was the same for both the ADF regression on the original data as well as the ADF regression on the bootstrap samples.

The bootstrap methodology selected to generate bootstrap samples under the null is from Section 2.2 of Paparoditis and Politis [9]. This is a bootstrap approach that is based on unrestricted residuals, and the resultant bootstrap samples are generated under the null irrespective of whether the original data obey the null or not.

Table 1 and Table 2 correspond to the setting where

{X_{t}}

has a unit root. The resulting empirical rejection probabilities should be close to the nominal level of the test i.e.,

0.05

. The entry closest to

0.05

is presented in bold face. As opposed to [6], the column

a = 0.05

is omitted in all the Tables—the entries in all 6 Tables corresponding to

a = 0.05

were very similar to those of

a = 0.09

.

From the results of Table 1 and Table 2, we observe that the prepivoted ADF test yields accurate size when the lag length is large. The test has good size in the presence of the negative MA parameter

θ = - 0.5

when lags are large, but suffers from severe over-rejection of the null when the lag lengths are short. The test is able to achieve accurate size for all sample sizes.

Table 3, Table 4, Table 5 and Table 6 correspond to the setting where

{X_{t}}

does not have a unit root and is under the alternative hypothesis. These empirical rejection probabilities therefore represent the power of the prepivoted ADF test, and should ideally be as large as possible. The largest entries in each row are represented in boldface.

The results of Table 3, Table 4, Table 5 and Table 6 are encouraging. While the finite sample power of the prepivoted ADF test is fairly low, it is much higher than the power of the asymptotic ADF test for the same specifications. We observe that the power tends to 1 as the sample size increases—empirically verifying the theoretical consistency of the test. Unfortunately, we see that the lag length chosen to optimize size does not also optimize power—the power of the test appears to be highest when the chosen lag length is short. We also see the manifestation of the asymptotic collinearity problem from Paparoditis and Politis [6] when the lag lengths are allowed to increase, we observe that the power of the test reduces. In the next section, we will develop a bootstrap method with the goal of choosing a lag length that optimizes power while securing a size close to nominal.

5. Bootstrap Selection of Tuning Parameters for Hypothesis Tests

In this section, we propose a state of the art tuning parameter selection algorithm for general hypothesis tests, and apply it to the case of lag length selection for the prepivoted ADF test. The general algorithm proceeds as follows. Given time series data

(X_{1}, \dots X_{n})

from an assumed model, and a test statistic

t_{n, q}

where a tuning parameter q is involved, the procedure is as follows:

Bootstrap Algorithm for Tuning Parameter Selection for Hypothesis Tests

Use an information-based choice of q from the sample $(X_{1}, \dots X_{n})$ to obtain initial parameter estimates for the assumed model.
Use the estimated parameters to compute residuals that are not restricted to be under the null.
Bootstrap the residuals to construct B stretches of bootstrap data under the null.
Compute the test statistic $t_{n, q}^{*}$ from each of the B bootstrap samples across the range of acceptable values of the tuning parameter. Collect the acceptance/rejections of these bootstrap tests as a matrix of 0s and 1s.
Select the tuning parameter that has empirical rejection probability closest to the nominal level of the test; this is essentially done by computing the column averages of the matrix generated in the previous step and selecting the index with column average closest to the nominal level. If there is more than one such choice, select the tuning parameter associated with the smallest fitted model.
The above is the optimal tuning parameter called $q^{*}$ .
The optimal tuning parameter is fed back to the original data and the hypothesis test is performed with $q^{*}$ , i.e., computing $t_{n, q^{*}}$ on $(X_{1}, \dots X_{n})$ .
Reject the null hypothesis if the p value of the test based on $t_{n, q^{*}}$ is less than the nominal level.

Occam’s razor postulates that if two models are equally apt in explaining a phenomenon at hand, then the smallest, i.e., least complex, model is preferable. Step 5 of the above algorithm encapsulates our preference for smaller, more parsimonious models. There are several instances in the literature of applying some form of bootstrap with the purpose of choosing a tuning parameter, see Leger and Romano [17], Fenga and Politis [18] and Shao [19]. However, it is important to note that our bootstrap algorithm of Section 5 is novel. As far as we know, no other work has proposed to generate bootstrap samples that obey the null (whether the data obey the null or not) with the purpose of choosing the smallest model order that still yields size close to nominal. In the next subsection, we will apply this algorithm to the ADF test in which case, as we have shown, using a smaller model generally leads to better power.

5.1. ADF with Bootstrap-Assisted Lag Choice

In this subsection, we apply the above general bootstrap algorithm for tuning parameter selection to the prepivoted ADF test. We perform numerical experiments to empirically verify the size and power properties of our algorithm and report the results. As per the work of Paparoditis and Politis [6], we observe that the Akaike Information Criterion (AIC) works reasonably well with regards to both size and power of the ADF test in the case of positive MA parameters. In the case of negative MA parameters, the AIC based tests suffer from over-rejection of the null. The Modified AIC (MAIC) due to Ng and Perron [7] was designed specifically to ameliorate the ADF test’s over-rejection of the null in the presence of negative MA parameters. Their method uses a combination of GLS detrending along with a modified information criterion to select the optimal lag length for the ADF test. The MAIC is designed to favor larger lag lengths in the presence of a negative MA parameter, but as shown by Paparoditis and Politis [6], this can hurt the power performance of the test due to the asymptotic collinearity effect.

Our lag length selection algorithm, dubbed the ‘bootstrap-assisted lag choice’ (BALC) is given below; it uses the residual AR bootstrap of Paparoditis and Politis [9] along with the prepivoting idea of Beran [8] with the aim of selecting lag lengths that yield both good size and power. We performed numerical experiments using an ARMA(1,1) data generating process with six different combinations of the ARMA parameters

ϕ

and

θ

. For each ARMA(1,1) configuration and sample size n, we generated 10,000 different time series and recorded the performance of our novel lag length selection algorithm. This allowed us to compute the empirical rejection probabilities corresponding to our algorithm at the nominal level

0.05

. We compare the results of our novel lag length selection method with prepivoted ADF tests with lag lengths selected by AIC and MAIC optimality. The experimental design is as follows:

Bootstrap Assisted Lag Choice Algorithm for the ADF Test

We are given a stretch of time series data $X_{1}, \dots X_{n}$ to be tested for a possible unit root.
Run the ADF regression (1) on this data with lag length q selected by AIC. AIC minimization is done over the range 1 to $\sqrt{n}$ . Denote the selected lag by $q_{A I C}$ .
Use the estimated parameters and centered residuals from the ADF regression to construct $B_{1}$ bootstrap samples under the null, using the AR bootstrap of Paparoditis and Politis [9].
For each of the bootstrap samples, perform a prepivoted ADF test with varying lag lengths. The bootstrap methodology for the prepivoted test also uses the AR bootstrap of Paparoditis and Politis [9], and generates $B_{2}$ bootstrap samples for each q. The range of values for q we chose was from $⌊ 0.5 \cdot q_{A I C} ⌋$ to $⌈ 1.5 \cdot q_{A I C} ⌉$ . The result of these tests is stored in a 0–1 matrix with $B_{1}$ rows and appropriate number of columns.
The optimal $q^{*}$ is then picked by using the value of q that yields smallest type-1 error for all of the bootstrap samples i.e., the value of q that yields column average closest to the nominal level $0.05$ .
We then perform a prepivoted ADF test with $B_{2}$ bootstrap samples on the original data with number of lags equal to $q^{*}$ .
The unit root null hypothesis is rejected if the p-value is less than the nominal level $0.05$ .

5.2. Discussion of Simulation Results

The Tables below list the empirical rejection probabilities of the prepivoted ADF test with lag lengths selected by three different data dependent approaches. We list the results of our algorithm and compare them with those with lags selected by AIC, BIC and MAIC optimality. AIC optimization is carried out over lag lengths q ranging from 1 to

\sqrt{n}

, while MAIC optimization is carried out over q ranging from 1 to

⌊ 12 \cdot {(\frac{n + 1}{100})}^{0.25} ⌋

as per Schwert [3]. For our experiments, the values of

B_{1}, B_{2}

were set to 250 and 100 respectively in the interest of computational parsimony since our simulation involved 10,000 replications; in practice, given a single dataset in hand, the values of

B_{1}, B_{2}

can easily become of the order of 1000. In terms of computational efficiency, for a single dataset, the computational complexity is entirely determined by the values of

B_{1}

and

B_{2}

and the lag length initially chosen by AIC. More specifically, the ADF regression is performed is

1 + B_{1} B_{2} q_{A I C} + B_{2}

times. Step 4 from the algorithm is highly conducive to parallelization—the search can be run over all candidate values for

q^{*}

at the same time since the results from one candidate value of

q^{*}

are independent of those for other candidate values—thereby improving the computational efficiency and runtime of the algorithm.

Table 7 and Table 8 correspond to the setting wherein the data

{X_{t}}

have a unit root. The entries therefore list the rejection probabilities under the null, and should ideally be as close to the nominal level

0.05

as possible. For each sample size in each Table, the rejection probability closest to the nominal level is presented in bold.

Putting together the results of Table 7 and Table 8, we see that the performance of our method is comparable to that of AIC and MAIC. It is clear from the Tables that tests achieve more accurate size for larger sample sizes. Not surprisingly, the bootstrap selection method also suffers from over-rejection of the null hypothesis in the presence of negative MA parameter.

Table 9, Table 10, Table 11 and Table 12 correspond to the setting wherein the data

{X_{t}}

obey the alternative hypothesis. The entries therefore correspond to the power of the prepivoted ADF test, and should be as high as possible. For each sample size in each Table, the highest rejection probability is presented in bold.

These results allow us to draw the following conclusions. First, it appears that our method performs reasonably well with respect to finite sample power. The power of the BALC test is higher than the power of the prepivoted test with AIC or MAIC lag lengths. We also note that the method is consistent as its power approaches 1 with increasing sample size. Additionally, we observe that Table 10, Table 11 and Table 12 show significant increases in the power of our method compared to the two information based criteria. What is interesting is that the biggest improvements are seen for intermediate sample sizes. In particular, we observe that our method outperforms the MAIC vis-à-vis power. This can primarily be attributed to the fact that our method is designed to favor shorter lags (provided their associated size is close to nominal), thereby avoiding the asymptotic collinearity problem under the alternative hypothesis. It appears that our proposed bootstrap-assisted lag choice (BALC) generally outperforms AIC and MAIC in the cases of size and power, with the exception that the MAIC achieves better size in the well-known problematic case of a negative MA parameter. The prepivoted test with lag length selected by BIC provides a comparable but slightly higher power in the presence of a negative MA parameter as compared to our BALC based test.

All in all, it appears that the BALC method works well at arriving at a compromise between size and power performance. Our lag selection method yields accurate size as well as improved power over the information based lag length selection criteria. The lag length selection idea along with prepivoting appears to yield appreciable improvements over the asymptotic ADF test.

5.3. Comparing the BALC Lag Lengths to MAIC

Given below are histograms comparing the optimal lags selected by our BALC algorithm and lags selected by MAIC optimality. They serve as evidence of the fact that the BALC tends to pick lower lags than MAIC, while still searching in a range higher than the lags selected by AIC—therefore splitting the difference between AIC and MAIC. The lag lengths selected by BALC are generally shorter than those selected by MAIC. Although there is a stochastic component to the BALC framework, we observe that the choice to cut off the search for lags at

1.5 * q_{AIC}

allows us to inflate the search space from just the value selected by AIC while simultaneously preventing the asymptotic collinearity problem. The asymptotic collinearity problem stems from the fact that as the number of lags q increases, the regressors in the ADF regression become asymptotically collinear—this is the reason for the loss in power of the ADF test as pointed out in Paparoditis and Politis (2018) [6]. If a regression is collinear, then the OLS estimator is ill-conditioned/non-unique; fortunately, here we have only approximate/asymptotic collinearity (under the alternative) whose effect can be mitigated by choosing the smallest q possible. This issue is demonstrated in Figure 1, Figure 2, Figure 3 and Figure 4 and best visualized in Figure 5 and Figure 6, where the MAIC picks fairly large lag lengths for a significant proportion of the 10,000 DGPs whereas the BALC algorithm avoids larger lag lengths.

5.4. A Real Data Application

Lastly, we apply our novel lag selection method to a real data example. We apply our BALC testing procedure to the dataset of Figure 7 that is discussed in the textbook on time series analysis by Shumway and Stoffer (2017) [20]. The dataset represents the yearly average global temperature deviations from 1880 to 2009, with the deviations measured in degrees Celsius with respect to the average temperature between 1951 and 1980. In Section 3 of Paparoditis and Politis [6], the authors comment on the failure of the tseries implementation of the ADF test to reject the null, as a consequence of selecting too many lags. As per their discussion and diagnostic figures, there is strong evidence that the detrended data do not contain any strong evidence for a unit root, and therefore obeys the alternative hypothesis—evidenced by the ACF plot of the detrended data given in Figure 8. The BALC method yields an optimal lag length of 3, compared to the tseries ADF function’s recommendation of 5. The prepivoted ADF test with lag length

q = 3

then rejects the unit root null hypothesis, which is the expected result. In contrast, the asymptotic ADF test with

q = q_{A I C}

(which also equals 5) fails to reject the null.

6. Discussion

In this paper, we have presented a novel bootstrap assisted lag length selection criterion for the ADF test. As the asymptotic theory shows, the choice of lag length for the ADF test significantly affects its size and power properties. The convergence rate of the ADF estimator is different under the null and under the alternative. In addition, there is a tug-of-war: we need a large q to ensure size close to nominal while we need a smaller q (still diverging though) to have good power. The practical implementation of the ADF test requires the practitioner to choose the q to use in the ADF test without knowing whether the null or the alternative holds as this is the very question of the test. Given this difficult situation, our paper proposes a bootstrap-based algorithm that helps the practitioner choose a value of q that is the smallest among all q values that achieve size close to nominal. Our Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 show that our algorithm works well in practice as it automatically adapts to the underlying data structure without the need for the practitioner to choose q in an ad hoc way, e.g., by minimizing a criterion such as AIC, BIC, MAIC, etc.

Author Contributions

Conceptualization, S.M. and D.N.P.; Methodology, S.M. and D.N.P.; Software, S.M. and D.N.P.; Formal analysis, S.M. and D.N.P.; Investigation, S.M. and D.N.P.; Writing—original draft, S.M. and D.N.P.; Visualization, S.M. and D.N.P.; Supervision, D.N.P. All authors have read and agreed to the published version of the manuscript.

Funding

The research of the first author was partially supported by the Richard Libby Graduate Research Award. The research of the second author was partially supported by NSF Grant DMS 24-13718.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Acknowledgments

This research was done using services provided by the OSG Consortium [21,22,23,24] which is supported by the National Science Foundation awards #2030508 and #1836650.

Conflicts of Interest

The authors declare no conflict of interest.

References

Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431. [Google Scholar]
Phillips, P.C.; Perron, P. Testing for a unit root in time series regression. Biometrika 1988, 75, 335–346. [Google Scholar] [CrossRef]
Schwert, G.W. Tests for unit roots: A Monte Carlo investigation. J. Bus. Econ. Stat. 2002, 20, 5–17. [Google Scholar] [CrossRef]
Perron, P.; Ng, S. Useful modifications to some unit root tests with dependent errors and their local asymptotic properties. Rev. Econ. Stud. 1996, 63, 435–463. [Google Scholar] [CrossRef]
Davidson, R.; MacKinnon, J.G. Econometric Theory and Methods; Oxford University Press: New York, NY, USA, 2004; Volume 5. [Google Scholar]
Paparoditis, E.; Politis, D.N. The asymptotic size and power of the augmented Dickey–Fuller test for a unit root. Econom. Rev. 2018, 37, 955–973. [Google Scholar] [CrossRef]
Ng, S.; Perron, P. Lag length selection and the construction of unit root tests with good size and power. Econometrica 2001, 69, 1519–1554. [Google Scholar] [CrossRef]
Beran, R. Prepivoting test statistics: A bootstrap view of asymptotic refinements. J. Am. Stat. Assoc. 1988, 83, 687–697. [Google Scholar] [CrossRef]
Paparoditis, E.; Politis, D.N. Bootstrapping unit root tests for autoregressive time series. J. Am. Stat. Assoc. 2005, 100, 545–553. [Google Scholar] [CrossRef]
Said, S.E.; Dickey, D.A. Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 1984, 71, 599–607. [Google Scholar] [CrossRef]
Chang, Y.; Park, J.Y. On the asymptotics of ADF tests for unit roots. Econom. Rev. 2002, 21, 431–447. [Google Scholar] [CrossRef]
Brockwell, P.J.; Davis, R.A. Time Series: Theory and Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1991. [Google Scholar]
Kreiss, J.P.; Paparoditis, E.; Politis, D.N. On the range of validity of the autoregressive sieve bootstrap. Ann. Stat. 2011, 39, 2103–2130. [Google Scholar] [CrossRef]
Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 2020. [Google Scholar]
Politis, D.N.; Romano, J.P.; Wolf, M. Subsampling; Springer: Berlin/Heidelberg, Germany, 1999. [Google Scholar]
Aylar, E.; Smeekes, S.; Westerlund, J. Lag truncation and the local asymptotic distribution of the ADF test for a unit root. Stat. Pap. 2019, 60, 2109–2118. [Google Scholar] [CrossRef]
Léger, C.; Romano, J.P. Bootstrap choice of tuning parameters. Ann. Inst. Stat. Math. 1990, 42, 709–735. [Google Scholar] [CrossRef]
Fenga, L.; Politis, D.N. LASSO order selection for sparse autoregression: A bootstrap approach. J. Stat. Comput. Simul. 2017, 87, 2668–2688. [Google Scholar] [CrossRef]
Shao, J. Bootstrap model selection. J. Am. Stat. Assoc. 1996, 91, 655–665. [Google Scholar] [CrossRef]
Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications: With R Examples; Springer: Berlin/Heidelberg, Germany, 2017. [Google Scholar]
Pordes, R.; Petravick, D.; Kramer, B.; Olson, D.; Livny, M.; Roy, A.; Avery, P.; Blackburn, K.; Wenaus, T.; Würthwein, F.; et al. The open science grid. J. Phys. Conf. Ser. 2007, 78, 012057. [Google Scholar] [CrossRef]
Sfiligoi, I.; Bradley, D.C.; Holzman, B.; Mhashilkar, P.; Padhi, S.; Wurthwein, F. The pilot way to grid resources using glideinWMS. In Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering, Los Angeles, CA, USA, 31 March–2 April 2009; Volume 2, pp. 428–432. [Google Scholar] [CrossRef]
OSG. OSPool; OSG, 2006; Available online: https://osg-htc.org/services/open_science_pool.html (accessed on 8 October 2024). [CrossRef]
OSG. Open Science Data Federation; OSG, 2015; Available online: https://osdf.osg-htc.org/ (accessed on 8 October 2024). [CrossRef]

Figure 1. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (1, 0.5), n = 200

.

Figure 1. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (1, 0.5), n = 200

.

Figure 2. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (1, 0.5), n = 800

.

Figure 2. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (1, 0.5), n = 800

.

Figure 3. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.985, 0.5), n = 200

.

Figure 3. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.985, 0.5), n = 200

.

Figure 4. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.985, 0.5), n = 800

.

Figure 4. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.985, 0.5), n = 800

.

Figure 5. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.97, - 0.5), n = 200

.

Figure 5. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.97, - 0.5), n = 200

.

Figure 6. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.97, - 0.5), n = 800

.

Figure 6. Comparison of lag lengths selected by BALC and MAIC for

(ρ, θ) = (0.97, - 0.5), n = 800

.

Figure 7. Yearly average global temperature deviations data with superimposed fitted linear trend.

n = 130

.

Figure 7. Yearly average global temperature deviations data with superimposed fitted linear trend.

n = 130

.

Figure 8. Correlogram of detrended dataset.

Table 1. Entries represent empirical rejection probabilities of the prepivoted ADF test with q =

n^{a}