Uniform Inference in Panel Autoregression

Chao, John C.; Phillips, Peter C. B.

doi:10.3390/econometrics7040045


by John C. Chao ¹,* and Peter C. B. Phillips ²,³,⁴,⁵,*

¹ Department of Economics, University of Maryland, College Park, MD 20742, USA

² Department of Economics, Yale University, New Haven, CT 06520, USA

³ Department of Economics, University of Auckland, Auckland CBD, Auckland 1010, New Zealand

⁴ Department of Economics, Singapore Management University, 81 Victoria St, Singapore 188065, Singapore

⁵ Department of Economics, University of Southampton, Southampton SO14 0DA, UK

* Authors to whom correspondence should be addressed.

Econometrics 2019, 7(4), 45; https://doi.org/10.3390/econometrics7040045

Submission received: 14 October 2019 / Revised: 18 November 2019 / Accepted: 18 November 2019 / Published: 26 November 2019


Abstract: This paper considers estimation and inference concerning the autoregressive coefficient ($ρ$) in a panel autoregression for which the degree of persistence in the time dimension is unknown. Our main objective is to construct confidence intervals for $ρ$ that are asymptotically valid, having asymptotic coverage probability at least that of the nominal level uniformly over the parameter space. The starting point for our confidence procedure is the estimating equation of the Anderson–Hsiao (AH) IV procedure. It is well known that AH IV estimation suffers from weak instrumentation when $ρ$ is near unity. But it is not so well known that AH IV estimation is still consistent when $ρ = 1$. In fact, the AH estimating equation is very well-centered and is an unbiased estimating equation in the sense of Durbin (1960), a feature that is especially useful in confidence interval construction. We show that a properly normalized statistic based on the AH estimating equation, which we call the $M$ statistic, is uniformly convergent and can be inverted to obtain asymptotically valid interval estimates. To further improve the informativeness of our confidence procedure in the unit root and near unit root regions and to alleviate the problem that the AH procedure has greater variation in these regions, we use information from unit root pretesting to select among alternative confidence intervals. Two sequential tests are used to assess how close $ρ$ is to unity, and different intervals are applied depending on whether the test results indicate $ρ$ to be near or far away from unity. When $ρ$ is relatively close to unity, our procedure activates intervals whose width shrinks to zero at a faster rate than that of the confidence interval based on the $M$ statistic. Only when both of our unit root tests reject the null hypothesis does our procedure turn to the $M$ statistic interval, whose width has the optimal $N^{-1/2} T^{-1/2}$ rate of shrinkage when the underlying process is stable. Our asymptotic analysis shows this pretest-based confidence procedure to have coverage probability that is at least the nominal level in large samples uniformly over the parameter space. Simulations confirm that the proposed interval estimation methods perform well in finite samples and are easy to implement in practice. A supplement to the paper provides an extensive set of new results on the asymptotic behavior of panel IV estimators in weak instrument settings.

Keywords: confidence interval; dynamic panel data models; panel IV; pooled OLS; uniform inference

JEL Classification: C23; C26

1. Introduction

Due to the many challenges that arise in estimating and conducting statistical inference for dynamic panel data models, a vast literature has emerged studying these models over the past three decades. Much has been learnt about the large sample properties and finite sample performance of various estimation procedures in stable dynamic panel models, not only in univariate but also in multivariate contexts. Important contributions to this literature began with (Nickell 1981; Anderson and Hsiao 1981, 1982), followed by (Arellano and Bond 1991; Ahn and Schmidt 1995; Arellano and Bover 1995; Kiviet 1995; Blundell and Bond 1998; Hahn and Kuersteiner 2002; Alvarez and Arellano 2003), amongst many others. Progress has also been made recently in studying such models when more persistent behavior, such as unit root or near unit root behavior, is present. (Phillips and Moon 1999) provided methods that opened up the rigorous development of asymptotics in such models for both stationary and nonstationary cases and with multidimensional joint and sequential limits. Many subsequent contributions to this nonstationary panel literature have considered more complex regressions, analyzing the effects of incidental trends, serial dependence and cross section dependence; e.g., (Chang 2002, 2004; Phillips and Sul 2003; Moon and Phillips 2004; Moon et al. 2014, 2015; Pesaran 2006; Pesaran and Tosetti 2011).

While this literature has greatly enhanced our understanding of the panel data sampling behavior of point estimators and of associated test statistics, such as the Studentized t statistic or the Wald statistic, what have not been studied are confidence interval procedures which are asymptotically valid in the sense that asymptotic coverage probabilities are at least that of the nominal level uniformly over the parameter space. The development of theoretically justified confidence intervals is especially important in cases where the empirical researcher may not have good prior information about the degree of persistence in the data, since in such situations interval estimates can serve as indispensable supplements to point estimates by providing additional information about sampling uncertainty and about the range of possible values of the autoregressive parameter $ρ$ that are consistent with the observed data. Moreover, we know from the unit root time series literature that constructing an asymptotically valid confidence interval for the autoregressive parameter of an $AR(1)$ process is a challenging task when the parameter space is taken to be large enough to include both the stable and the unit root cases. This is because the Studentized statistic based on OLS estimation is not uniformly convergent in this case, so that an asymptotically correct confidence interval cannot be constructed by inverting the Studentized statistic in the usual way. To address this problem in the time series literature, (Stock 1991) proposed a confidence procedure based on local-to-unity asymptotics, while simulation and bootstrap type methods have been introduced by (Andrews 1993; Hansen 1999). Recent results by (Mikusheva 2007, 2012) and by (Phillips 2014) have shown that the methods of (Andrews 1993; Hansen 1999), as well as a recentered version of Stock’s method, all give the correct asymptotic coverage probability uniformly over the parameter space. Extending these procedures to the panel data setting does not seem to be straightforward, and panel data versions of these methods are currently unavailable.

To address this need, the present paper proposes simple, asymptotically correct confidence procedures for the autoregressive coefficient of a panel autoregression¹. We take as our starting point the estimating equation of the (Anderson and Hsiao 1981) IV procedure. Although much has been written about the Anderson–Hsiao IV estimator having a weak instrument problem when $ρ$ is unity or very nearly unity, it should be noted that this estimator is still consistent even when $ρ = 1$ and that the weak instrument problem primarily manifests itself in the form of the asymptotic distribution having greater dispersion². In fact, the Anderson–Hsiao estimating equation is very well-centered and is an unbiased estimating equation in the sense of (Durbin 1960), a property that is of particular importance in constructing asymptotically valid confidence intervals. Exploiting this unbiasedness property, we then show that a properly normalized statistic based on this estimating equation is uniformly convergent over the parameter space $Θ_{ρ} = (-1, 1]$. This statistic, which we refer to as the $M$ statistic since it is based on the (empirical) IV moment function, can be easily and analytically inverted to obtain an asymptotically correct confidence interval. However, because of the weak instrument problem, when the true $ρ$ is unity or very near unity, confidence intervals obtained by inverting this $M$ statistic may be less informative in the sense that they may be relatively wide in finite samples and, asymptotically, their width shrinks toward zero at the slower rate of $T^{-1/2}$ even when both the cross section (N) and the time series (T) sample sizes approach infinity. A similar drawback applies to the GMM procedure of (Han and Phillips 2010), which achieves uniform inference with shrinkage rate $(NT)^{-1/2}$ over the full domain $Θ_{ρ}$.

To obtain more informative interval estimates, we introduce a new confidence procedure which uses information from two different unit root tests, with different power properties, to assess the proximity of the true autoregressive parameter to the exact unit root null hypothesis $H_{0} : ρ = 1$. First, we infer that the true parameter value is unity or very close to unity if the more powerful of the two unit root tests fails to reject $H_{0}$, and we use, in this case, an interval that is localized at $ρ = 1$, with width that shrinks at a faster $N^{-1/2} T^{-1}$ rate³. Second, if the more powerful test rejects $H_{0}$ but the less powerful test fails to reject, we use another interval that is still localized at $ρ = 1$ but with greater width which shrinks at the rate $N^{-1/2} T^{-1/2}$, a rate that is still faster than that of the width of the confidence interval based on the $M$ statistic in the vicinity of $ρ = 1$. Finally, if both tests reject $H_{0}$, then we conclude that the true parameter value is far enough away from unity that we can use the confidence interval based on the $M$ statistic, whose width shrinks at the optimal $N^{-1/2} T^{-1/2}$ rate in the stable region of the parameter space. We show that the asymptotic size of this pretest based procedure can be uniformly controlled, so that this procedure is asymptotically valid, albeit slightly conservative when the underlying process is stable. The degree of conservatism under our procedure is also controllable and can be kept small by carefully controlling the probability of a Type II error under a local-to-unity parameter sequence. Moreover, in addition to providing informative and asymptotically correct confidence intervals, our procedure has the further advantage that it is given in analytical form and, hence, is computationally simple to implement. Simulations confirm that the proposed method performs well in finite samples.

The remainder of the paper proceeds as follows. Section 2 briefly describes the model, assumptions, and notation. Section 3 introduces two new ways of constructing uniform confidence intervals for the parameter $ρ$. The first is based on inverting the $M$ statistic, and the second is the pre-test based confidence interval. Results given in this section show that both confidence procedures are asymptotically valid. Section 4 reports the results of a Monte Carlo study comparing our proposed confidence procedures with some alternative procedures. We provide a brief conclusion in Section 5. Proofs of the main theorems are given in Appendix A. Proofs of additional supporting lemmas, as well as additional Monte Carlo results, are reported in an online supplement to this paper (Chao and Phillips 2019). The supplement provides an extensive set of results for panel estimation limit theory in unit root and near unit root cases that help deliver the main results in the paper but are of wider interest regarding the asymptotic behavior of panel IV estimators, particularly in weak instrument settings.

A word on notation. We use ⇒ for convergence in distribution or weak convergence and $\overset{p}{\to}$ for convergence in probability; $χ_{ν}^{2}$ denotes a chi-square random variable with $ν$ degrees of freedom, $Z$ denotes the standard normal random variable, and $W_{i}(r)$ is the standard Brownian motion on the unit interval $[0, 1]$ for each i. For two sequences $\{X_{T}\}$ and $\{Y_{T}\}$, we take $X_{T} ≪ Y_{T}$ to mean $X_{T} / Y_{T} = o(1)$ and $X_{T} \sim Y_{T}$ to mean that $X_{T} / Y_{T} = O(1)$ and $Y_{T} / X_{T} = O(1)$, as $T \to \infty$. Similarly, for random variables $X_{T}$ and $Y_{T}$, we take $X_{T} \overset{p}{\sim} Y_{T}$ to mean that $X_{T} / Y_{T} = O_{p}(1)$ and $Y_{T} / X_{T} = O_{p}(1)$, as $T \to \infty$. In addition, the notations $\Pr(\cdot | ρ)$ and $\Pr(\cdot | ρ_{T})$ denote, respectively, a probability measure indexed by the fixed parameter $ρ$ and one indexed by the local-to-unity parameter $ρ_{T}$. Finally, we use wid$(C)$ to denote the width of the confidence interval $C$.

2. Model and Assumptions

We work with the following dynamic panel data model written in unobserved components form

\begin{matrix} y_{i t} & = & a_{i} + w_{i t}, \end{matrix}

(1)

\begin{matrix} w_{i t} & = & ρ w_{i t - 1} + ε_{i t}, \end{matrix}

(2)

for $i = 1, \dots, N$ and $t = 1, \dots, T$. Here, $\{y_{it}\}$ is the observed data, $\{w_{it}\}$ is generated by a latent AR(1) process, $a_{i}$ denotes an (unobserved) individual effect, and $ρ$ denotes the panel autoregressive parameter, which is assumed to belong to the parameter space $Θ_{ρ} = (-1, 1]$. In this paper, we will show that certain properties of our procedure hold uniformly for $ρ \in Θ_{ρ} = (-1, 1]$. We do this by making use of a result (Lemma 2.6.2) from (Lehmann 1999) which establishes the equivalence of uniform convergence and convergence for every parameter sequence belonging to a given parameter space. For this purpose, it is convenient for us to consider a general class of local-to-unity parameter sequences of the form $ρ = ρ_{T} = \exp\{-1/q(T)\}$, where $q(T)$ is a non-negative function of T such that $q(T) \to \infty$ as $T \to \infty$.⁴ Moreover, parameter sequences for stable AR processes can also be written in this general form by considering parameter sequences $\{ρ_{T}\}$ which belong to the collection

G_{St} = \{\{ρ_{T}\} : |ρ_{T}| = \exp\{- \frac{1}{q(T)}\}, q(T) \geq 0, \text{ and } q(T) = O(1) \text{ as } T \to \infty\} .

(3)

Note also that, in the case where such parameter sequences are considered, the AR process given in expression (2) will depend on an indexed parameter $ρ_{T}$, as opposed to a fixed autoregressive parameter, so that, strictly speaking, the observed data and the latent process, in this case, will be triangular arrays $\{y_{it,T}, w_{it,T}\}$, which depend additionally on T. However, for notational ease, we shall suppress this additional dependence and simply write $\{y_{it}, w_{it}\}$ in what follows.
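To make the parameterization $ρ_{T} = \exp\{-1/q(T)\}$ concrete, the following minimal sketch evaluates it for a few choices of $q(T)$. The function name `rho_T` and the particular rate functions are illustrative assumptions, not taken from the paper; they are only meant to show how faster growth of $q(T)$ pushes $ρ_{T}$ closer to unity, while bounded $q(T)$ keeps it in the stable region.

```python
import math

def rho_T(T, q):
    """Return rho_T = exp(-1 / q(T)) for a user-supplied rate function q."""
    return math.exp(-1.0 / q(T))

# Illustrative (hypothetical) choices of q(T), ordered from closest to unity
# to furthest away; each label describes the growth rate of q(T).
regimes = {
    "q(T) = T^2     (T/q(T) -> 0)": lambda T: float(T) ** 2,
    "q(T) = T       (q(T) ~ T)": lambda T: float(T),
    "q(T) = sqrt(T) (q(T)/T -> 0)": lambda T: math.sqrt(T),
    "q(T) = 2       (q(T) = O(1))": lambda T: 2.0,
}

for T in (50, 500):
    for label, q in regimes.items():
        print(f"T={T:4d}  {label:32s} rho_T={rho_T(T, q):.6f}")
```

With bounded $q(T)$ the value of $ρ_{T}$ does not move with T, whereas the other three choices all drift toward one at different speeds.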

It is sometimes convenient to rewrite the model (1) and (2) in the alternate familiar form as a first-order autoregressive process in $y_{it}$, viz.,

y_{i t} = a_{i} (1 - ρ) + ρ y_{i t - 1} + ε_{i t} = η_{i} + ρ y_{i t - 1} + ε_{i t},

(4)

where $η_{i} = a_{i}(1 - ρ)$. The following assumptions are made on the model.

Assumption 1 (Errors):

(a) $\{ε_{it}\} \equiv \text{i.i.d.}(0, σ^{2})$ across i and t, $σ^{2} > 0$; (b) $E[ε_{it}^{4}] < \infty$.

Assumption 2 (Random Effects):

(a) $\{a_{i}\} \equiv \text{i.i.d.}(μ_{a}, σ_{a}^{2})$ across i, $σ_{a}^{2} > 0$; (b) $E[a_{i}^{4}] < \infty$; (c) $ε_{it}$ and $a_{j}$ are mutually independent for all $i, j = 1, 2, \dots, N$ and for all $t = 1, 2, \dots, T$.

Assumption 3 (Initialization):

Let $y_{i0} = a_{i} + w_{i0}$. Suppose that $\{w_{i0}\}$ is independent across i. Suppose also that there exists a positive constant C such that $\sup_{i} E[w_{i0}^{2}] \leq C < \infty$, and that $w_{i0}$ and $ε_{jt}$ are independent for all $i, j = 1, 2, \dots, N$ and for all $t = 1, 2, \dots, T$.

Note that Assumption 3 on the initial condition does not impose mean stationarity, i.e., the condition that $E[y_{i0} | a_{i}] = η_{i} / (1 - ρ) = a_{i}$ a.s., which in our setup is equivalent to the restriction that $E[w_{i0} | a_{i}] = 0$ a.s. In addition, observe that Assumption 3 allows for the case where the initial condition is fixed, i.e., $w_{i0} = c_{i}$ for some sequence of constants $\{c_{i}\}$ such that $\sup_{i} |c_{i}| < \infty$. It is also general enough so that we may specify $w_{i0}$ to be fixed in the unit root case but allow $w_{i0}$ to be a draw from its unconditional distribution with variance $σ^{2} / (1 - ρ^{2})$ when the underlying process is stationary.

In lieu of Assumption 2, we also consider in this paper a fixed-effects specification given by the following assumption.

Assumption 2* (Fixed Effects):

Let $\{a_{i}\}$ be a nonrandom sequence. Suppose that there exists a positive constant C such that $\sup_{1 \leq i \leq N} |a_{i}| \leq C < \infty$ for all N.

All our theoretical results hold under either the random-effects specification given by Assumption 2 or the fixed-effects specification given by Assumption 2*.
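The data generating process in (1), (2), and (4) can be sketched in a few lines of code. The function `simulate_panel` below is a hypothetical helper, not part of the paper; it assumes Gaussian errors and Gaussian random effects (one admissible choice under Assumptions 1 and 2, which only require i.i.d. draws with finite fourth moments) and a fixed initial condition $w_{i0} = 0$ (permitted by Assumption 3).

```python
import random

def simulate_panel(N, T, rho, sigma=1.0, mu_a=0.0, sigma_a=1.0, seed=0):
    """Simulate y_it = a_i + w_it with w_it = rho * w_{i,t-1} + eps_it.

    Returns (y, a, eps), where y[i][t] holds y_it for t = 0..T and
    eps[i][t] holds eps_it for t = 1..T (eps[i][0] is unused and set to 0).
    """
    rng = random.Random(seed)
    y, a, eps = [], [], []
    for _ in range(N):
        a_i = rng.gauss(mu_a, sigma_a)      # random effect (Gaussian by assumption)
        e_i = [0.0] + [rng.gauss(0.0, sigma) for _ in range(T)]
        w = [0.0]                           # fixed initial condition w_i0 = 0
        for t in range(1, T + 1):
            w.append(rho * w[t - 1] + e_i[t])   # latent AR(1), equation (2)
        y.append([a_i + w_t for w_t in w])      # observed data, equation (1)
        a.append(a_i)
        eps.append(e_i)
    return y, a, eps
```

Setting `rho=1.0` gives a panel of random walks started at $a_{i}$, in which case the intercept $η_{i} = a_{i}(1 - ρ)$ in (4) vanishes.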

3. Uniform Asymptotic Confidence Intervals

3.1. Confidence Intervals Based on the Anderson–Hsiao IV Procedure

A primary objective of this paper is to develop confidence procedures with asymptotic coverage probability that is at least that of the nominal level uniformly over the parameter space $ρ \in (-1, 1]$. As a first step, we consider a statistic based on the empirical moment function of the Anderson–Hsiao IV procedure, but properly standardized by an appropriate estimator of the scale parameter. In particular, let

M (ρ) = \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} y_{i t - 2} (Δ y_{i t} - ρ Δ y_{i t - 1}),

where

{\hat{ω}}^{2} = {\tilde{σ}}^{2} [N^{-1} T^{-1} \sum_{i = 1}^{N} \sum_{t = 4}^{T} {(y_{i t - 3} - y_{i t - 2})}^{2} + N^{-1} T^{-1} \sum_{i = 1}^{N} y_{i T - 2}^{2}]

and

{\tilde{σ}}^{2} = N^{-1} T_{1}^{-1} \sum_{i = 1}^{N} \sum_{t = 2}^{T} {(y_{i t} - {\bar{y}}_{i} - \tilde{ρ} [y_{i t - 1} - {\bar{y}}_{i, - 1}])}^{2}

and where ${\bar{y}}_{i} = T_{1}^{-1} \sum_{t = 2}^{T} y_{it}$ and ${\bar{y}}_{i, -1} = T_{1}^{-1} \sum_{t = 2}^{T} y_{it-1}$ with $T_{1} = T - 1$. In addition, we let $\tilde{ρ}$ denote any preliminary estimator of $ρ$ that satisfies the following conditions.

Assumption 4: Let $\tilde{ρ}$ be an estimator of $ρ$. Suppose that the following conditions hold for this estimator as $N, T \to \infty$ such that $N^{κ} / T = τ$, for $κ \in (0, \infty)$ and $τ \in (0, \infty)$.

(a): $\tilde{ρ} - ρ_{T} = o_{p}(T^{-1/2})$, if $ρ_{T} = 1$ for all T sufficiently large or if $ρ_{T} = \exp\{-1/q(T)\}$ such that $T / q(T) = O(1)$;
(b): $\tilde{ρ} - ρ_{T} = o_{p}(q(T)^{-1/2})$, if $ρ_{T} = \exp\{-1/q(T)\}$ such that $q(T) \to \infty$ but $q(T) / T \to 0$;
(c): $\tilde{ρ} - ρ_{T} = o_{p}(1)$, if $\{ρ_{T}\} \in G_{St}$, where $G_{St}$ is given in expression (3) above.
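The definitions of $M(ρ)$, ${\hat{ω}}^{2}$, and ${\tilde{σ}}^{2}$ translate directly into nested sums. The sketch below is a hypothetical, unoptimized implementation that mirrors the summation indices one for one; it assumes the panel is stored as a list of rows with `y[i][t]` holding $y_{it}$ for $t = 0, \dots, T$ (with $T \geq 4$), and it takes the preliminary estimate `rho_tilde` as given rather than computing it.

```python
from math import sqrt

def m_statistic(y, rho, rho_tilde):
    """Standardized Anderson-Hsiao moment M(rho); y[i][t] holds y_it, t = 0..T."""
    N, T = len(y), len(y[0]) - 1
    T1 = T - 1
    moment = 0.0    # sum_i sum_{t=3}^T y_{i,t-2} (dy_it - rho * dy_{i,t-1})
    ssr = 0.0       # numerator of sigma-tilde squared
    bracket = 0.0   # the two sums inside omega-hat squared
    for yi in y:
        ybar = sum(yi[t] for t in range(2, T + 1)) / T1
        ybar_lag = sum(yi[t - 1] for t in range(2, T + 1)) / T1
        for t in range(2, T + 1):
            ssr += (yi[t] - ybar - rho_tilde * (yi[t - 1] - ybar_lag)) ** 2
        for t in range(3, T + 1):
            dy_t = yi[t] - yi[t - 1]
            dy_lag = yi[t - 1] - yi[t - 2]
            moment += yi[t - 2] * (dy_t - rho * dy_lag)
        for t in range(4, T + 1):
            bracket += (yi[t - 3] - yi[t - 2]) ** 2
        bracket += yi[T - 2] ** 2
    sigma2_tilde = ssr / (N * T1)
    omega_hat = sqrt(sigma2_tilde * bracket / (N * T))
    return moment / (omega_hat * sqrt(N * T))
```

Because the scale estimate does not involve the argument `rho`, the returned value is an affine function of `rho`, which is what makes analytical inversion of the statistic possible.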

The asymptotic properties of $M(ρ)$ under different parameter sequences $\{ρ_{T}\}$ are given by the following result.

Theorem 1.

Let Assumptions 1, 3, 4, and either 2 or 2* hold. The following statements hold as $N, T \to \infty$ such that $N^{κ} / T = τ$, for $κ \in (0, \infty)$ and $τ \in (0, \infty)$.

(a): Suppose that $\{ρ_{T}\} \in G_{1}^{M}$, where $G_{1}^{M} = \{\{ρ_{T}\} : ρ_{T} = 1 \text{ for all } T \text{ sufficiently large}\}$. Then,

$\begin{aligned} M(ρ_{T}) &= - \frac{1}{σ^{2} \sqrt{2} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{σ^{2} \sqrt{2} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + o_{p}(1) \\ &\Rightarrow N(0, 1) . \end{aligned}$
(b): Suppose that $\{ρ_{T}\} \in G_{2}^{M}$, where $G_{2}^{M} = \{\{ρ_{T}\} : ρ_{T} = \exp\{-1/q(T)\} \text{ and } T / q(T) \to 0\}$. Then,

$\begin{aligned} M(ρ_{T}) &= - \frac{1}{σ^{2} \sqrt{2} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{σ^{2} \sqrt{2} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + o_{p}(1) \\ &\Rightarrow N(0, 1) . \end{aligned}$
(c): Suppose that $\{ρ_{T}\} \in G_{3}^{M}$, where $G_{3}^{M} = \{\{ρ_{T}\} : ρ_{T} = \exp\{-1/q(T)\} \text{ and } q(T) \sim T\}$. Then,

$\begin{aligned} M(ρ_{T}) &= - \frac{1}{ω_{T} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{ω_{T} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + o_{p}(1) \\ &\Rightarrow N(0, 1), \end{aligned}$

where $ω_{T} = σ^{2} \sqrt{1 + \frac{q(T)}{2 T} [1 - \exp\{- \frac{2 T}{q(T)}\}]}$.
(d): Suppose that $\{ρ_{T}\} \in G_{4}^{M}$, where $G_{4}^{M} = \{\{ρ_{T}\} : ρ_{T} = \exp\{-1/q(T)\} \text{ and } q(T) \to \infty \text{ such that } q(T) / T \to 0\}$. Then,

$M(ρ_{T}) = - \frac{1}{σ^{2} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + o_{p}(1) \Rightarrow N(0, 1) .$
(e): Suppose that $\{ρ_{T}\} \in G_{5}^{M}$, where $G_{5}^{M} = \{\{ρ_{T}\} : |ρ_{T}| = \exp\{-1/q(T)\}, q(T) \geq 0, \text{ and } q(T) = O(1) \text{ as } T \to \infty\}$. Then,

$\begin{aligned} M(ρ_{T}) &= - \sqrt{\frac{1 + ρ_{T}}{2 σ^{4}}} \frac{1}{\sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} \\ &\quad + \sqrt{\frac{1 + ρ_{T}}{2 σ^{4}}} \frac{(1 - ρ_{T})}{\sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} w_{i t - 3} ε_{i t - 1} + o_{p}(1) \\ &\Rightarrow N(0, 1) . \end{aligned}$
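As a heuristic check on the normalizations in parts (a) and (e), one can verify that the variances of the two leading terms sum to one. This is only a back-of-the-envelope sketch, not the formal argument (which is in Appendix A): it assumes a fixed $ρ$ with $|ρ| < 1$ and covariance stationary $w_{it}$ with $E[w_{it}^{2}] = σ^{2} / (1 - ρ^{2})$ for part (e), and $E[w_{i T - 2}^{2}] \approx T σ^{2}$ for part (a). In each part the two terms are uncorrelated, since the cross products have mean zero by the independence of the errors over time.

```latex
\begin{aligned}
\text{(e):}\quad
& \operatorname{Var}\Big( \sqrt{\tfrac{1 + ρ}{2 σ^{4}}} \tfrac{1}{\sqrt{N T}} \sum_{i} \sum_{t} ε_{i t - 2} ε_{i t - 1} \Big)
  \approx \frac{1 + ρ}{2 σ^{4}} \cdot σ^{4} = \frac{1 + ρ}{2}, \\
& \operatorname{Var}\Big( \sqrt{\tfrac{1 + ρ}{2 σ^{4}}} \tfrac{1 - ρ}{\sqrt{N T}} \sum_{i} \sum_{t} w_{i t - 3} ε_{i t - 1} \Big)
  \approx \frac{(1 + ρ)(1 - ρ)^{2}}{2 σ^{4}} \cdot \frac{σ^{2}}{1 - ρ^{2}} \cdot σ^{2} = \frac{1 - ρ}{2}, \\
& \frac{1 + ρ}{2} + \frac{1 - ρ}{2} = 1 . \\[4pt]
\text{(a):}\quad
& \frac{1}{2 σ^{4} N T} \cdot N T σ^{4} + \frac{1}{2 σ^{4} N T} \cdot N \cdot T σ^{2} \cdot σ^{2} = \frac{1}{2} + \frac{1}{2} = 1 .
\end{aligned}
```

The same bookkeeping shows why the second term drops out in part (d): there $E[w_{i T - 2}^{2}]$ is of order $q(T)$, which is of smaller order than T, so only the first sum survives and the scale reduces to $σ^{2}$.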

To provide some intuition about the $M(ρ)$ statistic and about the conditions placed on the preliminary estimator $\tilde{ρ}$ (i.e., Assumption 4), we set $M^{*}(ρ) = \hat{ω} M(ρ)$ to be the unstandardized version of $M(ρ)$. From the proof of Theorem 1, given in Appendix A, it is evident that $M^{*}(ρ)$ can be decomposed into several terms whose orders of magnitude change depending on how close the parameter sequence $\{ρ_{T}\}$ is to unity. In consequence, the lead term of $M^{*}(ρ)$ is not the same in the stable (panel) autoregression case as it is in the case where $ρ_{T}$ is very close to unity. On the other hand, when appropriately normalized, this statistic will converge to a standard normal distribution in each case, but this requires a scale estimator that will adapt to variation in the normalization factor under alternative parameter sequences. The estimator

{\hat{ω}}^{2} = {\tilde{σ}}^{2} [N^{-1} T^{-1} \sum_{i = 1}^{N} \sum_{t = 4}^{T} {(y_{i t - 3} - y_{i t - 2})}^{2} + N^{-1} T^{-1} \sum_{i = 1}^{N} y_{i T - 2}^{2}]

turns out to have these adaptive properties, as shown in Lemma SC-13 and its proof (given in the supplement). An important component in the construction of a proper normalization factor is to have a preliminary estimator $\tilde{ρ}$ with a fast enough rate of convergence, so that the resulting estimator of $σ^{2}$ is consistent under every possible parameter sequence $\{ρ_{T}\}$ in the parameter space $Θ_{ρ} = (-1, 1]$. Examination of the proof of Lemma SC-12 reveals that the conditions needed on $\tilde{ρ}$ are precisely those given in Assumption 4.⁵

It should be noted that the Anderson–Hsiao IV estimator, which we will denote by ${\hat{ρ}}_{IVD}$ in this paper, does not satisfy the conditions of Assumption 4.⁶ This is because, as shown in Theorem SA-1 of the supplement to this paper, ${\hat{ρ}}_{IVD} - ρ_{T} \overset{p}{\sim} T^{-1/2}$ when $ρ_{T} = 1$ for all T sufficiently large or when $ρ_{T} = \exp\{-1/q(T)\}$ such that $T / q(T) = O(1)$, so that its rate of convergence is not fast enough in the unit root and near unit root regions of the parameter space. Furthermore, the pooled OLS (POLS) estimator, which we will denote by ${\hat{ρ}}_{pols}$, also does not satisfy these conditions since it is inconsistent in the stationary region as shown in Theorem SA-2 of the supplement. Hence, in Appendix SA of the supplement to our paper, we introduce a new point estimator, ${\hat{ρ}}_{AIP}$, which is an average of ${\hat{ρ}}_{IVD}$ and ${\hat{ρ}}_{pols}$ where the average is taken using a data-dependent weight function that, in turn, depends on a unit root statistic. This estimator turns out to satisfy the conditions of Assumption 4 because it exploits the differential strengths of ${\hat{ρ}}_{IVD}$ and ${\hat{ρ}}_{pols}$ in different parts of the parameter space and can place more or less weight on one or the other of these two estimators, depending on the information provided by a preliminary unit root test on the true value of $ρ$. We use ${\hat{ρ}}_{AIP}$ in constructing the scale estimator $\hat{ω}$ for the Monte Carlo results reported in Section 4 of this paper, but we note that ${\hat{ρ}}_{AIP}$ is not the only estimator which satisfies Assumption 4, as both the within-group OLS estimator and the bias-corrected within-group estimator proposed by (Hahn and Kuersteiner 2002) could also be used to obtain a scale estimator $\hat{ω}$ with the desired properties, although ${\hat{ρ}}_{AIP}$ does have a faster rate of convergence than the uncorrected within-group estimator in the unit root and near unit root cases. Because the focus of this paper is on confidence procedures, and not on point estimation, we will not give technical details of ${\hat{ρ}}_{AIP}$ in the body of this paper, but will instead refer interested readers to Appendix SA of the supplement for more details, as well as for formal results, on the rate of convergence of ${\hat{ρ}}_{AIP}$ under alternative parameter sequences.

The following theorem shows the uniform convergence of the statistic $M(ρ)$ over the parameter space $Θ_{ρ} = (-1, 1]$.

Theorem 2.

Let $Φ(x)$ denote the cdf of a standard normal random variable. Suppose that Assumptions 1, 3, 4, and either 2 or 2* hold. Then, for each $x \in \mathbb{R}$,

\sup_{ρ \in (- 1, 1]} |\Pr(M(ρ) \leq x | ρ) - Φ(x)| \to 0,

as $N, T \to \infty$ such that $N^{κ} / T = τ$ for constants $κ \in (0, \infty)$ and $τ \in (0, \infty)$.⁷

Remark 1.

(i): Let $z_{α}$ denote the $1 - α$ quantile of the standard normal distribution. A level $1 - α$ confidence interval based on the statistic $M (ρ)$ can be taken to be

$C_{α}^{M} = \{ρ \in (- 1, 1] : - z_{α / 2} \leq M (ρ) \leq z_{α / 2}\}$

(5)

It is immediate from Theorem 2 that the confidence procedure defined by (5) is asymptotically valid in the sense that its coverage probability is equal to the nominal level $1 - α$ in large samples, uniformly over $ρ \in (- 1, 1]$ .
(ii): The uniform limit result given in Theorem 2 above is established under a pathwise asymptotic scheme where we take $N, T \to \infty$ such that $N^{κ} / T = τ$ for constants $κ \in (0, \infty)$ and $τ \in (0, \infty)$. Note that the asymptotic framework employed here does not restrict N and T to follow a specific diagonal expansion path, but rather allows for a whole range of possible paths indexed by $κ \in (0, \infty)$; hence, our results do not require the kind of restrictions on the relative magnitudes of N and T that are often imposed in other asymptotic analyses of panel data models. Indeed, by allowing T to grow as any positive (real-valued) power of N, our framework can accommodate a wide variety of settings where T may be of smaller, larger, or similar order of magnitude as N.
(iii): As noted earlier and as is evident from the proof of Theorem 2 given in Appendix A, uniform convergence here is established by showing convergence to the same distribution under every parameter sequence in the parameter space. To the best of our knowledge, the use of this approach in statistics originated with the book on large sample theory by (Lehmann 1999). Important extensions, as well as applications, of this approach to a variety of econometric models and inferential procedures have also been made more recently in the papers by (Andrews and Guggenberger 2009; Andrews et al. 2011).
(iv): As we noted earlier in the Introduction, a primary reason why the $M$ statistic is well-behaved is that the (empirical) IV moment function is well-centered as an unbiased estimating equation. In this sense, our approach relates to early work by (Durbin 1960) on unbiased estimating equations, which was applied to time series $AR(1)$ regression in his original study. Importantly, in dynamic panel data models with individual effects, estimating equations associated with least squares procedures tend not to be as well-centered as the IV estimating equations, explaining the need for IV in this context (cf. Han and Phillips 2010).
(v): A drawback of $C_{α}^{M}$ is that the rate at which the width of this confidence interval shrinks toward zero as sample sizes grow is relatively slow for parameter sequences that are very close to unity. As also noted in the Introduction, this is due to the well-known ‘weak instrument’ problem which induces a slow rate of convergence for the Anderson–Hsiao IV procedure in this case. More precisely, using the results given in Lemmas SA-1, SC-1, and SC-13 in the supplement to this paper, we can easily show that wid $(C_{α}^{M}) = O_{p} (T^{- 1 / 2})$ when $ρ_{T} = exp \{- 1 / q (T)\}$ such that $\sqrt{N} T / q (T) = O (1)$ , so that the rate of shrinkage here does not even depend on N, even as both N and T go to infinity (see also Phillips 2018). This slower rate of convergence is also reflected in the Monte Carlo results reported in Section 4 below, as the results there show that the average interval width of $C_{α}^{M}$ can be a very substantial fraction of the width of the entire parameter space when $ρ = 1$ . To improve on the performance of $C_{α}^{M}$ , the next subsection introduces a pretest-based confidence procedure which is similarly asymptotically valid but which in addition provides more informative intervals when the underlying process has a unit root or a near unit root.
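Because $M(ρ)$ is affine in $ρ$, the set in (5) can be written in closed form. The sketch below is a hypothetical helper, not code from the paper: it assumes the two moment pieces $A = \sum_{i} \sum_{t} y_{i t - 2} Δ y_{i t}$ and $B = \sum_{i} \sum_{t} y_{i t - 2} Δ y_{i t - 1}$ and the scale estimate $\hat{ω}$ have already been computed, so that $M(ρ) = (A - ρ B) / (\hat{ω} \sqrt{N T})$. The interval is then centered at the ratio $A / B$ (the Anderson–Hsiao IV point estimate with instrument $y_{i t - 2}$) and truncated to the parameter space.

```python
from statistics import NormalDist

def invert_m_interval(A, B, omega_hat, N, T, alpha=0.05):
    """Solve -z <= (A - rho * B) / (omega_hat * sqrt(N * T)) <= z for rho.

    Returns (lower, upper) truncated to [-1, 1], or None when the untruncated
    interval lies entirely outside the parameter space. The parameter space
    is open at -1; the closed endpoint here is a simplification.
    """
    if B == 0:
        raise ValueError("B = 0: the moment function does not depend on rho")
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{alpha/2}
    half_width = z * omega_hat * (N * T) ** 0.5 / abs(B)
    center = A / B
    lower = max(center - half_width, -1.0)
    upper = min(center + half_width, 1.0)
    return (lower, upper) if lower <= upper else None
```

The half width $z_{α/2} \hat{ω} \sqrt{N T} / |B|$ makes the weak instrument problem visible: when $B$ is small relative to $\sqrt{N T}$, as happens near a unit root, the interval is wide.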

3.2. A Pretest-Based Confidence Procedure

To enhance the informativeness of the confidence procedure when there is a unit root, we use a pretest approach. The idea is to apply two different unit root tests sequentially to assess the proximity of $ρ$ to unity and then implement different confidence intervals depending on the information about the location of $ρ$ that emerges from these tests. More precisely, we propose a level $1 - α$ confidence interval of the form

\begin{aligned} C_{γ, α, N, T} = {} & I \{T_{1} \leq - z_{γ_{1}}\} I \{T_{2} \leq - z_{γ_{2}}\} C_{α_{1}}^{M} + I \{T_{1} > - z_{γ_{1}}\} C_{γ_{1}, α_{2}}^{UR 1} \\ & + I \{T_{1} \leq - z_{γ_{1}}\} I \{T_{2} > - z_{γ_{2}}\} C_{γ_{2}, α_{2}}^{UR 2}, \end{aligned}

(6)

where

C_{α_{1}}^{M}

is as defined in (5) above,

\begin{matrix} C_{γ_{1}, α_{2}}^{UR 1} & = & \{ρ \in (- 1, 1] : 1 - \frac{\sqrt{2} (z_{γ_{1}} + z_{α_{2}})}{T \sqrt{N}} \leq ρ \leq 1\}, and \end{matrix}

(7)

\begin{matrix} C_{γ_{2}, α_{2}}^{UR 2} & = & \{ρ \in (- 1, 1] : 1 - \frac{2 (z_{γ_{2}} + z_{α_{2}})}{\sqrt{N T}} \leq ρ \leq 1\} \end{matrix}

(8)

and where

γ = (γ_{1}, γ_{2})

,

α = α_{1} + α_{2}

,

I

is an indicator function, and we again take

z_{γ_{1}}

to be the

1 - γ_{1}

quantile of a standard normal distribution for some

γ_{1} \in (0, 0.5]

, with

z_{γ_{2}}

and

z_{α_{2}}

similarly defined. In addition, we take

T_{1} = \frac{M_{y y}^{1 / 2} ({\hat{ρ}}_{pols} - 1)}{\hat{σ}},

with

M_{y y} = \sum_{i = 1}^{N} \sum_{t = 2}^{T} {(y_{i t - 1} - {\bar{y}}_{- 1, N T})}^{2}

, to be the unit root test statistic based on the POLS estimator; and

T_{2} = {\hat{ω}}_{IVL} ({\hat{ρ}}_{IVL} - 1),

with

{\hat{ω}}_{IVL} = {\hat{σ}}^{- 2} N^{- 1 / 2} T^{- 1 / 2} \sum_{i = 1}^{N} \sum_{t = 3}^{T} Δ y_{i t - 1} y_{i t - 1}

, is a unit root test statistic based on the IV estimator

{\hat{ρ}}_{IVL} = {[\sum_{i = 1}^{N} \sum_{t = 3}^{T} Δ y_{i t - 1} y_{i t - 1}]}^{- 1} \sum_{i = 1}^{N} \sum_{t = 3}^{T} Δ y_{i t - 1} y_{i t}

which was introduced by (Arellano and Bover 1995) and further analyzed in (Blundell and Bond 1998). From expression (6), it is apparent that the confidence procedure follows a sequential tree structure. We first pretest for the presence of a unit root using

T_{1}

. If the result of this first test fails to reject the unit root null hypothesis, then we employ the tighter unit root interval

C_{γ_{1}, α_{2}}^{UR 1} .

Otherwise, we conduct a second test of the unit root null hypothesis using a less powerful test

T_{2}

. If this second test fails to reject the null hypothesis, we use the wider unit root interval

C_{γ_{2}, α_{2}}^{UR 2}

. On the other hand, if both tests reject the unit root null hypothesis, we then use the interval

C_{α_{1}}^{M}

, which is asymptotically valid but less informative unless the true value of

ρ

is sufficiently far away from unity.
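The sequential tree structure of (6) can be sketched in a few lines of code. The following is a minimal illustration rather than the authors' implementation: the function names (`z`, `ur1_interval`, `ur2_interval`, `pretest_interval`) are hypothetical, the statistics $T_{1}$ and $T_{2}$ and the interval $C_{α_{1}}^{M}$ are assumed to have been computed beforehand, and lower endpoints are clipped at $- 1$ to respect the parameter space.

```python
from statistics import NormalDist


def z(p):
    """Upper-tail critical value: the (1 - p) quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - p)


def ur1_interval(N, T, gamma1, alpha2):
    """Interval (7): lower endpoint 1 - sqrt(2)(z_g1 + z_a2) / (T sqrt(N))."""
    lower = 1.0 - (2.0 ** 0.5) * (z(gamma1) + z(alpha2)) / (T * N ** 0.5)
    return (max(lower, -1.0), 1.0)


def ur2_interval(N, T, gamma2, alpha2):
    """Interval (8): lower endpoint 1 - 2(z_g2 + z_a2) / sqrt(N T)."""
    lower = 1.0 - 2.0 * (z(gamma2) + z(alpha2)) / (N * T) ** 0.5
    return (max(lower, -1.0), 1.0)


def pretest_interval(t1, t2, cm_interval, N, T, gamma1, gamma2, alpha2):
    """Select among the three intervals following the indicators in (6).

    t1, t2      : precomputed unit root statistics T_1 and T_2
    cm_interval : precomputed interval C^M at level alpha_1 (assumed given)
    Returns a (label, interval) pair; the labels are illustrative only.
    """
    if t1 > -z(gamma1):
        # First test fails to reject the unit root: tighter interval (7).
        return "UR1", ur1_interval(N, T, gamma1, alpha2)
    if t2 > -z(gamma2):
        # First test rejects, second does not: wider interval (8).
        return "UR2", ur2_interval(N, T, gamma2, alpha2)
    # Both tests reject: fall back on the M-statistic interval.
    return "M", cm_interval
```

For instance, `pretest_interval(0.0, 0.0, (0.5, 0.7), 100, 50, 0.01, 0.01, 0.025)` selects the first unit root interval, since $T_{1} = 0$ exceeds $- z_{0.01}$.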

The next theorem shows that this confidence procedure is asymptotically valid in the sense that its non-coverage probability is at most the nominal significance level

α

uniformly over the parameter space under pathwise asymptotics.

Theorem 3.

Let

α \in (0, 0.5]

be the specified significance level and let

N, T \to \infty

such that

N^{κ} / T = τ

for constants

κ \in (0, \infty)

and

τ \in (0, \infty)

. Set

N = N (T) = {(τ T)}^{1 / κ}

and

C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}

. Suppose that Assumptions 1, 3, 4, and either 2 or 2* hold. Then,

lim sup_{T \to \infty} sup_{ρ \in (- 1, 1]} Pr (ρ \notin C_{γ, α, T} | ρ) \leq α .

Remark 2.

(i): The pre-test based confidence procedure proposed here is inspired by the work of (Lepski 1999) who used information from a test procedure to increase the accuracy of confidence sets. The original Lepski paper and subsequent extensions of that paper focused on problems of nonparametric function estimation and canonical versions of such problems, as represented by the many normal means model. Because we deal with a model that differs from the one studied in (Lepski 1999) and because we use a dual pre-test framework, the construction and analysis of our procedure also differ, even though we use the same idea to improve set estimation accuracy.
(ii): Since

$\begin{matrix} lim sup_{T \to \infty} sup_{ρ \in (- 1, 1]} Pr (ρ \notin C_{γ, α, T} | ρ) & = & lim sup_{T \to \infty} [1 - inf_{ρ \in (- 1, 1]} Pr (ρ \in C_{γ, α, T} | ρ)] \\ = & 1 - lim inf_{T \to \infty} inf_{ρ \in (- 1, 1]} Pr (ρ \in C_{γ, α, T} | ρ), \end{matrix}$

it follows that the result obtained in Theorem 3, i.e., $lim {sup}_{T \to \infty} {sup}_{ρ \in (- 1, 1]} Pr (ρ \notin C_{γ, α, T} | ρ) \leq α$ , is equivalent to $lim {inf}_{T \to \infty} inf_{ρ \in (- 1, 1]} Pr (ρ \in C_{γ, α, T} | ρ) \geq 1 - α$ , so that the proposed confidence interval has asymptotic coverage probability that is at least the nominal level $1 - α$ uniformly over $ρ \in (- 1, 1]$ .
(iii): In the procedure given by (6), $α_{1}$ is the significance level for the confidence interval $C_{α_{1}}^{M}$ . It is, of course, also the asymptotic non-coverage probability of $C_{α_{1}}^{M}$ , since $C_{α_{1}}^{M}$ is asymptotically valid.
(iv): As noted in the Introduction and in Remark 3.1 (v) above, a drawback of $C_{α_{1}}^{M}$ is that its width shrinks slowly for parameter sequences that are very close to unity. The pre-test confidence procedure seeks to improve on this rate by applying two different unit root tests sequentially and by using the information from these tests to determine whether to use local-to-unity intervals whose width shrinks at a faster rate than that of $C_{α_{1}}^{M}$ when the autoregressive parameter value is in close proximity to unity. To see how this improvement is achieved, note that when the true parameter value is within an $N^{- 1 / 2} T^{- 1}$ neighborhood of unity then, aside from the relatively small probability event of a Type I error, the first unit root test $T_{1}$ will fail to reject $H_{0} : ρ = 1$ , resulting in the use of the interval $C_{γ_{1}, α_{2}}^{UR 1}$ . When the parameter is this close to unity, $\mathrm{wid} (C_{γ_{1}, α_{2}}^{UR 1}) = O_{p} (N^{- 1 / 2} T^{- 1})$ whereas $\mathrm{wid} (C_{α_{1}}^{M}) = O_{p} (T^{- 1 / 2})$ , so that the use of $C_{γ_{1}, α_{2}}^{UR 1}$ leads to significant improvement over $C_{α_{1}}^{M}$ . The reason for a second unit root test using the statistic $T_{2}$ is that for parameter sequences $ρ_{T} = exp \{- 1 / q (T)\}$ such that $max \{T, \sqrt{N T}\} ≪ q (T) ≪ \sqrt{N} T$ , the first unit root test $T_{1}$ will reject $H_{0}$ with probability approaching one as sample sizes grow, but the less powerful unit root test based on $T_{2}$ will not, subject again to the relatively small probability event of a Type I error. For parameter sequences in this region, $\mathrm{wid} (C_{α_{1}}^{M}) = O_{p} (N^{- 1 / 2} T^{- 3 / 2} q (T))$ . The result is that we can make further improvement by using the interval $C_{γ_{2}, α_{2}}^{UR 2}$ , which has width $\mathrm{wid} (C_{γ_{2}, α_{2}}^{UR 2}) = O_{p} (N^{- 1 / 2} T^{- 1 / 2}) = o_{p} (N^{- 1 / 2} T^{- 3 / 2} q (T))$ .
Finally, if both these unit root tests reject $H_{0}$ , then our procedure will infer that the parameter is far enough away from unity to use $C_{α_{1}}^{M}$ . Of course, the two unit root tests are subject to Type II errors; but, as explained in Remark 3.2(vi) below, the probability of Type II errors could also be properly controlled under our procedure^8.
(v): $γ_{1}$ and $γ_{2}$ , on the other hand, are the significance levels for the unit root tests based on $T_{1}$ and $T_{2}$ . Note that, especially in large samples, the specification of $γ_{1}$ and $γ_{2}$ really has more of an impact on the width of the resulting interval than it does on the coverage probability, so that $γ_{1}$ and $γ_{2}$ are not significance levels in the traditional sense. For example, consider the choice of $γ_{1}$ . Observe that a smaller value of $γ_{1}$ leads to a wider $C_{γ_{1}, α_{2}}^{UR 1}$ . However, the effect of $γ_{1}$ on the width of the interval adopted by the overall procedure could be ambiguous, since, if the null hypothesis of an exact unit root is true, an increase in $γ_{1}$ would reduce the width of $C_{γ_{1}, α_{2}}^{UR 1}$ but could also lead to a greater chance that $T_{1}$ will falsely reject the null hypothesis and switch to either $C_{γ_{2}, α_{2}}^{UR 2}$ or $C_{α_{1}}^{M}$ , both of which are wider than $C_{γ_{1}, α_{2}}^{UR 1}$ in large samples. A similar argument shows that it is also difficult to predict a priori the effect of varying $γ_{2}$ on the width of the resulting interval. On the other hand, note that, except for pathological specifications where $γ_{1} = 0$ and/or $γ_{2} = 0$ (ruled out by our assumption), varying either $γ_{1}$ or $γ_{2}$ or both does not lead to a material distortion in the (asymptotic) coverage probability of the proposed procedure. To see why this is so, consider the case where the unit root specification is true. Then, even when both $γ_{1}$ and $γ_{2}$ are set to be large so that the null hypothesis is falsely rejected with high probability leading to the use of $C_{α_{1}}^{M}$ , we will still end up with asymptotic coverage probability greater than the nominal level $1 - α$ since $C_{α_{1}}^{M}$ is asymptotically valid and, by design, $α_{1} < α = α_{1} + α_{2}$ . 
On the other hand, if the underlying process is stable, then both of the unit root tests will reject the null hypothesis with probability approaching one asymptotically, as long as neither $γ_{1}$ nor $γ_{2}$ is set equal to zero, and our procedure will switch to $C_{α_{1}}^{M}$ which controls the asymptotic coverage probability properly.
(vi): Pre-testing leads to the possibility of errors whose probability needs to be controlled. In particular, there may be parameter sequences which lie just outside of $C_{γ_{1}, α_{2}}^{UR 1}$ , for which $T_{1}$ may fail to reject $H_{0} : ρ = 1$ even in large samples. In addition, there may be parameter sequences which lie just outside of $C_{γ_{2}, α_{2}}^{UR 2}$ , for which $H_{0}$ is rejected by $T_{1}$ but for which $T_{2}$ may not reject $H_{0}$ even in large samples. In both of these scenarios, there is the possibility that none of our intervals will cover the true parameter sequence. However, in the proof of Lemma A1 given in the Appendix SB of the technical supplement, we show that, under our procedure, the probability of committing such Type II errors can be no greater than $α_{2}$ asymptotically^9. Hence, by constructing $C_{γ_{1}, α_{2}}^{UR 1}$ and $C_{γ_{2}, α_{2}}^{UR 2}$ in the manner suggested above, we can properly control the probability of not switching to $C_{α_{1}}^{M}$ when it is preferable to make that switch. In consequence, the asymptotic non-coverage probability under our procedure is always less than or equal to $α = α_{1} + α_{2}$ . Given a particular significance level α, different combinations of $α_{1}$ and $α_{2}$ involve trade-offs where a smaller $α_{2}$ leads to a smaller probability of committing a Type II error but also leads to a larger $α_{1}$ and, thus, to $C_{α_{1}}^{M}$ having a smaller asymptotic coverage probability.
(vii): An advantage of our pretest based confidence procedure is its computational simplicity, as it is given in analytical form and, thus, does not require the use of bootstrap or other types of simulation-based methods for its computation. Moreover, the fact that $C_{α_{1}}^{M}$ , the interval used under our procedure in the stable case, is based on the Anderson–Hsiao procedure has the further benefit that its validity does not depend on imposing the assumption of mean stationarity of the initial condition. Hence, the design of our procedure has taken into consideration certain trade-offs on the competing goals of interval accuracy, computational simplicity, and the relaxation of the assumption of initial condition stationarity.
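The width comparison discussed in Remark 2(iv) can be made concrete with a short calculation that uses only the definitions in (7) and (8). As a sketch, assuming $N$ and $T$ are large enough that neither lower endpoint is truncated at $- 1$ :

$\begin{matrix} \frac{\mathrm{wid} (C_{γ_{2}, α_{2}}^{UR 2})}{\mathrm{wid} (C_{γ_{1}, α_{2}}^{UR 1})} & = & \frac{2 (z_{γ_{2}} + z_{α_{2}}) / \sqrt{N T}}{\sqrt{2} (z_{γ_{1}} + z_{α_{2}}) / (T \sqrt{N})} \\ = & \sqrt{2 T} \frac{z_{γ_{2}} + z_{α_{2}}}{z_{γ_{1}} + z_{α_{2}}} . \end{matrix}$

In particular, when $γ_{1} = γ_{2}$ the ratio reduces to $\sqrt{2 T}$ , so that with $T = 50$ the second unit root interval is ten times as wide as the first. This agrees with the $N^{- 1 / 2} T^{- 1}$ and $N^{- 1 / 2} T^{- 1 / 2}$ rates stated in the remark.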

4. Monte Carlo Study

This section reports the results of a Monte Carlo study comparing the finite sample performance of alternative confidence procedures. For the simulation study, we consider data generating processes of the form

\begin{matrix} y_{i t} & = & a_{i} + w_{i t}, \\ w_{i t} & = & ρ w_{i t - 1} + ε_{i t}, for i = 1, \dots, N and t = 1, \dots, T; \end{matrix}

where

\{ε_{i t}\} \equiv i . i . d .

N (0, 1)

and

\{a_{i}\} \equiv i . i . d .

N (2, 1)

. We vary

ρ = 1.00

,

0.99

,

0.95

,

0.90

,

0.80

, and

0.60

and

w_{i 0} = 0,

2. In addition, we let

N = 100,

200. When

N = 100

, we take

T = 50

, 100; and when

N = 200

, we consider

T = 100

, 200. We take

α = 0.05

throughout, so that the (nominal) confidence level is always kept at

95 %

. Four versions of the pre-test based confidence interval (PCI) given by expression (6) above are considered, with different specifications of

γ_{1}

,

γ_{2}

,

α_{1}

, and

α_{2}

, as summarized in the following tables.
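The simulation design just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' simulation code: the function name `simulate_panel`, the seeding scheme, and the list-of-lists data layout are all hypothetical choices, and only the Python standard library is used.

```python
import random


def simulate_panel(N, T, rho, w0=0.0, seed=0):
    """Draw one panel y_{it} = a_i + w_{it}, w_{it} = rho * w_{it-1} + eps_{it}.

    Returns a list of N rows, each holding y_{i1}, ..., y_{iT}.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        a_i = rng.gauss(2.0, 1.0)  # individual effect, a_i ~ N(2, 1)
        w = w0                     # common initial condition w_{i0}
        row = []
        for _ in range(T):
            w = rho * w + rng.gauss(0.0, 1.0)  # eps_{it} ~ N(0, 1)
            row.append(a_i + w)
        panel.append(row)
    return panel


# The experimental grid implied by the text: 6 values of rho,
# 4 (N, T) pairs, and 2 initial conditions.
RHOS = [1.00, 0.99, 0.95, 0.90, 0.80, 0.60]
SAMPLE_SIZES = [(100, 50), (100, 100), (200, 100), (200, 200)]
INITIAL_CONDITIONS = [0.0, 2.0]
DESIGN = [(rho, N, T, w0)
          for rho in RHOS
          for (N, T) in SAMPLE_SIZES
          for w0 in INITIAL_CONDITIONS]
```

The grid contains `len(DESIGN)` settings, which matches the forty-eight experiments referred to in the discussion of the tables.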

Tables 1–12 below provide simulation results comparing the four PCI procedures described above with the

C_{0.05}^{M}

procedure given in (5) and with confidence intervals obtained by inverting Studentized statistics associated with the POLS and IVD estimators. More specifically, Tables 1–4 give the empirical coverage probabilities, while Tables 5, 7, 9 and 11 report the average width of the confidence intervals under each of forty-eight experimental settings, obtained by varying

ρ

, N, T, and

w_{i 0}

. In addition, in Tables 6, 8, 10 and 12 we report the number of instances out of 10,000 simulation repetitions in which a particular confidence procedure leads to an empty interval, which occurs when the intersection of the (unrestricted) interval and the parameter space is the null set. For example, in the case of the

C_{0.05}^{M}

procedure, an empty interval would arise if

\begin{matrix} \{ρ \in (- 1, 1] : - z_{α / 2} \leq M (ρ) \leq z_{α / 2}\} & = & (- 1, 1] \cap \{ρ \in R : - z_{α / 2} \leq M (ρ) \leq z_{α / 2}\} \\ = & \emptyset . \end{matrix}
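The empty-interval event can be illustrated with a small helper. This is a sketch under a simplifying assumption that is not taken from the paper, namely that the unrestricted set is a closed interval with endpoints `lo` and `hi`; the function name `restrict_to_parameter_space` is hypothetical.

```python
def restrict_to_parameter_space(lo, hi):
    """Intersect the closed interval [lo, hi] with the parameter space (-1, 1].

    Returns None when the intersection is empty; otherwise returns the
    endpoints of the intersection. When the left endpoint is clipped to -1
    it should be read as open, since -1 is excluded from the parameter space.
    """
    if lo > hi:
        return None            # the unrestricted set is itself empty
    if hi <= -1.0 or lo > 1.0:
        return None            # the interval lies entirely outside (-1, 1]
    return (max(lo, -1.0), min(hi, 1.0))
```

Under this assumption, an unrestricted interval such as $[1.01, 1.05]$ would be counted as empty, whereas $[0.9, 1.2]$ would be reported as $[0.9, 1]$ .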

Glancing at Tables 1–4, we see that, consistent with our theory, the empirical coverage probabilities of the

C_{0.05}^{M}

procedure show the greatest degree of uniformity across different experiments. On the other hand, all four PCIs have empirical coverage probabilities that are uniformly higher than those of the

C_{0.05}^{M}

procedure across all forty-eight experiments. An intuitive explanation for this result can be given as follows. When the unit root null hypothesis is true, application of the pre-test procedure will lead to the use of either

C_{γ_{1}, α_{2}}^{UR 1}

or

C_{γ_{2}, α_{2}}^{UR 2}

, except in the small probability event where a Type I error is committed by both of the unit root tests

T_{1}

and

T_{2}

. Since both of these intervals cover the point

ρ = 1

by construction, the overall procedure in this case should cover this point with very high probability. On the other hand, when the unit root hypothesis is false, the pre-test procedure switches to the interval

C_{α_{1}}^{M}

but with

α_{1}

set at a level strictly less than

0.05

, resulting again in coverage probabilities which are greater than those of the

C_{0.05}^{M}

procedure.

	$γ_{1}$	$γ_{2}$	$α_{1}$	$α_{2}$
$C_{PCI 1}$	$0.01$	$0.01$	$0.025$	$0.025$
$C_{PCI 2}$	$0.01$	$0.01$	$0.049$	$0.001$
$C_{PCI 3}$	$0.05$	$0.05$	$0.025$	$0.025$
$C_{PCI 4}$	$0.05$	$0.05$	$0.049$	$0.001$

A possible deficiency of the

C_{0.05}^{M}

procedure as shown in Tables 5, 7, 9 and 11 is that the average widths of intervals obtained from this procedure are substantially larger than those of the other procedures when

ρ = 1

. Moreover, in the

ρ = 1

case, the use of the

C_{0.05}^{M}

procedure results in empty intervals in roughly

2.61 %

of the time, ranging from a low of 215 empty intervals (out of

10,000

repetitions) in the case with

N = 100, T = 50

, and

w_{i 0} = 0

to a high of 295 empty intervals (out of

10,000

repetitions) in the case where

N = 100, T = 100

, and

w_{i 0} = 2

^10. In contrast, no empty interval is observed for any of the pre-test procedures in any of the 48 experiments, including experiments where

ρ = 1

. It should also be noted that, outside of the unit root case, the results of Tables 5–12 do show

C_{0.05}^{M}

to provide informative intervals with average widths that are much smaller than those in the

ρ = 1

case. In addition, as the true value of

ρ

moves significantly away from unity, such as in the cases where

ρ = 0.95

,

0.9

,

0.8

, and

0.6

, empty intervals were no longer observed for

C_{0.05}^{M}

.

For the four alternative specifications of PCI, there does not seem to be a great deal of difference in their performance across the experiments, although some minor trade-offs in coverage probability vis-à-vis average interval width can be discerned. For example, looking at PCI1, we see that this procedure provides very tight intervals in the case where

ρ = 1

. In fact, the average interval width for this procedure in the unit root case is ≤

0.0070

, except in the smaller sample size case with

N = 100

and

T = 50

, where it is still around

0.0133

. Moreover, amongst the seven procedures examined in our study, the empirical coverage probability of PCI1 is the highest, or is at least tied for the highest, almost across the board, for the 48 experiments whose results are reported in Tables 1–4. Although the higher coverage probability of PCI1 in the stable region is due at least in part to the fact that it is designed to be conservative with

α_{1} = 0.025

when the true process is stable, it should be noted that the informativeness of PCI1, as measured by its average width, does not seem to have suffered significantly as a result. Note, in particular, that over the 48 experiments the widest average interval width recorded for PCI1 was only

0.1446

, or approximately

7 %

of the width of the entire parameter space

(- 1, 1]

; and this occurred with the smaller sample sizes of

N = 100

and

T = 50

. In addition, PCI1 has average width strictly less than

0.1

in 38 of the experiments. On the other hand, PCI2 sets

α_{1} = 0.049

and is, thus, less conservative relative to PCI1, particularly in the stable region. In consequence, PCI2 tends to have not only smaller interval widths but also lower coverage probabilities relative to PCI1 when the underlying process is stable. The results for PCI1 and PCI2 are illustrative of how the pre-test procedures can greatly improve upon

C_{0.05}^{M}

in terms of accuracy in the unit root and near unit root cases while maintaining coverage probability at a high level throughout the parameter space, with the only downside being that they yield slightly wider intervals when the true process is stable.

Tables 1–4 also show that confidence intervals constructed by inverting Studentized statistics associated with

{\hat{ρ}}_{POLS}

and

{\hat{ρ}}_{IVD}

are decidedly inferior to the pre-test based confidence procedures. Consistent with our theory, Tables 1–4 show that these confidence intervals have highly non-uniform coverage probabilities across different (true) parameter values

ρ

. More specifically, the coverage probabilities of the IV-based confidence intervals are especially poor when

ρ

is unity or near-unity, whereas the coverage probabilities for the POLS-based confidence intervals begin to deviate dramatically from the nominal level when

ρ = 0.95

or less. Moreover, from the results reported in Tables 6, 8, 10 and 12, we note that

C I_{IVD}

, the confidence procedure based on inverting the Studentized statistic associated with

{\hat{ρ}}_{IVD}

, leads to an empty interval in more than

40 %

of the simulation runs when

ρ = 1

. This is perhaps not surprising since, as shown in Theorem SA-1 in Appendix SA of the supplement to this paper,

{\hat{ρ}}_{IVD}

is not uniformly convergent over the parameter space and does not have an asymptotic normal distribution when the true

ρ

equals unity. Hence, when

ρ = 1

, the

C I_{IVD}

procedure, which is designed to achieve the correct asymptotic coverage in the stationary case, will not only exhibit poor coverage probabilities but will often deliver intervals that lie entirely outside the parameter space. Interestingly, even though the

C I_{POLS}

procedure is based on the correct asymptotics when

ρ = 1

, it nevertheless produces some empty intervals in the unit root case as shown in Tables 6, 8, 10 and 12. This suggests a need to modify the usual t-ratio based confidence procedure in cases where there is interest in a point on the boundary of a bounded parameter space, such as

ρ = 1

.

5. Conclusions

The uniform inference procedure proposed here utilizes information from pretesting the unit root hypothesis to aid the construction of confidence intervals in panel autoregression by means of data-based selection among intervals that are well suited to particular regions of the parameter space. The construction is asymptotically valid in the sense that the large sample coverage probability is at least that of the nominal level uniformly over a wide parameter space that includes unity. The method is particularly simple to implement in practical work and simulations provide encouraging evidence that the method produces confidence intervals with good finite sample accuracy, as measured by the combination of empirical coverage probability and average interval width. The panel AR model considered here is a simple model. But it is the kernel of all dynamic panel models and embodies all the characteristics that make uniform inference and confidence interval construction difficult. Even in the time series case these problems are well known to be challenging. In the panel case, the challenges are accentuated by additional issues arising from the presence of incidental effects and multi-index limit theory. The pre-test confidence interval solution proposed here addresses these challenges and has potential for application in more complex models.

Supplementary Materials

The following are available at https://www.mdpi.com/2225-1146/7/4/45/s1.

Author Contributions

Conceptualization: J.C.C. 70% and P.C.B.P. 30%; Methodology: J.C.C. 80% and P.C.B.P. 20%; Software: J.C.C. 100% and P.C.B.P. 0%; Writing: J.C.C. 60% and P.C.B.P. 40%; Editing: J.C.C. 50% and P.C.B.P. 50%.

Funding

This research received no external funding.

Acknowledgments

We thank Chris Sims for a thought-provoking comment which led us to a substantial improvement in some of our results. We also thank participants at the 2017 Greater New York Metropolitan Area Econometrics Colloquium, the New York Camp Econometrics XI, the 2016 Australasia Meeting of the Econometric Society, and the 2016 NBER/NSF Time Series Conference, as well as workshop participants at Boston College, Indiana University Bloomington, Pennsylvania State University, and the University of Pennsylvania for comments. Phillips acknowledges support from the NSF under Grant SES 1258258, the Kelly Fund at the University of Auckland, and an LKC Fellowship at Singapore Management University.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of the Main Results

The proofs given here rely on a large number of technical results that are established in the Technical Supplement (Chao and Phillips 2019). These results are designated in the derivations that follow by use of the prefix S. Lemmas A1 and A2 are stated in Appendix A and their proofs are given in the Technical Supplement. The proofs rely on functional limit theory for integrated and near integrated processes in conjunction with joint limit theory arguments for multi-indexed asymptotics (Phillips 1987a, 1987b; Phillips and Moon 1999).

Proof of Theorem 1.

Let

Δ ε_{i t} (ρ_{T}) = Δ y_{i t} - ρ_{T} Δ y_{i t - 1}

, and note that

\begin{matrix} M (ρ_{T}) & = & \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} y_{i t - 2} (Δ y_{i t} - ρ_{T} Δ y_{i t - 1}) \\ = & \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} a_{i} Δ ε_{i t} (ρ_{T}) + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} w_{i t - 2} Δ ε_{i t} (ρ_{T}) . \end{matrix}

Applying partial summation, we have

\begin{matrix} \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} w_{i t - 2} Δ ε_{i t} (ρ_{T}) \\ = & \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} [\sum_{t = 4}^{T} (w_{i t - 3} - w_{i t - 2}) ε_{i t - 1} + w_{i T - 2} ε_{i T} - w_{i 1} ε_{i 2}] \\ = & (1 - exp \{- \frac{1}{q (T)}\}) \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} w_{i t - 3} ε_{i t - 1} - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} \\ + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i 1} ε_{i 2}, \end{matrix}

so that

\begin{matrix} M (ρ_{T}) & = & - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} \\ + (1 - ρ_{T}) \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} w_{i t - 3} ε_{i t - 1} + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} a_{i} Δ ε_{i t} - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i 1} ε_{i 2} . \end{matrix}

We turn first to part (a). In this case, by assumption,

ρ_{T} = 1

for all T sufficiently large. Under the random-effects specification given by Assumption 2, we can apply parts (g) and (i) of Lemma SD-11, part (a) of Lemma SD-25, and part (a) of Lemma SC-13 to obtain

\begin{matrix} M (ρ_{T}) & = & - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} \\ + (1 - ρ_{T}) \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} w_{i t - 3} ε_{i t - 1} + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} a_{i} Δ ε_{i t} - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i 1} ε_{i 2} \\ = & - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + O_{p} ((1 - ρ_{T}) \sqrt{T}) \\ + O_{p} (\frac{1}{\sqrt{T}}) \\ = & - \frac{1}{σ^{2} \sqrt{2}} \frac{1}{\sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{σ^{2} \sqrt{2}} \frac{1}{\sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + o_{p} (1) . \end{matrix}

It follows from applying Lemma SD-24 that

M (ρ_{T}) \Rightarrow N (0, 1)

, as required. Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.

Next consider part (b), where we take

ρ_{T} = exp \{- 1 / q (T)\}

such that

T / q (T) \to 0

. In the case of the random-effects specification given by Assumption 2, we can use the results in parts (g) and (i) of Lemma SD-11, part (b) of Lemma SD-25, and part (b) of Lemma SC-13 to deduce that

\begin{matrix} M (ρ_{T}) & = & - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + O_{p} (\frac{\sqrt{T}}{q (T)}) + O_{p} (\frac{1}{\sqrt{T}}) \\ = & - \frac{1}{σ^{2} \sqrt{2}} \frac{1}{\sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{σ^{2} \sqrt{2}} \frac{1}{\sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + o_{p} (1) . \end{matrix}

It follows from part (a) of Lemma SD-22 that

M (ρ_{T}) \Rightarrow N (0, 1)

. Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.

Consider part (c), where we take

ρ_{T} = exp \{- 1 / q (T)\}

such that

q (T) \sim T

. Under the random-effects specification given by Assumption 2, we can apply parts (g) and (i) of Lemma SD-11, part (c) of Lemma SD-25, and part (c) of Lemma SC-13 to deduce that

\begin{matrix} M (ρ_{T}) & = & - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + O_{p} (\frac{1}{\sqrt{T}}) \\ = & - \frac{1}{{\bar{ω}}_{T} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{1}{{\bar{ω}}_{T} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} + o_{p} (1), \end{matrix}

where

{\bar{ω}}_{T} = σ^{2} {\{1 + [q (T) / 2 T] [1 - exp \{- 2 T / q (T)\}]\}}^{1 / 2}

. It follows from part (b) of Lemma SD-22 that

M (ρ_{T}) \Rightarrow N (0, 1)

. Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.

For part (d), we consider the case where

ρ_{T} = exp \{- 1 / q (T)\}

such that

q (T) \to \infty

but

q (T) / T \to 0

. Here, we first apply part (d) of Lemma SC-13 and part (d) of Lemma SD-21 to obtain

\frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} = \frac{1}{σ^{2} \sqrt{N T}} \sum_{i = 1}^{N} w_{i T - 2} ε_{i T} [1 + o_{p} (1)] = O_{p} (\sqrt{\frac{q (T)}{T}}) .

Under the random-effects specification given by Assumption 2, we further apply parts (g) and (i) of Lemma SD-11 and part (d) of Lemma SD-25 to obtain

\begin{matrix} M (ρ_{T}) & = & - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + O_{p} (\sqrt{\frac{q (T)}{T}}) + O_{p} (\frac{1}{\sqrt{q (T)}}) + O_{p} (\frac{1}{\sqrt{T}}) \\ = & - \frac{1}{σ^{2} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + o_{p} (1) . \end{matrix}

By part (c) of Lemma SD-22, we then deduce that

M (ρ_{T}) \Rightarrow N (0, 1)

, as required for (d). Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.

Finally, to show part (e), we first consider the random-effects specification given by Assumption 2. In this case, note that, by applying parts (g) and (i) of Lemma SD-11, part (e) of Lemma SD-21, and part (e) of Lemma SC-13, we obtain

\begin{matrix} \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 3}^{T} w_{i t - 2} Δ ε_{i t} (ρ_{T}) \\ = & - \frac{1}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1} + \frac{(1 - ρ_{T})}{\hat{ω} \sqrt{N T}} \sum_{i = 1}^{N} \sum_{t = 4}^{T} w_{i t - 3} ε_{i t - 1} + O_{p} (\frac{1}{\sqrt{T}}) \\ = & \sqrt{\frac{1 + ρ_{T}}{2 σ^{4}}} \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} (X_{i, T} + Y_{i, T}) + o_{p} (1), \end{matrix}

where

X_{i, T} = - T^{- 1 / 2} \sum_{t = 4}^{T} ε_{i t - 2} ε_{i t - 1}

and

Y_{i, T} = (1 - ρ_{T}) T^{- 1 / 2} \sum_{t = 4}^{T} w_{i t - 3} ε_{i t - 1}

. It follows by Lemma SD-23 that

M (ρ_{T}) = \sqrt{(1 + ρ_{T}) / (2 σ^{4})} N^{- 1 / 2} \sum_{i = 1}^{N} (X_{i, T} + Y_{i, T}) + o_{p} (1) \Rightarrow N (0, 1)

. Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*. □
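The partial summation step used at the start of the proof above is a purely algebraic identity, so it can be checked numerically on a single simulated series. The following is a minimal sketch for one cross-sectional unit; the function name `check_partial_summation`, the initial condition $w_{0} = 0$ , and the default parameter values are assumptions made for illustration.

```python
import random


def check_partial_summation(T=12, rho=0.9, seed=1):
    """Compare both sides of the partial summation identity for one unit.

    Left side : sum_{t=3}^{T} w_{t-2} (eps_t - eps_{t-1})
    Right side: (1 - rho) sum_{t=4}^{T} w_{t-3} eps_{t-1}
                - sum_{t=4}^{T} eps_{t-2} eps_{t-1}
                + w_{T-2} eps_T - w_1 eps_2
    """
    rng = random.Random(seed)
    # Index 0 is a placeholder so that eps[t] and w[t] match the time index t.
    eps = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(T)]
    w = [0.0] * (T + 1)  # w[0] = 0 is the assumed initial condition
    for t in range(1, T + 1):
        w[t] = rho * w[t - 1] + eps[t]

    lhs = sum(w[t - 2] * (eps[t] - eps[t - 1]) for t in range(3, T + 1))
    rhs = ((1.0 - rho) * sum(w[t - 3] * eps[t - 1] for t in range(4, T + 1))
           - sum(eps[t - 2] * eps[t - 1] for t in range(4, T + 1))
           + w[T - 2] * eps[T]
           - w[1] * eps[2])
    return lhs, rhs
```

The two returned values should agree up to floating-point rounding for any choice of `rho`, including the unit root case `rho = 1.0`, in which the first term on the right side vanishes.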

Proof of Theorem 2.

To proceed, note that, in the pathwise asymptotics considered here, N grows as a monotonically increasing function of T, so that the asymptotics can be taken to be single-indexed with

T \to \infty

. Now, let

\{G_{j}^{M} : j = 1, 2, 3, 4, 5\}

be the collections of parameter sequences defined in the statement of Theorem 1. Moreover, let

\{ρ_{k, T}\} \in G_{s_{k}}^{M}

(for

k = 1, 2, \dots, 5

), i.e.,

\{ρ_{k, T}\}

is a sequence belonging to the collection

G_{s_{k}}^{M}

^11. Define

T_{k} = f_{k} (T)

(k = 1, \dots, d)

, with

d \leq 5

, where

f_{k} (\cdot) :

N \to N

is an increasing function in its argument, and let

\{ρ_{k, T_{k}}\}

denote a subsequence of

\{ρ_{k, T}\}

. Note that every parameter sequence

ρ_{T} \in

(- 1, 1]

can be represented as

\{ρ_{T}\} = ⋃_{j = 1}^{d} \{ρ_{j, T_{j}}\}

, where

\{ρ_{1, T_{1}}\} \in G_{s_{1}}^{M}, \dots, \{ρ_{d, T_{d}}\} \in G_{s_{d}}^{M}

, with

G_{s_{k}}^{M} \neq G_{s_{ℓ}}^{M}

for

k \neq ℓ

and where

N = ⋃_{k = 1}^{d} \{T_{k} = f_{k} (T) : T \in N\}

, with

N

denoting the set of natural numbers.

Next, note that

Pr (ρ_{k, T} \notin C_{α_{1}, T}^{M} | ρ_{k, T}) = Pr (|M_{T} (ρ_{k, T})| > z_{α_{1} / 2} | ρ_{k, T})

. Theorem 1 implies that, for any

ε > 0

and for each

k \in \{1, \dots, d\}

, there exists positive integer

M_{k}

such that for every positive integer

T \geq M_{k}

,

|Pr (|M (ρ_{k, T})| > z_{α_{1} / 2} | ρ_{k, T}) - Pr (|Z| > z_{α_{1} / 2})| < ε,

where

Z \sim N (0, 1)

. Moreover, for any positive integer

T \geq M_{k}

, we have

T_{k} = f_{k} (T) \geq T \geq M_{k}

by Lemma SD-33 (given in Appendix SD in the technical supplement to this paper), from which we further deduce that

|Pr (|M (ρ_{k, T_{k}})| > z_{α_{1} / 2} | ρ_{k, T_{k}}) - Pr (|Z| > z_{α_{1} / 2})| < ε .

Next, let

\bar{M} = max \{f_{1} (M_{1}), \dots, f_{d} (M_{d})\}

. Consider any positive integer

T \geq \bar{M}

; we must have

T = f_{k} (T^{*})

for some

k = 1, \dots, d

and for some

T^{*} \in N

. Given that

T = f_{k} (T^{*}) \geq \bar{M} \geq f_{k} (M_{k})

, we also deduce that

T \geq T^{*} \geq M_{k}

by Lemma SD-33 since

f_{k} (\cdot)

is an increasing function of its argument. It follows that for every sequence

\{ρ_{T}\}

and for all

T \geq \bar{M}

\begin{matrix} & |Pr (|M (ρ_{T})| > z_{α_{1} / 2} | ρ_{T}) - Pr (|Z| > z_{α_{1} / 2})| \\ = & |Pr (|M (ρ_{k, f_{k} (T^{*})})| > z_{α_{1} / 2} | ρ_{k, f_{k} (T^{*})}) - Pr (|Z| > z_{α_{1} / 2})| < ε . \end{matrix}

The desired result then follows from Lemma 2.6.2 of (Lehmann 1999). □
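The final step of the proof rests on a sequence characterization of uniform convergence. As a hedged restatement, the argument can be summarized as follows. The symbols $g_{T}$, $g$, and $Θ$ are introduced here only for illustration and are not the paper's notation, and $z_{a}$ is assumed to denote the upper-$a$ standard normal quantile, so that $Pr (|Z| > z_{α_{1} / 2}) = α_{1}$.

```latex
% Illustrative notation: g_T, g and \Theta are not taken from the paper.
g_T(\rho) = \Pr\bigl(|M_T(\rho)| > z_{\alpha_1/2} \mid \rho\bigr), \qquad
g(\rho) \equiv \Pr\bigl(|Z| > z_{\alpha_1/2}\bigr) = \alpha_1, \qquad
\Theta = (-1, 1].

% Uniform convergence holds if and only if convergence holds along every sequence.
\sup_{\rho \in \Theta} \bigl| g_T(\rho) - g(\rho) \bigr| \to 0
\quad\Longleftrightarrow\quad
g_T(\rho_T) - g(\rho_T) \to 0 \ \text{ for every sequence } \{\rho_T\} \subset \Theta .
```

The proof verifies the right-hand side for an arbitrary sequence by splitting it into finitely many subsequences, each of which falls in one of the collections covered by Theorem 1.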

Lemma A1.

Let

ρ_{T} = exp \{- 1 / q (T)\}

, and suppose that Assumptions 1, 3, 4, and either 2 or 2* hold. Then, the following statements are true as

N, T \to \infty

such that

N^{κ} / T = τ

, for constants

κ \in (0, \infty)

and

τ \in (0, \infty)

.

(a): Let $G_{1}^{P} = \{\{ρ_{T}\} : ρ_{T} = 1 \text{ for all } T \text{ sufficiently large}\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{1}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(b): Let $G_{2}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, \sqrt{N} T ≪ q (T)\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{2}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(c): Let

$G_{3}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, \sqrt{N} T \sim q (T), \text{ and } ρ_{T} \geq 1 - \frac{(z_{γ_{1}} + z_{α_{2}}) \sqrt{2}}{\sqrt{N} T} \text{ eventually}\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{3}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(d): Let

$G_{4}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, \sqrt{N} T \sim q (T), \text{ and } ρ_{T} < 1 - \frac{(z_{γ_{1}} + z_{α_{2}}) \sqrt{2}}{\sqrt{N} T} \text{ eventually}\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{4}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} + α_{2} = α .$
(e): Let $G_{5}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- 1 / q (T)\}, \sqrt{N T} ≪ q (T), \text{ and } T ≪ q (T) ≪ \sqrt{N} T\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{5}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(f): Let

$G_{6}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, T ≪ q (T) \sim \sqrt{N T}, \text{ and } ρ_{T} \geq 1 - \frac{2 (z_{γ_{2}} + z_{α_{2}})}{\sqrt{N T}} \text{ eventually}\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{6}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(g): Let

$G_{7}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, T ≪ q (T) \sim \sqrt{N T}, \text{ and } ρ_{T} < 1 - \frac{2 (z_{γ_{2}} + z_{α_{2}})}{\sqrt{N T}} \text{ eventually}\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{7}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} + α_{2} = α .$
(h): Let $G_{8}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, \sqrt{N T} ≪ q (T) \sim T\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{8}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T} \in G_{8}^{P}) \leq α_{1} < α .$
(i): Let $G_{9}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- 1 / q (T)\}, \sqrt{N T} ≪ q (T) ≪ T\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{9}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(j): Let

$G_{10}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, q (T) \sim \sqrt{N T} \sim T, \text{ and } ρ_{T} \geq 1 - \frac{2 (z_{γ_{2}} + z_{α_{2}})}{\sqrt{N T}} \text{ eventually}\},$

and set $N = N (T) = τ T$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{10}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(k): Let

$G_{11}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, q (T) \sim \sqrt{N T} \sim T, \text{ and } ρ_{T} < 1 - \frac{2 (z_{γ_{2}} + z_{α_{2}})}{\sqrt{N T}} \text{ eventually}\},$

and set $N = N (T) = τ T$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{11}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} + α_{2} = α .$
(l): Let

$G_{12}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, q (T) \sim \sqrt{N T} ≪ T, \text{ and } ρ_{T} \geq 1 - \frac{2 (z_{γ_{2}} + z_{α_{2}})}{\sqrt{N T}} \text{ eventually}\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{12}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(m): Let

$G_{13}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, q (T) \sim \sqrt{N T} ≪ T, \text{ and } ρ_{T} < 1 - \frac{2 (z_{γ_{2}} + z_{α_{2}})}{\sqrt{N T}} \text{ eventually}\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{13}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} + α_{2} = α .$
(n): Let

$G_{14}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, T ≪ q (T) ≪ \sqrt{N T} \text{ and } \sqrt{N} ≪ q (T)\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{14}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(o): Let

$G_{15}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, T ≪ q (T), N^{1 / 4} T^{1 / 4} ≪ q (T), \text{ and } q (T) / \sqrt{N} = O (1)\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{15}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(p): Let $G_{16}^{P} = \{\{ρ_{T}\} : (T ≪ q (T)) \cap (q (T) / N^{1 / 4} T^{1 / 4} = O (1))\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{16}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(q): Let $G_{17}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- 1 / q (T)\}, N^{1 / 3} T^{1 / 3} ≪ q (T) \sim T ≪ \sqrt{N T}\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{17}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(r): Let

$G_{18}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, N^{1 / 4} T^{1 / 4} ≪ q (T) \sim T, \text{ and } \frac{q (T)}{N^{1 / 3} T^{1 / 3}} = O (1)\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{18}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(s): Let $G_{19}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- 1 / q (T)\}, q (T) \sim T, \text{ and } q (T) / (N^{1 / 4} T^{1 / 4}) = O (1)\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{19}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(t): Let

$G_{20}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, N^{1 / 3} T^{1 / 3} ≪ q (T) ≪ \sqrt{N T}, \text{ and } (q (T) ≪ T)\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{20}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(u): Let $G_{21}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- 1 / q (T)\}, q (T) ≪ T, \text{ and } q (T) \sim N^{1 / 3} T^{1 / 3}\}$ , and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{21}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$
(v): Let

$G_{22}^{P} = \{\{ρ_{T}\} : ρ_{T} = exp \{- \frac{1}{q (T)}\}, q (T) \to \infty, q (T) ≪ T, \text{ and } q (T) / N^{1 / 3} T^{1 / 3} \to 0\},$

and set $N = N (T) = {(τ T)}^{1 / κ}$ and $C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}$ . Then, for $\{ρ_{T}\} \in G_{22}^{P}$ ,

$lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α .$

The proof of Lemma A1 is given in Appendix SB of the technical supplement.
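To make the quantities in Lemma A1 concrete, the following minimal sketch evaluates the pathwise cross-section size $N (T) = {(τ T)}^{1 / κ}$ and the two boundary values that separate parts (c) from (d) and parts (f) from (g). It assumes that $z_{a}$ denotes the upper-$a$ standard normal quantile; the function names and the numerical levels chosen for $γ_{1}$, $γ_{2}$, and $α_{2}$ are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def upper_quantile(a):
    """Upper-a standard normal quantile z_a, so Pr(Z > z_a) = a (assumed convention)."""
    return NormalDist().inv_cdf(1.0 - a)


def cross_section_size(T, tau, kappa):
    """N(T) = (tau * T) ** (1 / kappa), from the pathwise link N^kappa / T = tau."""
    return (tau * T) ** (1.0 / kappa)


def boundaries(N, T, gamma1, gamma2, alpha2):
    """Boundary values used in parts (c)-(d) and (f)-(g) of Lemma A1."""
    z_a2 = upper_quantile(alpha2)
    # 1 - (z_{gamma_1} + z_{alpha_2}) * sqrt(2) / (sqrt(N) * T)
    b1 = 1.0 - (upper_quantile(gamma1) + z_a2) * sqrt(2.0) / (sqrt(N) * T)
    # 1 - 2 * (z_{gamma_2} + z_{alpha_2}) / sqrt(N * T)
    b2 = 1.0 - 2.0 * (upper_quantile(gamma2) + z_a2) / sqrt(N * T)
    return b1, b2


# Hypothetical levels, chosen only to show the orders of magnitude involved.
N = cross_section_size(T=100, tau=1.0, kappa=1.0)
b1, b2 = boundaries(N, 100, gamma1=0.05, gamma2=0.05, alpha2=0.025)
assert b2 < b1 < 1.0
```

With equal quantile levels, the first boundary lies closer to unity than the second because it shrinks at the faster $N^{- 1 / 2} T^{- 1}$ rate rather than the ${(N T)}^{- 1 / 2}$ rate.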

Lemma A2.

Suppose that Assumptions 1, 3, 4, and either 2 or 2* hold, and let

G_{23}^{P} = \{\{ρ_{T}\} : |ρ_{T}| = exp \{- \frac{1}{q (T)}\}, q (T) \geq 0, \text{ and } q (T) = O (1) \text{ as } T \to \infty\} .

Also, let

N, T \to \infty

such that

N^{κ} / T = τ

, for constants

κ \in (0, \infty)

and

τ \in (0, \infty)

, so that we can set

N = N (T) = {(τ T)}^{1 / κ}

and

C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}

. Then,

lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α_{1} < α,

for

\{ρ_{T}\} \in G_{23}^{P}

.

The proof of Lemma A2 is also given in Appendix SB of the technical supplement.

Proof of Theorem 3.

In the pathwise asymptotics considered here, N grows as a monotonically increasing function of T, so that the asymptotics can be taken to be single-indexed with

T \to \infty

. Hence, we can set

N = {(τ T)}^{1 / κ}

and simplify notation by writing

C_{γ, α, N, T} = C_{γ, α, N (T), T} = C_{γ, α, T}

.

To proceed, note that, by the properties of the supremum, there exists a sequence

\{ρ_{T} \in (- 1, 1] : T \geq 1\}

such that

lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) = lim sup_{T \to \infty} sup_{ρ \in (- 1, 1]} Pr (ρ \notin C_{γ, α, T} | ρ)

Thus, for some fixed significance level

α \in (0, 0.5]

, to show that

lim sup_{T \to \infty} sup_{ρ \in (- 1, 1]} Pr (ρ \notin C_{γ, α, T} | ρ) \leq α,

it suffices to show that

lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α

for every sequence

\{ρ_{T} \in (- 1, 1] : T \geq 1\}

. To proceed, let

\{G_{j}^{P} : j = 1, 2, \dots, 23\}

be the collections of parameter sequences defined in Lemmas A1 and A2 given above. Moreover, let

\{ρ_{k, T}\} \in G_{s_{k}}^{P}

(for

k = 1, \dots, 23

), i.e.,

\{ρ_{k, T}\}

is a sequence belonging to the collection

G_{s_{k}}^{P}

. Define

T_{k} = f_{k} (T)

(k = 1, \dots, d)

, with

d \leq 23

, where

f_{k} (\cdot) :

N \to N

is an increasing function in its argument, and let

\{ρ_{k, T_{k}}\}

denote a subsequence of

\{ρ_{k, T}\}

. Note that every parameter sequence

ρ_{T} \in

(- 1, 1]

can be represented as

\{ρ_{T}\} = ⋃_{j = 1}^{d} \{ρ_{j, T_{j}}\}

, where

\{ρ_{1, T_{1}}\} \in G_{s_{1}}^{P}, \dots, \{ρ_{d, T_{d}}\} \in G_{s_{d}}^{P}

, with

G_{s_{k}}^{P} \neq G_{s_{ℓ}}^{P}

for

k \neq ℓ

and where

N = ⋃_{k = 1}^{d} \{T_{k} = f_{k} (T) : T \in N\}

(A1)

with

N

denoting the set of natural numbers

\{1, 2, \dots\}

. Moreover, define

υ_{k, T} = {sup}_{m \geq T} Pr (ρ_{k, m} \notin C_{γ, α, m} | ρ_{k, m} \in G_{s_{k}}^{P})

and

{\bar{p}}_{k} = lim {sup}_{T \to \infty} Pr (ρ_{k, T} \notin C_{γ, α, T} | ρ_{k, T} \in G_{s_{k}}^{P})

. It is clear from the definition of

υ_{k, T}

and

{\bar{p}}_{k}

that

{lim}_{T \to \infty} υ_{k, T} = {\bar{p}}_{k}

for each

k \in \{1, 2, \dots, d\}

; or, more formally, for any

ε > 0

, there exists positive integer

L_{k}

such that, for all

T \geq L_{k}

,

|υ_{k, T} - {\bar{p}}_{k}| < ε

, from which it follows, using the results of Lemmas A1 and A2, that, for any

ε > 0

and for each

k \in \{1, 2, \dots, d\}

, there exists a positive integer

L_{k}

such that, for all

T \geq L_{k}

,

υ_{k, T} < {\bar{p}}_{k} + ε \leq α + ε

. Now, for any

k \in \{1, \dots, d\}

and for any positive integer

T \geq L_{k}

, we have, by Lemma SD-33 given in Appendix SD of the technical supplement to this paper, that

T_{k} = f_{k} (T) \geq L_{k}

, so that

|υ_{k, T_{k}} - {\bar{p}}_{k}| < ε

, for any subsequence

\{υ_{k, T_{k}}\}

of

\{υ_{k, T}\}

, from which we further deduce that

υ_{k, T_{k}} = sup_{m \geq T_{k}} Pr (ρ_{k, m} \notin C_{γ, α, m} | ρ_{k, m} \in G_{s_{k}}^{P}) < {\bar{p}}_{k} + ε \leq α + ε .

Next, let

L^{\max} = max \{f_{1} (L_{1}), \dots, f_{d} (L_{d})\}

. Consider any positive integer

T \geq L^{\max}

; then, (A1) implies that

T = f_{k} (T^{*})

for some

k = 1, \dots, d

and for some

T^{*} \in N

. By the fact that

f_{k} (\cdot)

is an increasing function of its argument, we have that

T = f_{k} (T^{*}) \geq L^{\max} \geq f_{k} (L_{k}) \geq L_{k}

, from which it follows that for every positive integer

T \geq L^{\max}

\begin{matrix} sup_{m \geq T} Pr (ρ_{m} \notin C_{γ, α, m} | ρ_{m}) & = & sup_{m \geq f_{k} (T^{*})} Pr (ρ_{m} \notin C_{γ, α, m} | ρ_{m}) \\ & \leq & sup_{m \geq L^{\max}} Pr (ρ_{m} \notin C_{γ, α, m} | ρ_{m}) < α + ε . \end{matrix}

Hence, for any sequence

\{ρ_{T}\} \in (- 1, 1]

,

lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) = inf_{T \geq 1} sup_{m \geq T} Pr (ρ_{m} \notin C_{γ, α, m} | ρ_{m}) < α + ε .

Since

ε

is arbitrary, we deduce that

lim sup_{T \to \infty} Pr (ρ_{T} \notin C_{γ, α, T} | ρ_{T}) \leq α

for any sequence

\{ρ_{T}\} \in (- 1, 1]

, which gives the desired conclusion. □

References

Ahn, S. C., and P. Schmidt. 1995. Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics 68: 5–27.
Alvarez, J., and M. Arellano. 2003. The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators. Econometrica 71: 1121–59.
Anderson, T. W., and C. Hsiao. 1981. Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association 76: 598–606.
Anderson, T. W., and C. Hsiao. 1982. Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics 18: 47–82.
Andrews, D. W. K. 1993. Exactly Median-Unbiased Estimation of First-Order Autoregressive/Unit Root Models. Econometrica 61: 139–65.
Andrews, D. W. K., and P. Guggenberger. 2009. Validity of Subsampling and ‘Plug-in Asymptotic’ Inference for Parameters Defined by Moment Inequalities. Econometric Theory 25: 669–709.
Andrews, D. W. K., X. Cheng, and P. Guggenberger. 2011. Generic Results for Establishing the Asymptotic Size of Confidence Sets and Tests. Cowles Foundation Discussion Paper No. 1813. New Haven: Yale University.
Arellano, M., and S. Bond. 1991. Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies 58: 277–97.
Arellano, M., and O. Bover. 1995. Another Look at the Instrumental Variable Estimation of Error-components Models. Journal of Econometrics 68: 29–51.
Blundell, R., and S. Bond. 1998. Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics 87: 115–43.
Bun, M. J. G., and F. Kleibergen. 2014. Identification and Inference in Moments Based Analysis of Linear Dynamic Panel Data Models. UvA Econometrics Discussion Paper. Amsterdam: University of Amsterdam-Econometrics Discussion.
Chang, Y. 2002. Nonlinear IV Unit Root Tests in Panel with Cross-sectional Dependency. Journal of Econometrics 110: 261–92.
Chang, Y. 2004. Bootstrap Unit Root Tests in Panels with Cross-sectional Dependency. Journal of Econometrics 120: 263–93.
Chao, J., and P. C. B. Phillips. 2019. Technical Supplement to: Uniform Inference in Panel Autoregression. Working paper. College Park: University of Maryland.
Durbin, J. 1960. Estimation of Parameters in Time-Series Regression Models. Journal of the Royal Statistical Society, Series B 22: 139–53.
Gorodnichenko, Y., A. Mikusheva, and S. Ng. 2012. Estimators for Persistent and Possibly Nonstationary Data with Classical Properties. Econometric Theory 28: 1003–36.
Hahn, J., and G. Kuersteiner. 2002. Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both N and T are Large. Econometrica 70: 1639–57.
Han, C., and P. C. B. Phillips. 2010. GMM Estimation for Dynamic Panels with Fixed Effects and Strong Instruments at Unity. Econometric Theory 26: 119–51.
Han, C., P. C. B. Phillips, and D. Sul. 2011. Uniform Asymptotic Normality in Stationary and Unit Root Autoregression. Econometric Theory 27: 1117–51.
Han, C., P. C. B. Phillips, and D. Sul. 2014. X-Differencing and Dynamic Panel Model Estimation. Econometric Theory 30: 201–51.
Hansen, B. E. 1999. The Grid Bootstrap and the Autoregressive Model. Review of Economics and Statistics 81: 594–607.
Kiviet, J. F. 1995. On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models. Journal of Econometrics 68: 53–78.
Lehmann, E. L. 1999. Elements of Large Sample Theory. New York: Springer.
Lepski, O. V. 1999. How to Improve the Accuracy of Estimation. Mathematical Methods in Statistics 8: 441–86.
Mikusheva, A. 2007. Uniform Inference in Autoregressive Models. Econometrica 75: 1411–52.
Mikusheva, A. 2012. One-dimensional Inference in Autoregressive Models with the Potential Presence of a Unit Root. Econometrica 80: 173–212.
Moon, H. R., and P. C. B. Phillips. 2004. GMM Estimation of Autoregressive Roots Near Unity with Panel Data. Econometrica 72: 467–522.
Moon, H. R., B. Perron, and P. C. B. Phillips. 2014. Point Optimal Panel Unit Root Tests with Serially Correlated Data. Econometrics Journal 17: 338–72.
Moon, H. R., B. Perron, and P. C. B. Phillips. 2015. Incidental Parameters and Dynamic Panel Modeling. In The Oxford Handbook of Panel Data. Edited by B. Baltagi. New York: Oxford University Press.
Nickell, S. 1981. Biases in Dynamic Models with Fixed Effects. Econometrica 49: 1417–26.
Pesaran, M. H. 2006. Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure. Econometrica 74: 967–1012.
Pesaran, M. H., and E. Tosetti. 2011. Large Panels with Common Factors and Spatial Correlation. Journal of Econometrics 161: 182–202.
Phillips, P. C. B. 1987a. Time Series Regression with a Unit Root. Econometrica 55: 277–301.
Phillips, P. C. B. 1987b. Towards a Unified Asymptotic Theory of Autoregression. Biometrika 74: 535–47.
Phillips, P. C. B. 2014. On Confidence Intervals for Autoregressive Roots and Predictive Regression. Econometrica 82: 1177–95.
Phillips, P. C. B. 2018. Dynamic Panel Anderson–Hsiao Estimation With Roots Near Unity. Econometric Theory 34: 253–76.
Phillips, P. C. B., and H. R. Moon. 1999. Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica 67: 1057–111.
Phillips, P. C. B., and D. Sul. 2003. Dynamic Panel Estimation and Homogeneity Testing under Cross Sectional Dependence. Econometrics Journal 6: 217–59.
Stock, J. 1991. Confidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time Series. Journal of Monetary Economics 28: 435–59.

1.	We do not consider in this paper issues related to incidental trends, cross section dependence, and slope parameter heterogeneity discussed earlier. While these complications are important and empirically relevant, they are beyond the scope of the current paper and considering them here would divert from the main point of this paper which concerns the development of uniform inference procedures.
2.	Readers interested in the asymptotic properties of the Anderson–Hsiao IV estimator are referred to Theorem SA-1 in the Technical Supplement to this paper, which presents an extensive set of results on the large sample behavior of this estimator under various parameter sequences both near and far from unity. The proof of Theorem SA-1 is provided in Appendix SB of the Technical Supplement.
3.	Other approaches for achieving uniform inference in estimation have been proposed recently in the time series literature by (Han et al. 2011) using partial aggregation methods and by (Gorodnichenko et al. 2012) using quasi-differencing. In the unit root and very near unit root cases, extending these approaches to the panel data setting leads to confidence intervals whose width shrinks at a slower rate than the optimal $N^{- 1 / 2} T^{- 1}$ rate obtained here. (Han et al. 2014) developed a panel estimator using X-differencing which has good bias properties and limit theory but has different limit theory in unit root and stationary cases, complicating uniform inference.
4.	The reason we consider indexed parameter $ρ_{T}$ which depends on T only, and not on both N and T, is because our main results are obtained under a general pathwise asymptotic scheme where N can grow as an arbitrary positive real-valued power of T. In such a framework, the asymptotics are effectively single-indexed. Hence, it suffices to consider parameter sequences that depend only on T.
5.	The proof of Lemma SC-12 is also given in Appendix SC of the technical supplement.
6.	We use the notation ${\hat{ρ}}_{IVD}$ to denote the Anderson–Hsiao IV estimator because it is a procedure where IV estimation is performed on a first-differenced equation. Later, we use ${\hat{ρ}}_{IVL}$ to denote the IV estimator introduced by (Arellano and Bover 1995) since, in that procedure, IV is performed on the panel autoregression in levels.
7.	Note that we use the notation $Pr (\cdot | ρ)$ instead of perhaps the more familiar notation $P_{ρ} (\cdot)$ to denote a probability measure indexed by the parameter $ρ$ because, in this paper, we will often consider somewhat complicated local-to-unity parameters and subsequences of such parameters, which are less conveniently expressed in terms of subscripts.
8.	A recent paper by (Bun and Kleibergen 2014) also considers, amongst other things, combining elements of the approach of (Anderson and Hsiao 1981, 1982; Arellano and Bond 1991), which uses lagged levels of $y_{i t}$ as instruments for equations in first differences, with the approach by (Arellano and Bover 1995; Blundell and Bond 1998) which uses lagged differences of $y_{i t}$ as instruments for equations in levels. The focus of the (Bun and Kleibergen 2014) paper differs substantially from that of the present paper. In particular, they consider test procedures which attain the maximal attainable power curve under worst case setting of the variance of the initial conditions, whereas our procedure uses pretest based information to aggressively increase the power of our inferential procedure in certain regions of the parameter space. Moreover, unlike our paper, they do not provide results on confidence procedures whose asymptotic coverage probability is explicitly shown to be at least that of the nominal level uniformly over the parameter space; and their analysis is conducted within a fixed T framework.
9.	The statement of Lemma A1 is given in Appendix A of the paper. Its proof is lengthy and is therefore placed in the technical supplement.
10.	It might initially seem strange in Table 6 and Table 8 that in the cases where $ρ = 1$ and $N = 100$ , the number of empty intervals for $C_{0.05}^{M}$ actually increased as the sample size in the time dimension increased from $T = 50$ to $T = 100$ . However, there is an intuitive explanation for this result. As noted earlier, in the unit root case, the rate of concentration of the width of the $C_{0.05}^{M}$ interval is $O_{p} (T^{- 1 / 2})$ , so that intervals obtained under this procedure are wider in the $T = 50$ case than in the $T = 100$ case, leading to a higher chance of a non-null intersection with the parameter space.
11.	The reason for using the notation $G_{s_{k}}^{M}$ , as opposed to $G_{k}^{M}$ , is so that we can refer to a particular collection of sequences amongst $\{G_{j}^{M} : j = 1, 2, \dots, 5\}$ without $G_{s_{1}}^{M}$ necessarily being $G_{1}^{M}$ , for example.
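Note 10 explains the empty-interval counts in terms of whether an interval intersects the parameter space. As a minimal sketch of that bookkeeping, assuming an interval is counted as empty when its intersection with $(- 1, 1]$ is empty, the check can be written as follows. The function name and the numerical end points are hypothetical and are not taken from the simulations.

```python
def restrict_to_parameter_space(lower, upper, lo=-1.0, hi=1.0):
    """Intersect the closed interval [lower, upper] with (lo, hi].

    Returns the (left, right) end points of the intersection, or None when
    the intersection is empty. The left end point lo itself is excluded
    from the parameter space.
    """
    left = max(lower, lo)
    right = min(upper, hi)
    if left > right or right <= lo:
        return None
    return (left, right)


# An interval lying wholly above unity has an empty intersection.
assert restrict_to_parameter_space(1.02, 1.10) is None
# A wider interval around the same center can still reach into (-1, 1].
assert restrict_to_parameter_space(0.95, 1.10) == (0.95, 1.0)
```

This mirrors the intuition in note 10: wider intervals are more likely to overlap the parameter space, so fewer of them are recorded as empty.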

Table 1. Coverage Probabilities (nominal level = 0.95).

$N = 100$ , $w_{i 0} = 0$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	50	0.9490	0.1229	0.9430	1.0000	1.0000	0.9999	0.9996
1.00	100	0.9518	0.1251	0.9411	1.0000	1.0000	0.9999	0.9999
0.99	50	0.9476	0.3874	0.9385	0.9957	0.9934	0.9891	0.9828
0.99	100	0.9443	0.6239	0.9448	0.9918	0.9850	0.9872	0.9752
0.95	50	0.7995	0.8046	0.9369	0.9839	0.9678	0.9839	0.9678
0.95	100	0.6816	0.8911	0.9445	0.9874	0.9749	0.9874	0.9743
0.90	50	0.2384	0.8738	0.9376	0.9833	0.9649	0.9758	0.9491
0.90	100	0.0507	0.9223	0.9465	0.9715	0.9489	0.9715	0.9476
0.80	50	0.0002	0.9055	0.9378	0.9677	0.9386	0.9677	0.9386
0.80	100	0.0000	0.9254	0.9421	0.9705	0.9432	0.9705	0.9432
0.60	50	0.0000	0.9162	0.9351	0.9650	0.9361	0.9650	0.9361
0.60	100	0.0000	0.9335	0.9425	0.9700	0.9435	0.9700	0.9435

Results based on 10,000 simulations.
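When reading the coverage tables, it can help to keep the simulation noise in mind. The sketch below computes the binomial standard error of a simulated coverage frequency, assuming independent replications and a true coverage equal to the nominal 0.95; the function name is hypothetical.

```python
from math import sqrt


def coverage_standard_error(p, reps):
    """Binomial standard error of a simulated coverage frequency."""
    return sqrt(p * (1.0 - p) / reps)


se = coverage_standard_error(0.95, 10_000)  # about 0.0022
band = (0.95 - 2 * se, 0.95 + 2 * se)       # rough two-standard-error band
```

Under these assumptions, a reported coverage such as 0.9490 is within two standard errors of the nominal level, whereas values such as 0.1229 are far outside any plausible simulation error.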

Table 2. Coverage Probabilities (nominal level = 0.95).

$N = 100$ , $w_{i 0} = 2$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	50	0.9490	0.1345	0.9311	1.0000	1.0000	0.9998	0.9996
1.00	100	0.9518	0.1285	0.9339	1.0000	1.0000	0.9999	0.9999
0.99	50	0.9493	0.4537	0.9226	0.9947	0.9929	0.9878	0.9792
0.99	100	0.9449	0.6648	0.9370	0.9910	0.9859	0.9856	0.9750
0.95	50	0.8056	0.8720	0.9164	0.9752	0.9595	0.9748	0.9578
0.95	100	0.6864	0.9172	0.9326	0.9842	0.9706	0.9839	0.9682
0.90	50	0.2498	0.9227	0.9150	0.9715	0.9490	0.9624	0.9322
0.90	100	0.0546	0.9376	0.9359	0.9632	0.9397	0.9634	0.9372
0.80	50	0.0002	0.9353	0.9198	0.9567	0.9213	0.9567	0.9213
0.80	100	0.0000	0.9393	0.9313	0.9644	0.9331	0.9644	0.9331
0.60	50	0.0000	0.9357	0.9225	0.9563	0.9250	0.9563	0.9250
0.60	100	0.0000	0.9414	0.9358	0.9657	0.9376	0.9657	0.9376

Results based on 10,000 simulations.

Table 3. Coverage Probabilities (nominal level = 0.95).

$N = 200$ , $w_{i 0} = 0$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	100	0.9494	0.0921	0.9455	1.0000	1.0000	0.9998	0.9997
1.00	200	0.9458	0.0879	0.9499	0.9999	0.9999	0.9998	0.9998
0.99	100	0.9468	0.6346	0.9482	0.9875	0.9786	0.9856	0.9742
0.99	200	0.9409	0.8101	0.9483	0.9867	0.9759	0.9863	0.9748
0.95	100	0.4377	0.8949	0.9436	0.9844	0.9696	0.9782	0.9558
0.95	200	0.1796	0.9243	0.9444	0.9736	0.9467	0.9736	0.9461
0.90	100	0.0010	0.9186	0.9451	0.9705	0.9462	0.9705	0.9462
0.90	200	0.0000	0.9349	0.9475	0.9742	0.9481	0.9742	0.9481
0.80	100	0.0000	0.9320	0.9447	0.9740	0.9457	0.9740	0.9457
0.80	200	0.0000	0.9353	0.9422	0.9715	0.9433	0.9715	0.9433
0.60	100	0.0000	0.9368	0.9452	0.9707	0.9466	0.9707	0.9466
0.60	200	0.0000	0.9439	0.9482	0.9732	0.9490	0.9732	0.9490

Results based on 10,000 simulations.

Table 4. Coverage Probabilities (nominal level = 0.95).

$N = 200$ , $w_{i 0} = 2$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	100	0.9494	0.0958	0.9370	1.0000	1.0000	1.0000	0.9996
1.00	200	0.9458	0.0911	0.9468	0.9999	0.9999	0.9998	0.9998
0.99	100	0.9471	0.6731	0.9398	0.9850	0.9773	0.9824	0.9712
0.99	200	0.9421	0.8297	0.9441	0.9857	0.9755	0.9847	0.9728
0.95	100	0.4475	0.9167	0.9327	0.9802	0.9614	0.9725	0.9473
0.95	200	0.1850	0.9325	0.9401	0.9702	0.9423	0.9702	0.9413
0.90	100	0.0009	0.9351	0.9341	0.9622	0.9354	0.9622	0.9354
0.90	200	0.0000	0.9420	0.9411	0.9703	0.9431	0.9703	0.9431
0.80	100	0.0000	0.9436	0.9364	0.9683	0.9372	0.9683	0.9372
0.80	200	0.0000	0.9433	0.9398	0.9678	0.9407	0.9678	0.9407
0.60	100	0.0000	0.9419	0.9374	0.9669	0.9392	0.9669	0.9392
0.60	200	0.0000	0.9469	0.9447	0.9710	0.9456	0.9710	0.9456

Results based on 10,000 simulations.

Table 5. Average Width of Confidence Intervals.

$N = 100$ , $w_{i 0} = 0$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	50	0.0059	0.0493	0.9810	0.0133	0.0168	0.0161	0.0204
1.00	100	0.0030	0.0361	0.7866	0.0070	0.0088	0.0090	0.0114
0.99	50	0.0126	0.0576	0.2362	0.1001	0.1233	0.1235	0.1456
0.99	100	0.0073	0.0452	0.0947	0.0898	0.1105	0.0866	0.1029
0.95	50	0.0184	0.0902	0.1215	0.1385	0.1504	0.1357	0.1367
0.95	100	0.0123	0.0721	0.0838	0.0963	0.0967	0.0944	0.0889
0.90	50	0.0234	0.1063	0.1273	0.1446	0.1311	0.1442	0.1285
0.90	100	0.0161	0.0764	0.0838	0.0958	0.0842	0.0958	0.0842
0.80	50	0.0297	0.1052	0.1171	0.1339	0.1177	0.1339	0.1177
0.80	100	0.0207	0.0744	0.0784	0.0897	0.0788	0.0897	0.0788
0.60	50	0.0369	0.0992	0.1057	0.1209	0.1062	0.1209	0.1062
0.60	100	0.0259	0.0701	0.0724	0.0828	0.0727	0.0828	0.0727

Results based on 10,000 simulations.

Table 6. Number of Empty Intervals (out of 10,000).

$N = 100$ , $w_{i 0} = 0$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	50	185	4400	215	0	0	0	0
1.00	100	169	4339	263	0	0	0	0
0.99	50	0	2768	243	0	0	0	0
0.99	100	0	1347	164	0	0	0	0
0.95	50	0	62	9	0	0	0	0
0.95	100	0	0	0	0	0	0	0

There are no empty intervals for any of the procedures in the cases $ρ = 0.9, 0.8,$ and $0.6$ .

Table 7. Average Width of Confidence Intervals.

$N = 100$ , $w_{i 0} = 2$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	50	0.0059	0.0491	0.9114	0.0132	0.0166	0.0152	0.0195
1.00	100	0.0030	0.0360	0.7521	0.0069	0.0087	0.0091	0.0112
0.99	50	0.0126	0.0576	0.1821	0.1005	0.1242	0.1172	0.1402
0.99	100	0.0073	0.0457	0.0844	0.0888	0.1096	0.0831	0.0999
0.95	50	0.0182	0.0925	0.1025	0.1260	0.1409	0.1182	0.1220
0.95	100	0.0122	0.0728	0.0764	0.0896	0.0914	0.0867	0.0823
0.90	50	0.0230	0.1071	0.1054	0.1210	0.1109	0.1200	0.1069
0.90	100	0.0160	0.0764	0.0757	0.0866	0.0761	0.0866	0.0761
0.80	50	0.0292	0.1052	0.0995	0.1137	0.0999	0.1137	0.0999
0.80	100	0.0206	0.0744	0.0722	0.0826	0.0726	0.0826	0.0726
0.60	50	0.0366	0.0992	0.0954	0.1091	0.0958	0.1091	0.0958
0.60	100	0.0258	0.0701	0.0688	0.0786	0.0691	0.0786	0.0691

Results based on 10,000 simulations.

Table 8. Number of Empty Intervals (out of 10,000).

$N = 100$ , $w_{i 0} = 2$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	50	185	4360	270	0	0	0	0
1.00	100	169	4363	295	0	0	0	0
0.99	50	0	2411	275	0	0	0	0
0.99	100	0	1144	184	0	0	0	0
0.95	50	0	12	6	0	0	0	0
0.95	100	0	0	0	0	0	0	0

There are no empty intervals for any of the procedures in the cases $ρ = 0.9, 0.8,$ and $0.6$ .

Table 9. Average Width of Confidence Intervals.

$N = 200$ , $w_{i 0} = 0$
$ρ$	$T$	${C I}_{POLS}$	${C I}_{IVD}$	${C I}_{M}$	${C I}_{PCI 1}$	${C I}_{PCI 2}$	${C I}_{PCI 3}$	${C I}_{PCI 4}$
1.00	100	0.0021	0.0253	0.7838	0.0048	0.0060	0.0066	0.0083
1.00	200	0.0010	0.0178	0.6377	0.0027	0.0033	0.0037	0.0047
0.99	100	0.0051	0.0340	0.0696	0.0662	0.0796	0.0657	0.0751
0.99	200	0.0032	0.0272	0.0384	0.0454	0.0544	0.0429	0.0487
0.95	100	0.0087	0.0540	0.0635	0.0722	0.0654	0.0720	0.0641
0.95	200	0.0060	0.0387	0.0421	0.0481	0.0423	0.0481	0.0423
0.90	100	0.0114	0.0540	0.0592	0.0677	0.0595	0.0677	0.0595
0.90	200	0.0080	0.0382	0.0400	0.0457	0.0402	0.0457	0.0402
0.80	100	0.0147	0.0526	0.0554	0.0634	0.0557	0.0634	0.0557
0.80	200	0.0103	0.0372	0.0382	0.0437	0.0384	0.0437	0.0384
0.60	100	0.0183	0.0496	0.0512	0.0585	0.0514	0.0585	0.0514
0.60	200	0.0129	0.0351	0.0356	0.0407	0.0358	0.0407	0.0358

Results based on 10,000 simulations.

Table 10. Number of Empty Intervals (out of 10,000).

$N = 200$, $w_{i0} = 0$
$ρ$	$T$	$CI_{POLS}$	$CI_{IVD}$	$CI_{M}$	$CI_{PCI1}$	$CI_{PCI2}$	$CI_{PCI3}$	$CI_{PCI4}$
1.00	100	208	4538	256	0	0	0	0
1.00	200	241	4685	247	0	0	0	0
0.99	100	0	1108	114	0	0	0	0
0.99	200	0	252	51	0	0	0	0
0.95	100	0	0	0	0	0	0	0
0.95	200	0	0	0	0	0	0	0

There are no empty intervals for any of the procedures in the cases $ρ = 0.9$, $0.8$, and $0.6$.

Table 11. Average Width of Confidence Intervals.

$N = 200$, $w_{i0} = 2$
$ρ$	$T$	$CI_{POLS}$	$CI_{IVD}$	$CI_{M}$	$CI_{PCI1}$	$CI_{PCI2}$	$CI_{PCI3}$	$CI_{PCI4}$
1.00	100	0.0021	0.0252	0.7507	0.0048	0.0060	0.0064	0.0081
1.00	200	0.0010	0.0179	0.6201	0.0027	0.0033	0.0037	0.0047
0.99	100	0.0051	0.0344	0.0623	0.0647	0.0783	0.0622	0.0720
0.99	200	0.0031	0.0274	0.0365	0.0448	0.0539	0.0417	0.0477
0.95	100	0.0086	0.0542	0.0571	0.0653	0.0594	0.0650	0.0578
0.95	200	0.0060	0.0387	0.0398	0.0455	0.0400	0.0455	0.0400
0.90	100	0.0113	0.0540	0.0535	0.0612	0.0538	0.0612	0.0538
0.90	200	0.0079	0.0382	0.0380	0.0435	0.0382	0.0435	0.0382
0.80	100	0.0145	0.0526	0.0511	0.0584	0.0513	0.0584	0.0513
0.80	200	0.0103	0.0372	0.0366	0.0419	0.0368	0.0419	0.0368
0.60	100	0.0182	0.0496	0.0486	0.0556	0.0488	0.0556	0.0488
0.60	200	0.0129	0.0351	0.0347	0.0397	0.0349	0.0397	0.0349

Results based on 10,000 simulations.

Table 12. Number of Empty Intervals (out of 10,000).

$N = 200$, $w_{i0} = 2$
$ρ$	$T$	$CI_{POLS}$	$CI_{IVD}$	$CI_{M}$	$CI_{PCI1}$	$CI_{PCI2}$	$CI_{PCI3}$	$CI_{PCI4}$
1.00	100	208	4552	280	0	0	0	0
1.00	200	241	4608	261	0	0	0	0
0.99	100	0	906	127	0	0	0	0
0.99	200	0	209	52	0	0	0	0
0.95	100	0	0	0	0	0	0	0
0.95	200	0	0	0	0	0	0	0

There are no empty intervals for any of the procedures in the cases $ρ = 0.9$, $0.8$, and $0.6$.
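The two summary statistics reported in the tables above, average interval width and number of empty intervals, could be tallied from simulation output along the following lines. This is only a minimal sketch, not the authors' code: the function name `summarize_intervals`, the use of `None` to mark an empty interval, and the choice to let an empty interval contribute zero width to the average are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (lower, upper) endpoints, or None when empty.
Interval = Optional[Tuple[float, float]]


def summarize_intervals(intervals: List[Interval]) -> Tuple[float, int]:
    """Return (average width, number of empty intervals) over replications.

    Assumption (not stated in the source): an empty interval contributes
    width 0 to the average rather than being dropped from it.
    """
    if not intervals:
        raise ValueError("need at least one replication")

    n_empty = 0
    total_width = 0.0
    for ci in intervals:
        # Treat None, or an interval whose upper end lies below its lower
        # end, as empty.
        if ci is None or ci[1] < ci[0]:
            n_empty += 1
        else:
            total_width += ci[1] - ci[0]

    return total_width / len(intervals), n_empty


if __name__ == "__main__":
    # Four made-up replications, two of them empty.
    reps = [(0.90, 1.00), None, (0.95, 0.97), (1.00, 0.50)]
    avg_width, n_empty = summarize_intervals(reps)
    print(f"average width = {avg_width:.4f}, empty intervals = {n_empty}")
```

Under a different convention, for example averaging only over non-empty intervals, the divisor would be `len(intervals) - n_empty` instead, so the convention matters most in the rows where many intervals are empty.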

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chao, J.C.; Phillips, P.C.B. Uniform Inference in Panel Autoregression. Econometrics 2019, 7, 45. https://doi.org/10.3390/econometrics7040045

AMA Style

Chao JC, Phillips PCB. Uniform Inference in Panel Autoregression. Econometrics. 2019; 7(4):45. https://doi.org/10.3390/econometrics7040045

Chicago/Turabian Style

Chao, John C., and Peter C. B. Phillips. 2019. "Uniform Inference in Panel Autoregression" Econometrics 7, no. 4: 45. https://doi.org/10.3390/econometrics7040045

APA Style

Chao, J. C., & Phillips, P. C. B. (2019). Uniform Inference in Panel Autoregression. Econometrics, 7(4), 45. https://doi.org/10.3390/econometrics7040045

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.
