Uniform Inference in Panel Autoregression

This paper considers estimation and inference concerning the autoregressive coefficient (ρ) in a panel autoregression for which the degree of persistence in the time dimension is unknown. The main objective is to construct confidence intervals for ρ that are asymptotically valid, having asymptotic coverage probability at least that of the nominal level uniformly over the parameter space. It is shown that a properly normalized statistic based on the Anderson-Hsiao IV procedure, which we call the M statistic, is uniformly convergent and can be inverted to obtain asymptotically valid interval estimates. In the unit root case confidence intervals based on this procedure are unsatisfactorily wide and uninformative. To sharpen the intervals a new procedure is developed using information from unit root pretests to select alternative confidence intervals. Two sequential tests are used to assess how close ρ is to unity and to correspondingly tailor intervals near the unit root region. When ρ is close to unity, the width of these intervals shrinks to zero at a faster rate than that of the confidence interval based on the M statistic. Only when both tests reject the unit root hypothesis does the construction revert to the M statistic intervals, whose width has the optimal N^{-1/2}T^{-1/2} rate of shrinkage when the underlying process is stable. The asymptotic properties of this pretest-based procedure show that it produces confidence intervals with at least the prescribed coverage probability in large samples. Simulations confirm that the proposed interval estimation methods perform well in finite samples and are easy to implement in practice. A supplement to the paper provides an extensive set of new results on the asymptotic behavior of panel IV estimators in weak instrument settings.


Introduction
Due to the many challenges that arise in estimating and conducting statistical inference for dynamic panel data models, a vast literature has emerged studying these models over the past three decades. Much has been learnt about the large sample properties and finite sample performance of procedures for such models when more persistent behavior, such as unit root or near unit root behavior, is present. Phillips and Moon (1999) provided methods that opened up the rigorous development of asymptotics in such models for both stationary and nonstationary cases and with multidimensional joint and sequential limits. Many subsequent contributions to this nonstationary panel literature have considered more complex regressions, analyzing the effects of incidental trends, serial dependence and cross section dependence; e.g., (Chang 2002, 2004; Phillips and Sul 2003; Moon and Phillips 2004; Moon et al. 2014, 2015; Pesaran 2006; Pesaran and Tosetti 2011).
While this literature has greatly enhanced our understanding of the panel data sampling behavior of point estimators and of associated test statistics, such as the Studentized t statistic or the Wald statistic, what have not been studied are confidence interval procedures which are asymptotically valid in the sense that asymptotic coverage probabilities are at least that of the nominal level uniformly over the parameter space. The development of theoretically justified confidence intervals is especially important in cases where the empirical researcher may not have good prior information about the degree of persistence in the data, since in such situations interval estimates can serve as indispensable supplements to point estimates by providing additional information about sampling uncertainty and about the range of possible values of the autoregressive parameter ρ that are consistent with the observed data. Moreover, we know from the unit root time series literature that constructing an asymptotically valid confidence interval for the autoregressive parameter of an AR(1) process is a challenging task when the parameter space is taken to be large enough to include both the stable and the unit root cases. This is because the Studentized statistic based on OLS estimation is not uniformly convergent in this case, so that an asymptotically correct confidence interval cannot be constructed by inverting the Studentized statistic in the usual way. To address this problem in the time series literature, Stock (1999) proposed a confidence procedure based on local-to-unity asymptotics, while simulation and bootstrap type methods have been introduced by Andrews (1993) and Hansen (1999).
Recent results by Mikusheva (2007, 2012) and by Phillips (2014) have shown that the methods of Andrews (1993) and Hansen (1999), as well as a recentered version of Stock's method, all give the correct asymptotic coverage probability uniformly over the parameter space. Extending these procedures to the panel data setting does not seem to be straightforward, and panel data versions of these methods are currently unavailable.
To address this need, the present paper proposes simple, asymptotically correct confidence procedures for the autoregressive coefficient of a panel autoregression.^1 We take as our starting point the estimating equation of the Anderson and Hsiao (1981) IV procedure. Although much has been written about the Anderson-Hsiao IV estimator having a weak instrument problem when ρ is unity or very nearly unity, it should be noted that this estimator is still consistent even when ρ = 1 and that the weak instrument problem primarily manifests itself in the form of the asymptotic distribution having greater dispersion.^2 In fact, the Anderson-Hsiao estimating equation is very well-centered and is an unbiased estimating equation in the sense of Durbin (1960), a property that is of particular importance in constructing asymptotically valid confidence intervals. Exploiting this unbiasedness property, we then show that a properly normalized statistic based on this estimating equation is uniformly convergent over the parameter space Θ_ρ = (−1, 1]. This statistic, which we refer to as the M statistic since it is based on the (empirical) IV moment function, can be easily and analytically inverted to obtain an asymptotically correct confidence interval.
^1 We do not consider in this paper issues related to incidental trends, cross section dependence, and slope parameter heterogeneity discussed earlier. While these complications are important and empirically relevant, they are beyond the scope of the current paper, and considering them here would divert from the main point of this paper, which concerns the development of uniform inference procedures.
However, because of the weak instrument problem, when the true ρ is unity or very near unity, confidence intervals obtained by inverting this M statistic may be less informative in the sense that they may be relatively wide in finite samples and, asymptotically, their width shrinks toward zero at the slower rate of T^{-1/2} even when both the cross section (N) and the time series (T) sample sizes approach infinity. A similar drawback applies to the GMM procedure of Han and Phillips (2010), which achieves uniform inference with shrinkage rate (NT)^{-1/2} over the full domain Θ_ρ.
To obtain more informative interval estimates, we introduce a new confidence procedure which uses information from two different unit root tests, with different power properties, to assess the proximity of the true autoregressive parameter to the exact unit root null hypothesis H_0: ρ = 1. More precisely, we first infer that the true parameter value is unity or very close to unity if the more powerful of the two unit root tests fails to reject H_0, and we use, in this case, an interval that is localized at ρ = 1, with width that shrinks at a faster N^{-1/2}T^{-1} rate.^3 Second, if the more powerful test rejects H_0 but the less powerful test fails to reject, we use another interval that is still localized at ρ = 1 but with greater width, which shrinks at the rate N^{-1/2}T^{-1/2}, a rate that is still faster than that of the width of the confidence interval based on the M statistic in the vicinity of ρ = 1. Finally, if both tests reject H_0, then we conclude that the true parameter value is far enough away from unity that we can use the confidence interval based on the M statistic, whose width shrinks at the optimal N^{-1/2}T^{-1/2} rate in the stable region of the parameter space. We show that the asymptotic size of this pretest-based procedure can be uniformly controlled, so that this procedure is asymptotically valid, albeit slightly conservative when the underlying process is stable. The degree of conservatism under our procedure is also controllable and can be kept small by carefully controlling the probability of a Type II error under a local-to-unity parameter sequence. Moreover, in addition to providing informative and asymptotically correct confidence intervals, our procedure has the further advantage that it is given in analytical form and, hence, is computationally simple to implement. Simulations confirm that the proposed method performs well in finite samples.
The remainder of the paper proceeds as follows. Section 2 briefly describes the model, assumptions, and notation. Section 3 introduces two new ways of constructing uniform confidence intervals for the parameter ρ. The first is based on inverting the M statistic, and the second is the pretest-based confidence interval. Results given in this section show that both confidence procedures are asymptotically valid. Section 4 reports the results of a Monte Carlo study comparing our proposed confidence procedures with some alternative procedures. We provide a brief conclusion in Section 5.
^2 Readers interested in the asymptotic properties of the Anderson-Hsiao IV estimator are referred to Theorem SA-1 in the Technical Supplement to this paper. There, we present a very extensive set of results on the large sample behavior of this estimator under various parameter sequences both near and far away from unity. In addition, the proof of Theorem SA-1 is provided in Appendix SB of the Technical Supplement.
^3 Other approaches for achieving uniform inference in estimation have been proposed recently in the time series literature by Han et al. (2011) using partial aggregation methods and by Gorodnichenko et al. (2012) using quasi-differencing. In the unit root and very near unit root cases, extending these approaches to the panel data setting leads to confidence intervals whose width shrinks at a slower rate than the optimal N^{-1/2}T^{-1} rate obtained here. Han et al. (2014) developed a panel estimator using X-differencing which has good bias properties and limit theory but has different limit theory in unit root and stationary cases, complicating uniform inference.
Proofs of the main theorems are given in Appendix A of this paper. Proofs of additional supporting lemmas, as well as additional Monte Carlo results, are reported in an online supplement to this paper (Chao and Phillips 2019). The supplement provides an extensive set of results for panel estimation limit theory in unit root and near unit root cases that help deliver the main results in the paper but are of wider interest regarding the asymptotic behavior of panel IV estimators, particularly in weak instrument settings. The supplement also includes additional simulation results concerning the performance of the estimation procedures considered in the paper.
A word on notation. We use ⇒ for convergence in distribution or weak convergence and →_p for convergence in probability; χ^2_ν denotes a chi-square random variable with ν degrees of freedom, Z denotes the standard normal random variable, and W_i(r) is the standard Brownian motion on the unit interval [0, 1] for each i. For two sequences [...]. In addition, the notations Pr(·|ρ) and Pr(·|ρ_T) denote, respectively, a probability measure indexed by the fixed parameter ρ and one indexed by the local-to-unity parameter ρ_T. Finally, we use wid(C) to denote the width of the confidence interval C.

Model and Assumptions
We work with the following dynamic panel data model written in unobserved components form

y_{it} = a_i + w_{it},    (1)
w_{it} = ρ w_{i,t−1} + ε_{it},    (2)

for i = 1, ..., N and t = 1, ..., T. Here, {y_{it}} is the observed data, {w_{it}} is generated by a latent AR(1) process, a_i denotes an (unobserved) individual effect, and ρ denotes the panel autoregressive parameter, which is assumed to belong to the parameter space Θ_ρ = (−1, 1].
In this paper, we will show that certain properties of our procedure hold uniformly for ρ ∈ Θ_ρ = (−1, 1]. We do this by making use of a result (Lemma 2.6.2) from Lehmann (1999) which establishes the equivalence of uniform convergence and convergence for every parameter sequence belonging to a given parameter space. For this purpose, it is convenient for us to consider a general class of local-to-unity parameter sequences of the form ρ = ρ_T = exp{−1/q(T)}, where q(T) is a non-negative function of T such that q(T) → ∞ as T → ∞.^4 Moreover, parameter sequences for stable AR processes can also be written in this general form by considering parameter sequences {ρ_T} which belong to the collection [...].
Note also that, in the case where such parameter sequences are considered, the AR process given in expression (2) will depend on an indexed parameter ρ_T, as opposed to a fixed autoregressive parameter, so that, strictly speaking, the observed data and the latent process, in this case, will form triangular arrays {y_{it,T}, w_{it,T}}, which depend additionally on T. However, for notational ease, we shall suppress this additional dependence and simply write {y_{it}, w_{it}} in what follows.
It is sometimes convenient to rewrite the model (1) and (2) in the alternate familiar form as a first-order autoregressive process in y_{it}, viz.,

y_{it} = η_i + ρ y_{i,t−1} + ε_{it},

where η_i = a_i(1 − ρ). This form follows by substituting w_{i,t−1} = y_{i,t−1} − a_i from (1) into (2). The following assumptions are made on the model.
^4 The reason we consider an indexed parameter ρ_T which depends on T only, and not on both N and T, is that our main results are obtained under a general pathwise asymptotic scheme where N can grow as an arbitrary positive real-valued power of T. In such a framework, the asymptotics are effectively single-indexed. Hence, it suffices to consider parameter sequences that depend only on T.
Assumption 3 (Initialization): Let y_{i0} = a_i + w_{i0}. Suppose that {w_{i0}} is independent across i. Suppose also that there exists a positive constant C such that sup_i E[w_{i0}^2] ≤ C < ∞, and that w_{i0} and ε_{jt} are independent for all i, j = 1, 2, ..., N and for all t = 1, 2, ..., T.
Note that Assumption 3 on the initial condition does not impose mean stationarity, i.e., the condition that E[y_{i0} | a_i] = η_i/(1 − ρ) = a_i a.s., which in our setup is equivalent to the restriction that E[w_{i0} | a_i] = 0 a.s. In addition, observe that Assumption 3 allows for the case where the initial condition is fixed, i.e., w_{i0} = c_i for some sequence of constants {c_i} such that sup_i |c_i| < ∞. It is also general enough so that we may specify w_{i0} to be fixed in the unit root case but allow w_{i0} to be a draw from its unconditional distribution with variance σ^2/(1 − ρ^2) when the underlying process is stationary.
In lieu of Assumption 2, we also consider in this paper a fixed-effects specification given by the following assumption.
Assumption 2* (Fixed Effects): Let {a_i} be a nonrandom sequence. Suppose that there exists a positive constant C such that sup_{1≤i≤N} |a_i| ≤ C < ∞ for all N.
All our theoretical results hold under either the random-effects specification given by Assumption 2 or the fixed-effects specification given by Assumption 2*.
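As an informal illustration of the data generating process described in this section, the following Python sketch simulates the unobserved components model y_{it} = a_i + w_{it} with a latent AR(1) component and evaluates a local-to-unity parameter of the form ρ_T = exp{−1/q(T)}. The function names `simulate_panel` and `local_to_unity`, the Gaussian draws for a_i and ε_{it}, and the choice w_{i0} = 0 in the unit root case are assumptions made purely for illustration; they are one admissible configuration rather than a requirement of the assumptions above.

```python
import numpy as np

def simulate_panel(N, T, rho, sigma=1.0, seed=0):
    """Simulate y_it = a_i + w_it with w_it = rho * w_{i,t-1} + eps_it.

    Returns (y, a, eps), where y has shape (N, T + 1) and column t holds
    y_it for t = 0, ..., T.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=N)                      # individual effects (illustrative: N(0, 1))
    eps = rng.normal(scale=sigma, size=(N, T))  # innovations (illustrative: i.i.d. normal)
    w = np.empty((N, T + 1))
    if abs(rho) < 1:
        # Stable case: draw w_i0 from the unconditional distribution,
        # which has variance sigma^2 / (1 - rho^2).
        w[:, 0] = rng.normal(scale=sigma / np.sqrt(1.0 - rho**2), size=N)
    else:
        # Unit root case: fixed initial condition (illustrative: w_i0 = 0).
        w[:, 0] = 0.0
    for t in range(1, T + 1):
        w[:, t] = rho * w[:, t - 1] + eps[:, t - 1]
    return a[:, None] + w, a, eps

def local_to_unity(T, q):
    """Return rho_T = exp(-1 / q(T)) for a user-supplied function q."""
    return float(np.exp(-1.0 / q(T)))

# Example: q(T) = T gives rho_T = exp(-1/T), which approaches 1 as T grows.
rho_T = local_to_unity(100, lambda T: T)
y, a, eps = simulate_panel(N=200, T=100, rho=rho_T)
```

The simulated array satisfies the autoregressive form y_{it} = η_i + ρ y_{i,t−1} + ε_{it} with η_i = a_i(1 − ρ) by construction, which offers a quick numerical check of the equivalence between the two representations.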

Confidence Intervals Based on the Anderson-Hsiao IV procedure
A primary objective of this paper is to develop confidence procedures with asymptotic coverage probability that is at least that of the nominal level uniformly over the parameter space ρ ∈ (−1, 1]. As a first step, we consider a statistic based on the empirical moment function of the Anderson-Hsiao IV procedure, but properly standardized by an appropriate estimator of the scale parameter. In particular, let [...] and [...], where ȳ_i = T_1^{-1} Σ_{t=2}^{T} y_{it} and ȳ_{i,−1} = T_1^{-1} Σ_{t=2}^{T} y_{i,t−1} with T_1 = T − 1. In addition, we let ρ̂ denote any preliminary estimator of ρ that satisfies the following conditions.
Assumption 4: Let ρ̂ be an estimator of ρ. Suppose that the following conditions hold for this estimator as N, T → ∞ such that N^κ/T = τ, for κ ∈ (0, ∞) and τ ∈ (0, ∞). [...]
To provide some intuition about the M(ρ) statistic and about the conditions placed on the preliminary estimator ρ̂ (i.e., Assumption 4), we set M*(ρ) = ω̂ M(ρ) to be the unstandardized version of M(ρ). From the proof of Theorem 1, given in Appendix A, it is evident that M*(ρ) can be decomposed into several terms whose orders of magnitude change depending on how close the parameter sequence {ρ_T} is to unity. In consequence, the lead term of M*(ρ) is not the same in the stable (panel) autoregression case as it is in the case where ρ_T is very close to unity. On the other hand, when appropriately normalized, this statistic will converge to a standard normal distribution in each case, but this requires a scale estimator that will adapt to variation in the normalization factor under alternative parameter sequences. The scale estimator ω̂ turns out to have these adaptive properties, as shown in Lemma SC-13 and its proof (given in the supplement). An important component in the construction of a proper normalization factor is to have a preliminary estimator ρ̂ with a fast enough rate of convergence, so that the resulting estimator of σ^2 is consistent under every possible parameter sequence {ρ_T} in the parameter space Θ_ρ = (−1, 1]. Examination of the proof of Lemma SC-12 reveals that the conditions needed on ρ̂ are precisely those given in Assumption 4.^5
It should be noted that the Anderson-Hsiao IV estimator, which we will denote by ρ̂_IVD in this paper, does not satisfy the conditions of Assumption 4.^6 This is because, as shown in Theorem SA-1 of the supplement to this paper, [...], so that its rate of convergence is not fast enough in the unit root and near unit root regions of the parameter space. Furthermore, the pooled OLS (POLS) estimator, which we will denote by ρ̂_pols, also does not satisfy these conditions since it is inconsistent in the stationary region, as shown in Theorem SA-2 of the supplement. Hence, in Appendix SA of the supplement to our paper, we introduce a new point estimator, ρ̂_AIP, which is an average of ρ̂_IVD and ρ̂_pols where the average is taken using a data-dependent weight function that, in turn, depends on a unit root statistic. This estimator turns out to satisfy the conditions of Assumption 4 because it exploits the differential strengths of ρ̂_IVD and ρ̂_pols in different parts of the parameter space and can place more or less weight on one or the other of these two estimators, depending on the information provided by a preliminary unit root test on the true value of ρ. We use ρ̂_AIP in constructing the scale estimator ω̂ for the Monte Carlo results reported in Section 4 of this paper, but we note that ρ̂_AIP is not the only estimator which satisfies Assumption 4. Both the within-group OLS estimator and the bias-corrected within-group estimator proposed by Hahn and Kuersteiner (2002) could also be used to obtain a scale estimator ω̂ with the desired properties, although ρ̂_AIP does have a faster rate of convergence than the uncorrected within-group estimator in the unit root and near unit root cases.
Because the focus of this paper is on confidence procedures, and not on point estimation, we will not give technical details of ρ̂_AIP in the body of this paper, but will instead refer interested readers to Appendix SA of the supplement for more details, as well as for formal results, on the rate of convergence of ρ̂_AIP under alternative parameter sequences.
^5 The proof of Lemma SC-12 is also given in Appendix SC of the technical supplement.
^6 We use the notation ρ̂_IVD to denote the Anderson-Hsiao IV estimator because it is a procedure where IV estimation is performed on a first-differenced equation. Later, we use ρ̂_IVL to denote the IV estimator introduced by Arellano and Bover (1995) since, in that procedure, IV is performed on the panel autoregression in levels.
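For readers who wish to see the mechanics of IV estimation on the first-differenced equation, the following Python sketch computes a textbook version of the Anderson-Hsiao estimator, using the lagged level y_{i,t−2} as an instrument for Δy_{i,t−1} in Δy_{it} = ρ Δy_{i,t−1} + Δε_{it}. This is a simplified illustration under assumed Gaussian data and a hypothetical function name `anderson_hsiao_iv`; it is not the normalized M statistic of this paper and does not reproduce the exact form of ρ̂_IVD analyzed in the supplement.

```python
import numpy as np

def anderson_hsiao_iv(y):
    """Textbook Anderson-Hsiao IV estimate of rho from an (N, T + 1) panel.

    Column t of y holds y_it for t = 0, ..., T. The lagged level y_{i,t-2}
    instruments dy_{i,t-1} in the equation dy_it = rho * dy_{i,t-1} + d eps_it.
    """
    dy = np.diff(y, axis=1)   # dy[:, k] = y[:, k + 1] - y[:, k]
    dy_t = dy[:, 1:]          # dy_it for t = 2, ..., T
    dy_lag = dy[:, :-1]       # dy_{i,t-1} for t = 2, ..., T
    z = y[:, :-2]             # instrument y_{i,t-2} for t = 2, ..., T
    return float(np.sum(z * dy_t) / np.sum(z * dy_lag))

# Illustrative stable design (all settings below are assumptions).
rng = np.random.default_rng(0)
N, T, rho = 5000, 10, 0.5
a = rng.normal(size=N)                                        # individual effects
w = np.empty((N, T + 1))
w[:, 0] = rng.normal(scale=1.0 / np.sqrt(1.0 - rho**2), size=N)  # stationary start
for t in range(1, T + 1):
    w[:, t] = rho * w[:, t - 1] + rng.normal(size=N)
y = a[:, None] + w
rho_hat = anderson_hsiao_iv(y)   # should lie close to 0.5 in this design
```

Differencing removes the individual effect a_i, which is why the estimate is not contaminated by it. As ρ approaches unity, the denominator of the ratio becomes small relative to its sampling noise, which is the weak instrument problem discussed in the text.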
The following theorem shows the uniform convergence of the statistic M(ρ) over the parameter space Θ_ρ = (−1, 1].
Theorem 2. Let Φ(x) denote the cdf of a standard normal random variable. Suppose that Assumptions 1, 3, 4, and either 2 or 2* hold. Then, for each x ∈ R, [...]
[...] interval based on the statistic M(ρ) can be taken to be [...] (5)
(i) It is immediate from Theorem 2 that the confidence procedure defined by (5) is asymptotically valid in the sense that its coverage probability is equal to the nominal level 1 − α in large samples, uniformly over ρ ∈ (−1, 1].
(ii) The uniform limit result given in Theorem 2 above is established under a pathwise asymptotic scheme where we take N, T → ∞ such that N^κ/T = τ for constants κ ∈ (0, ∞) and τ ∈ (0, ∞). Note that the asymptotic framework employed here does not restrict N and T to follow a specific diagonal expansion path, but rather allows for a whole range of possible paths indexed by κ ∈ (0, ∞); hence, our results do not require the kind of restrictions on the relative magnitudes of N and T that are often imposed in other asymptotic analyses of panel data models. Indeed, by allowing T to grow as any positive (real-valued) power of N, our framework can accommodate a wide variety of settings where T may be of smaller, larger, or similar order of magnitude as N.
(iii) As noted earlier and as is evident from the proof of Theorem 2 given in Appendix A, uniform convergence here is established by showing convergence to the same distribution under every parameter sequence in the parameter space. To the best of our knowledge, the use of this approach in statistics originated with the book on large sample theory by Lehmann (1999). Important extensions, as well as applications, of this approach to a variety of econometric models and inferential procedures have also been made more recently in the papers by Andrews and Guggenberger (2009) and Andrews et al. (2011).
(iv) As we noted earlier in the Introduction, a primary reason why the M statistic is well-behaved is that the (empirical) IV moment function is well-centered as an unbiased estimating equation. In this sense, our approach relates to early work by Durbin (1960) on unbiased estimating equations, which was applied to time series AR(1) regression in his original study. Importantly, in dynamic panel data models with individual effects, estimating equations associated with least squares procedures tend not to be as well-centered as the IV estimating equations, which explains the need for IV in this context (cf. Han and Phillips 2010).
(v) A drawback of C^M_α is that the rate at which the width of this confidence interval shrinks toward zero as sample sizes grow is relatively slow for parameter sequences that are very close to unity. As also noted in the Introduction, this is due to the well-known 'weak instrument' problem, which induces a slow rate of convergence for the Anderson-Hsiao IV procedure in this case. More precisely, using the results given in Lemmas SA-1, SC-1, and SC-13 in the supplement to this paper, we can easily show that wid(C^M_α) = O_p(T^{-1/2}) in this case, so that the rate of shrinkage here does not even depend on N, even as both N and T go to infinity (see also Phillips 2018).
This slower rate of convergence is also reflected in the Monte Carlo results reported in Section 4 below, as the results there show that the average interval width of C^M_α can be a very substantial fraction of the width of the entire parameter space when ρ = 1. To improve on the performance of C^M_α, the next subsection introduces a pretest-based confidence procedure which is similarly asymptotically valid but which in addition provides more informative intervals when the underlying process has a unit root or a near unit root.
^7 Note that we use the notation Pr(·|ρ) instead of perhaps the more familiar notation P_ρ(·) to denote a probability measure indexed by the parameter ρ because, in this paper, we will often consider somewhat complicated local-to-unity parameters and subsequences of such parameters, which are less conveniently expressed in terms of subscripts.

A Pretest-Based Confidence Procedure
To enhance the informativeness of the confidence procedure when there is a unit root, we use a pretest approach. The idea is to apply two different unit root tests sequentially to assess the proximity of ρ to unity and then implement different confidence intervals depending on the information about the location of ρ that emerges from these tests. More precisely, we propose the following level 1 − α confidence interval of the form [...] (6) where C^M_{α_1} is as defined in (5) above, and where γ = (γ_1, γ_2), α = α_1 + α_2, I is an indicator function, and we again take z_{γ_1} to be the 1 − γ_1 quantile of a standard normal distribution for some γ_1 ∈ (0, 0.5], with z_{γ_2} and z_{α_2} similarly defined. In addition, we take T_1 [...] to be the unit root test statistic based on the POLS estimator, and we take T_2 [...] to be the unit root test statistic based on the IV estimator in levels ρ̂_IVL [...], which was introduced by Arellano and Bover (1995) and further analyzed in Blundell and Bond (1998). From expression (6), it is apparent that the confidence procedure follows a sequential tree structure.
We first pretest for the presence of a unit root using T_1. If this first test fails to reject the unit root null hypothesis, then we employ the tighter unit root interval C^{UR1}_{γ_1,α_2}. Otherwise, we conduct a second test of the unit root null hypothesis using a less powerful test T_2. If this second test fails to reject the null hypothesis, we use the wider unit root interval C^{UR2}_{γ_2,α_2}. On the other hand, if both tests reject the unit root null hypothesis, we then use the interval C^M_{α_1}, which is asymptotically valid but less informative unless the true value of ρ is sufficiently far away from unity.
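The sequential tree structure just described can be sketched in a few lines of Python. The sketch below is schematic: it takes the outcomes of the two unit root pretests and the three candidate intervals as given inputs, since their construction depends on formulas not reproduced here. The function name `select_interval`, the argument names, and the numerical endpoints in the usage lines are hypothetical placeholders.

```python
from typing import Tuple

Interval = Tuple[float, float]

def select_interval(reject_t1: bool, reject_t2: bool,
                    ci_ur1: Interval, ci_ur2: Interval,
                    ci_m: Interval) -> Interval:
    """Choose among three intervals following the sequential pretest tree.

    reject_t1: whether the first (more powerful) unit root test rejects H0.
    reject_t2: whether the second (less powerful) unit root test rejects H0.
    """
    if not reject_t1:
        # First test fails to reject: use the tighter unit root interval.
        return ci_ur1
    if not reject_t2:
        # First test rejects, second fails to reject: use the wider unit root interval.
        return ci_ur2
    # Both tests reject: revert to the interval based on the M statistic.
    return ci_m

# Placeholder endpoints, chosen only to show the three branches.
ur1, ur2, m = (0.99, 1.0), (0.95, 1.0), (0.40, 0.60)
chosen_near_unity = select_interval(False, False, ur1, ur2, m)
chosen_intermediate = select_interval(True, False, ur1, ur2, m)
chosen_stable = select_interval(True, True, ur1, ur2, m)
```

Note that the outcome of the second test is consulted only when the first test rejects, which mirrors the ordering of the two tests by power.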
The next theorem shows that this confidence procedure is asymptotically valid in the sense that its non-coverage probability is at most the nominal significance level α uniformly over the parameter space under pathwise asymptotics.
Theorem 3. Let α ∈ (0, 0.5] be the specified significance level and let N, T → ∞ such that N^κ/T = τ for κ ∈ (0, ∞) and τ ∈ (0, ∞). Suppose that Assumptions 1, 3, 4, and either 2 or 2* hold. Then, lim sup_{T→∞} sup_{ρ∈(−1,1]} Pr(ρ ∉ C_{γ,α,T} | ρ) ≤ α.
Remark 2. (i) The pretest-based confidence procedure proposed here is inspired by the work of Lepski (1999), who used information from a test procedure to increase the accuracy of confidence sets. The original Lepski paper and subsequent extensions of that paper focused on problems of nonparametric function estimation and canonical versions of such problems, as represented by the many normal means model. Because we deal with a model that differs from the one studied in Lepski (1999) and because we use a dual pretest framework, the construction and analysis of our procedure also differ, even though we use the same idea to improve set estimation accuracy.
(ii) Since Pr(ρ ∉ C_{γ,α,T} | ρ) = 1 − Pr(ρ ∈ C_{γ,α,T} | ρ), it follows that the result obtained in Theorem 3, i.e., lim sup_{T→∞} sup_{ρ∈(−1,1]} Pr(ρ ∉ C_{γ,α,T} | ρ) ≤ α, is equivalent to lim inf_{T→∞} inf_{ρ∈(−1,1]} Pr(ρ ∈ C_{γ,α,T} | ρ) ≥ 1 − α, so that the proposed confidence interval has asymptotic coverage probability that is at least the nominal level 1 − α uniformly over ρ ∈ (−1, 1].
(iii) In the procedure given by (6), α_1 is the significance level for the confidence interval C^M_{α_1}. It is, of course, also the asymptotic non-coverage probability of C^M_{α_1}, since C^M_{α_1} is asymptotically valid.
(iv) As noted in the Introduction and in Remark 3.1 (v) above, a drawback of C^M_{α_1} is that its width shrinks slowly for parameter sequences that are very close to unity. The pretest confidence procedure seeks to improve on this rate by applying two different unit root tests sequentially and by using the information from these tests to determine whether to use local-to-unity intervals whose width shrinks at a faster rate than C^M_{α_1} when the autoregressive parameter value is in close proximity to unity. To see how this improvement is achieved, note that when the true parameter value is within an N^{-1/2}T^{-1} neighborhood of unity then, aside from the relatively small probability event of a Type I error, the first unit root test T_1 will fail to reject H_0: ρ = 1, resulting in the use of the interval C^{UR1}_{γ_1,α_2}. When the parameter is this close to unity, wid(C^{UR1}_{γ_1,α_2}) = O_p(N^{-1/2}T^{-1}), so that the use of C^{UR1}_{γ_1,α_2} leads to significant improvement over C^M_{α_1}. The reason for a second unit root test using the statistic T_2 is that for parameter sequences [...] the first unit root test T_1 will reject H_0 with probability approaching one as sample sizes grow, but the less powerful unit root test based on T_2 will not, subject again to the relatively small probability event of a Type I error. For parameter sequences in this region, wid(C^M_{α_1}) = O_p(N^{-1/2}T^{-3/2}q(T)). The result is that we can make further improvement by using the interval C^{UR2}_{γ_2,α_2}, which has width wid(C^{UR2}_{γ_2,α_2}) = O_p(N^{-1/2}T^{-1/2}) = o_p(N^{-1/2}T^{-3/2}q(T)). Finally, if both these unit root tests reject H_0, then our procedure will infer that the parameter is far enough away from unity to use C^M_{α_1}. Of course, the two unit root tests are subject to Type II errors; but, as explained in Remark 3.2 (vi) below, the probability of Type II errors can also be properly controlled under our procedure.^8
(v) γ_1 and γ_2, on the other hand, are the significance levels for the unit root tests based on T_1 and T_2. Note that, especially in large samples, the specification of γ_1 and γ_2 really has more of an impact on the width of the resulting interval than it does on the coverage probability, so that γ_1 and γ_2 are not significance levels in the traditional sense. For example, consider the choice of γ_1. Observe that a smaller value of γ_1 leads to a wider C^{UR1}_{γ_1,α_2}. However, the effect of γ_1 on the width of the interval adopted by the overall procedure could be ambiguous, since, if the null hypothesis of an exact unit root is true, an increase in γ_1 would reduce the width of C^{UR1}_{γ_1,α_2} but could also lead to a greater chance that T_1 will falsely reject the null hypothesis and switch to either C^{UR2}_{γ_2,α_2} or C^M_{α_1}, both of which are wider than C^{UR1}_{γ_1,α_2} in large samples. A similar argument shows that it is also difficult to predict a priori the effect of varying γ_2 on the width of the resulting interval.
On the other hand, note that, except for pathological specifications where γ_1 = 0 and/or γ_2 = 0 (ruled out by our assumption), varying either γ_1 or γ_2 or both does not lead to a material distortion in the (asymptotic) coverage probability of the proposed procedure. To see why this is so, consider the case where the unit root specification is true. Then, even when both γ_1 and γ_2 are set to be large so that the null hypothesis is falsely rejected with high probability, leading to the use of C^M_{α_1}, we will still end up with asymptotic coverage probability at least as great as the nominal level 1 − α, since C^M_{α_1} is asymptotically valid and, by design, α_1 ≤ α = α_1 + α_2. On the other hand, if the underlying process is stable, then both of the unit root tests will reject the null hypothesis with probability approaching one asymptotically, as long as neither γ_1 nor γ_2 is set equal to zero, and our procedure will switch to C^M_{α_1}, which controls the asymptotic coverage probability properly.
(vi) Pretesting leads to the possibility of errors whose probability needs to be controlled. In particular, there may be parameter sequences which lie just outside of C^{UR1}_{γ_1,α_2}, for which T_1 may fail to reject H_0: ρ = 1 even in large samples. In addition, there may be parameter sequences which lie just outside of C^{UR2}_{γ_2,α_2}, for which H_0 is rejected by T_1 but for which T_2 may not reject H_0 even in large samples. In both of these scenarios, there is the possibility that none of our intervals will cover the true parameter sequence. However, in the proof of Lemma A1 given in Appendix SB of the technical supplement, we show that, under our procedure, the probability of committing such Type II errors can be no greater than α_2 asymptotically.^9
Hence, by constructing C^{UR1}_{γ_1,α_2} and C^{UR2}_{γ_2,α_2} in the manner suggested above, we can properly control the probability of not switching to C^M_{α_1} when it is preferable to make that switch. In consequence, the asymptotic non-coverage probability under our procedure is always less than or equal to α = α_1 + α_2. Given a particular significance level α, different combinations of α_1 and α_2 involve trade-offs, where a smaller α_2 leads to a smaller probability of committing a Type II error but also leads to a larger α_1 and, thus, to C^M_{α_1} having a smaller asymptotic coverage probability.

8 A recent paper by Bun and Kleibergen (2014) also considers, amongst other things, combining elements of the approach of (Anderson and Hsiao 1981, 1982; Arellano and Bond 1991), which uses lagged levels of y_{it} as instruments for equations in first differences, with the approach of (Arellano and Bover 1995; Blundell and Bond 1998), which uses lagged differences of y_{it} as instruments for equations in levels. The focus of the Bun and Kleibergen (2014) paper differs substantially from that of the present paper. In particular, they consider test procedures which attain the maximal attainable power curve under the worst-case setting of the variance of the initial conditions, whereas our procedure uses pretest-based information to aggressively increase the power of our inferential procedure in certain regions of the parameter space. Moreover, unlike our paper, they do not provide results on confidence procedures whose asymptotic coverage probability is explicitly shown to be at least that of the nominal level uniformly over the parameter space; and their analysis is conducted within a fixed T framework.

9 The statement of Lemma A1 is given in Appendix A of the paper. Its proof is lengthy and is therefore placed in the technical supplement.
(vii) An advantage of our pretest-based confidence procedure is its computational simplicity: it is given in analytical form and, thus, does not require the use of bootstrap or other simulation-based methods for its computation. Moreover, the fact that C^M_{α_1}, the interval used under our procedure in the stable case, is based on the Anderson-Hsiao procedure has the further benefit that its validity does not depend on imposing the assumption of mean stationarity of the initial condition. Hence, the design of our procedure takes into consideration trade-offs among the competing goals of interval accuracy, computational simplicity, and the relaxation of the assumption of initial condition stationarity.
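The selection rule described in remarks (v)-(vii) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the boolean test outcomes, and the representation of each interval as a `(lower, upper)` pair are hypothetical, and the construction of the three candidate intervals and of the tests T_1 and T_2 is taken as given rather than reproduced.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lower, upper]


def clip_to_parameter_space(ci: Interval) -> Optional[Interval]:
    """Intersect [lower, upper] with the parameter space (-1, 1].

    Returns None when the intersection is empty. A returned lower
    endpoint of -1.0 should be read as open, since -1 is excluded.
    """
    lower, upper = ci
    if lower > upper or upper <= -1.0 or lower > 1.0:
        return None
    return (max(lower, -1.0), min(upper, 1.0))


def select_interval(
    t1_rejects: bool,
    t2_rejects: bool,
    ci_ur1: Interval,
    ci_ur2: Interval,
    ci_m: Interval,
) -> Optional[Interval]:
    """Hypothetical sketch of the sequential pretest selection.

    - T_1 does not reject the unit root: use the UR1 interval.
    - T_1 rejects but T_2 does not:      use the UR2 interval.
    - both tests reject:                 revert to the M-statistic interval.
    """
    if not t1_rejects:
        chosen = ci_ur1
    elif not t2_rejects:
        chosen = ci_ur2
    else:
        chosen = ci_m
    return clip_to_parameter_space(chosen)
```

For instance, `select_interval(True, True, (0.99, 1.0), (0.95, 1.0), (0.55, 0.65))` returns the M-statistic interval, since both tests reject.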

Monte Carlo Study
This section reports the results of a Monte Carlo study comparing the finite sample performance of alternative confidence procedures. For the simulation study, we consider data generating processes of the form N (2, 1). We vary ρ = 1.00, 0.99, 0.95, 0.90, 0.80, and 0.60 and w i0 = 0, 2. In addition, we let N = 100, 200. When N = 100, we take T = 50, 100; and when N = 200, we consider T = 100, 200. We take α = 0.05 throughout, so that the (nominal) confidence level is always kept at 95%. Four versions of the pre-test based confidence interval (PCI) given by expression (6) above are considered, with different specifications of γ 1 , γ 2 , α 1 , and α 2 , as summarized in the following tables.
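As a quick check on the design, the experimental grid described above can be enumerated directly. The sketch below uses only the values of ρ, w_{i0}, N, and T listed in the text; the variable and function names are illustrative, and the data generating process itself is not reproduced.

```python
from itertools import product

RHO_VALUES = (1.00, 0.99, 0.95, 0.90, 0.80, 0.60)
W0_VALUES = (0, 2)  # values of w_i0
SAMPLE_SIZES = ((100, 50), (100, 100), (200, 100), (200, 200))  # (N, T) pairs
ALPHA = 0.05  # nominal non-coverage level, i.e. 95% confidence


def experiment_grid():
    """Return one dict per experimental setting (rho, N, T, w_i0)."""
    return [
        {"rho": rho, "N": n, "T": t, "w0": w0}
        for rho, (n, t), w0 in product(RHO_VALUES, SAMPLE_SIZES, W0_VALUES)
    ]


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid))  # 6 values of rho x 4 (N, T) pairs x 2 values of w_i0
```

The grid contains 6 × 4 × 2 = 48 settings, which matches the forty-eight experiments reported in the tables.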
Tables 1-12 below provide simulation results comparing the four PCI procedures described above with the C^M_{0.05} procedure given in (5) and with confidence intervals obtained by inverting Studentized statistics associated with the POLS and IVD estimators. More specifically, Tables 1-4 give the empirical coverage probabilities, while Tables 5, 7, 9 and 11 report the average width of the confidence intervals under each of forty-eight experimental settings, obtained by varying ρ, N, T, and w_{i0}. In addition, in Tables 6, 8, 10 and 12 we report the number of instances out of 10,000 simulation repetitions in which a particular confidence procedure leads to an empty interval, which occurs when the intersection of the (unrestricted) interval and the parameter space is the null set. For example, in the case of the C^M_{0.05} procedure, an empty interval would arise if the unrestricted interval lies entirely outside the parameter space (−1, 1].

Glancing at Tables 1-4, we see that, consistent with our theory, the empirical coverage probabilities of the C^M_{0.05} procedure show the greatest degree of uniformity across different experiments. On the other hand, all four PCIs have empirical coverage probabilities that are uniformly higher than those of the C^M_{0.05} procedure across all forty-eight experiments. An intuitive explanation for this result can be given as follows. When the unit root null hypothesis is true, application of the pre-test procedure will lead to the use of either C^{UR1}_{γ_1,α_2} or C^{UR2}_{γ_2,α_2}, except in the small probability event where a Type I error is committed by both of the unit root tests T_1 and T_2. Since both of these intervals cover the point ρ = 1 by construction, the overall procedure in this case should cover this point with very high probability. On the other hand, when the unit root hypothesis is false, the pre-test procedure switches to the interval C^M_{α_1} but with α_1 set at a level strictly less than 0.05, resulting again in coverage probabilities which are greater than that of the C^M_{0.05} procedure.
A notable feature of the results for the C^M_{0.05} procedure in Tables 5, 7, 9 and 11 is that the average widths of intervals obtained from this procedure are substantially larger than those of the other procedures when ρ = 1. Moreover, in the ρ = 1 case, the use of the C^M_{0.05} procedure results in empty intervals roughly 2.61% of the time, ranging from a low of 215 empty intervals (out of 10,000 repetitions) in the case with N = 100, T = 50, and w_{i0} = 0 to a high of 295 empty intervals (out of 10,000 repetitions) in the case where N = 100, T = 100, and w_{i0} = 2 (see footnote 10). In contrast, no empty interval is observed for any of the pre-test procedures in any of the 48 experiments, including experiments where ρ = 1. It should also be noted that, outside of the unit root case, the results of Tables 5-12 do show C^M_{0.05} to provide informative intervals with average widths that are much smaller than those in the ρ = 1 case. In addition, as the true value of ρ moves significantly away from unity, as in the cases where ρ = 0.95, 0.9, 0.8, and 0.6, empty intervals are no longer observed for C^M_{0.05}.

For the four alternative specifications of PCI, there does not seem to be a great deal of difference in their performance across the experiments, although some minor trade-offs in coverage probability vis-à-vis average interval width can be discerned. For example, looking at PCI1, we see that this procedure provides very tight intervals in the case where ρ = 1. In fact, the average interval width for this procedure in the unit root case is ≤ 0.0070, except in the smaller sample size case with N = 100 and T = 50, where it is still around 0.0133. Moreover, amongst the seven procedures examined in our study, the empirical coverage probability of PCI1 is the highest, or is at least tied for the highest, almost across the board, for the 48 experiments whose results are reported in Tables 1-4.
Although the higher coverage probability of PCI1 in the stable region is due at least in part to the fact that it is designed to be conservative with α_1 = 0.025 when the true process is stable, it should be noted that the informativeness of PCI1, as measured by its average width, does not seem to have suffered significantly as a result. Note, in particular, that over the 48 experiments the widest average interval width recorded for PCI1 was only 0.1446, or approximately 7% of the width of the entire parameter space (−1, 1]; and this occurred with the smaller sample sizes of N = 100 and T = 50. In addition, PCI1 has average width strictly less than 0.1 in 38 of the experiments. On the other hand, PCI2 sets α_1 = 0.049 and is, thus, less conservative relative to PCI1, particularly in the stable region. In consequence, PCI2 tends to have not only smaller interval widths but also lower coverage probabilities relative to PCI1 when the underlying process is stable. The results for PCI1 and PCI2 are illustrative of how the pre-test procedures can greatly improve upon C^M_{0.05} in terms of accuracy in the unit root and near unit root cases while maintaining coverage probability at a high level throughout the parameter space, with the only downside being that they yield slightly wider intervals when the true process is stable.

10 It might initially seem strange in Tables 6 and 8 that, in the cases where ρ = 1 and N = 100, the number of empty intervals for C^M_{0.05} actually increased as the sample size in the time dimension increased from T = 50 to T = 100. However, there is an intuitive explanation for this result. As noted earlier, in the unit root case, the rate of concentration of the width of the C^M_{0.05} interval is O_p(T^{-1/2}), so that intervals obtained under this procedure are wider in the T = 50 case than in the T = 100 case, leading to a higher chance of a non-null intersection with the parameter space.
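The figures quoted for PCI1 and PCI2 can be cross-checked with a little arithmetic. The sketch below assumes that every PCI specification keeps α_1 + α_2 = α = 0.05, as the decomposition of α stated earlier suggests; the implied α_2 values are therefore derived, not quoted from the tables. The width ratio is a heuristic based only on the O_p(T^{-1/2}) rate, not a reported result, and all function names are illustrative.

```python
import math

ALPHA = 0.05  # overall level used throughout the study
ALPHA_1 = {"PCI1": 0.025, "PCI2": 0.049}  # values quoted in the text


def implied_alpha_2(alpha_1: float, alpha: float = ALPHA) -> float:
    """alpha_2 implied by alpha = alpha_1 + alpha_2 (an assumption here)."""
    return alpha - alpha_1


def share_of_parameter_space(width: float, lower: float = -1.0, upper: float = 1.0) -> float:
    """Interval width as a fraction of the length of (-1, 1]."""
    return width / (upper - lower)


def width_ratio(t_small: int, t_large: int) -> float:
    """Heuristic ratio of interval widths implied by a T^{-1/2} rate."""
    return math.sqrt(t_large / t_small)


if __name__ == "__main__":
    for name, a1 in ALPHA_1.items():
        print(name, round(implied_alpha_2(a1), 3))
    print(round(share_of_parameter_space(0.1446), 4))
    print(round(width_ratio(50, 100), 3))
```

Under these assumptions PCI1 splits α evenly between the two components, whereas PCI2 leaves only a very small α_2; the widest PCI1 interval occupies 0.1446 / 2 of the parameter space, in line with the "approximately 7%" figure.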
Tables 1-4 also show that confidence intervals constructed by inverting Studentized statistics associated with ρ_{POLS} and ρ_{IVD} are decidedly inferior to the pre-test based confidence procedures. Consistent with our theory, Tables 1-4 show that these confidence intervals have highly non-uniform coverage probabilities across different (true) parameter values ρ. More specifically, the coverage probabilities of the IV-based confidence intervals are especially poor when ρ is unity or near-unity, whereas the coverage probabilities for the POLS-based confidence intervals begin to deviate dramatically from the nominal level when ρ = 0.95 or less. Moreover, from the results reported in Tables 6, 8, 10 and 12, we note that CI_{IVD}, the confidence procedure based on inverting the Studentized statistic associated with ρ_{IVD}, leads to an empty interval in more than 40% of the simulation runs when ρ = 1. This is perhaps not surprising since, as shown in Theorem SA-1 in Appendix SA of the supplement to this paper, ρ_{IVD} is not uniformly convergent over the parameter space and does not have an asymptotic normal distribution when the true ρ equals unity. Hence, when ρ = 1, the CI_{IVD} procedure, which is designed to achieve the correct asymptotic coverage in the stationary case, will not only exhibit poor coverage probabilities but will often deliver intervals that lie entirely outside the parameter space. Interestingly, even though the CI_{POLS} procedure is based on the correct asymptotics when ρ = 1, it nevertheless produces some empty intervals in the unit root case, as shown in Tables 6, 8, 10 and 12. This suggests a need to modify the usual t-ratio based confidence procedure in cases where there is interest in a point on the boundary of a bounded parameter space, such as ρ = 1.

There are no empty intervals for any of the procedures in the cases ρ = 0.9, 0.8, and 0.6.
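The three summary measures reported in Tables 1-12 (empirical coverage, average width, and the count of empty intervals) could be computed from a set of simulated intervals along the following lines. This is a minimal sketch, not the authors' code: the function name is hypothetical, an empty interval is represented by `None`, and two conventions are assumed because the text does not spell them out, namely that an empty interval counts as a failure to cover and that average width is taken over the non-empty intervals only.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Interval = Tuple[float, float]


def summarize_intervals(
    intervals: Iterable[Optional[Interval]], rho_true: float
) -> Dict[str, float]:
    """Summarize simulated confidence intervals for one experiment.

    Assumed conventions (not taken from the paper): an empty interval
    (None) never covers rho_true, and the average width is computed
    over non-empty intervals only.
    """
    n = covered = empty = 0
    total_width = 0.0
    for ci in intervals:
        n += 1
        if ci is None:
            empty += 1
            continue
        lower, upper = ci
        total_width += upper - lower
        if lower <= rho_true <= upper:
            covered += 1
    if n == 0:
        raise ValueError("no intervals supplied")
    non_empty = n - empty
    return {
        "coverage": covered / n,
        "avg_width": total_width / non_empty if non_empty else math.nan,
        "empty": empty,
    }
```

Under a different convention for empty intervals, for example excluding them from the coverage denominator, the reported coverage would change, so the choice should be stated alongside any results.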

Conclusions
The uniform inference procedure proposed here utilizes information from pretesting the unit root hypothesis to aid the construction of confidence intervals in panel autoregression by means of data-based selection among intervals that are well suited to particular regions of the parameter space.
The construction is asymptotically valid in the sense that the large sample coverage probability is at least that of the nominal level uniformly over a wide parameter space that includes unity. The method is particularly simple to implement in practical work and simulations provide encouraging evidence that the method produces confidence intervals with good finite sample accuracy, as measured by the combination of empirical coverage probability and average interval width. The panel AR model considered here is a simple model. But it is the kernel of all dynamic panel models and embodies all the characteristics that make uniform inference and confidence interval construction difficult. Even in the time series case these problems are well known to be challenging. In the panel case, the challenges are accentuated by additional issues arising from the presence of incidental effects and multi-index limit theory. The pre-test confidence interval solution proposed here addresses these challenges and has potential for application in more complex models.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Proofs of the Main Results
The proofs given here rely on a large number of technical results that are established in the Technical Supplement (Chao and Phillips 2019). These results are designated in the derivations that follow by use of the prefix S. Lemmas A1 and A2 are stated in Appendix A and their proofs are given in the Technical Supplement. The proofs rely on functional limit theory for integrated and near integrated processes in conjunction with joint limit theory arguments for multi-indexed asymptotics (Phillips 1987a, 1987b; Phillips and Moon 1999).
Proof of Theorem 1. Let ∆ε it (ρ T ) = ∆y it − ρ T ∆y it−1 , and note that Applying partial summation, we have We turn first to part (a). In this case, by assumption, ρ T = 1 for all T sufficiently large. Under the random-effects specification given by Assumption 2, we can apply parts (g) and (i) of Lemma SD-11, part (a) of Lemma SD-25, and part (a) of Lemma SC-13 to obtain It follows from applying Lemma SD-24 that M (ρ T ) ⇒ N (0, 1), as required. Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.
Next consider part (b), where we take ρ_T = exp{−1/q(T)} such that T/q(T) → 0. In the case of the random-effects specification given by Assumption 2, we can use the results in parts (g) and (i) of Lemma SD-11, part (b) of Lemma SD-25, and part (b) of Lemma SC-13 to deduce that It follows from part (a) of Lemma SD-22 that M(ρ_T) ⇒ N(0, 1). Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.
Consider part (c), where we take ρ T = exp {−1/q (T)} such that q (T) ∼ T. Under the random-effects specification given by Assumption 2, we can apply parts (g) and (i) of Lemma SD-11, part (c) of Lemma SD-25, and part (c) of Lemma SC-13 to deduce that It follows from part (b) of Lemma SD-22 that M (ρ T ) ⇒ N (0, 1). Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.
For part (d), we consider the case where ρ T = exp {−1/q (T)} such that q (T) → ∞ but q (T) /T → 0. Here, we first apply part (d) of Lemma SC-13 and part (d) of Lemma SD-21 to obtain Under the random-effects specification given by Assumption 2, we further apply parts (g) and (i) of Lemma SD-11 and part (d) of Lemma SD-25 to obtain By part (c) of Lemma SD-22, we then deduce that M (ρ T ) ⇒ N(0, 1), as required for (d). Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.
Finally, to show part (e), we first consider the random-effects specification given by Assumption 2.
In this case, note that, by applying parts (g) and (i) of Lemma SD-11, part (e) of Lemma SD-21, and part (e) of Lemma SC-13, we obtain . Moreover, it is easily seen that, by applying part (b) of Lemma SE-1 in lieu of part (g) of Lemma SD-11 in the argument given above, the same result can be obtained under the fixed-effects specification given by Assumption 2*.
Proof of Theorem 2. To proceed, note that, in the pathwise asymptotics considered here, N grows as a monotonically increasing function of T, so that the asymptotics can be taken to be single-indexed with T → ∞. Now, let {G^M_j : j = 1, 2, 3, 4, 5} be the collections of parameter sequences defined in the statement of Theorem 1. Moreover, let {ρ_{k,T}} ∈ G^M_{s_k} (for k = 1, 2, ..., 5), i.e., {ρ_{k,T}} is a sequence belonging to the collection G^M_{s_k} (see footnote 11). Define T_k = f_k(T) (k = 1, ..., d), with d ≤ 5, where f_k(·) : ℕ → ℕ is an increasing function of its argument, and let {ρ_{k,T_k}} denote a subsequence of {ρ_{k,T}}. Note that every parameter sequence {ρ_T} with ρ_T ∈ (−1, 1] can be represented as {ρ_T} = ∪_{j=1}^{d} {ρ_{j,T_j}}, where {ρ_{1,T_1}} ∈ G^M_{s_1}, ..., {ρ_{d,T_d}} ∈ G^M_{s_d}, and where ℕ = ∪_{k=1}^{d} {T_k = f_k(T) : T ∈ ℕ}, with ℕ denoting the set of natural numbers.
Next, note that Pr(ρ_{k,T} ∉ C^M_{α_1,T} | ρ_{k,T}) = Pr(|M_T(ρ_{k,T})| > z_{α_1/2} | ρ_{k,T}). Theorem 1 implies that, for any ε > 0 and for each k ∈ {1, ..., d}, there exists a positive integer M_k such that for every positive integer where Z ∼ N(0, 1). Moreover, for any positive integer T ≥ M_k, we have T_k = f_k(T) ≥ T ≥ M_k by Lemma SD-33 (given in Appendix SD in the technical supplement to this paper), from which we further deduce that Consider any positive integer T ≥ M; we must have T = f_k(T*) for some k = 1, ..., d and for some T* ∈ ℕ. Given that we also deduce that T ≥ T* ≥ M_k by Lemma SD-33, since f_k(·) is an increasing function of its argument. It follows that for every sequence {ρ_T} and for all T ≥ M The desired result then follows from (Lepski 1999) Lemma 2.6.2.
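The step T_k = f_k(T) ≥ T in the argument above is attributed to Lemma SD-33 of the supplement, whose statement is not reproduced here. As a sketch only, and assuming that "increasing" means strictly increasing for a map f : ℕ → ℕ with ℕ = {1, 2, ...}, the inequality can be verified by induction:

```latex
% Sketch under the assumption that f : \mathbb{N} \to \mathbb{N} is strictly increasing.
\begin{align*}
  &\text{Base case: } f(1) \in \mathbb{N} \;\Longrightarrow\; f(1) \ge 1. \\
  &\text{Inductive step: if } f(T) \ge T, \text{ then } f(T+1) > f(T) \ge T, \\
  &\qquad \text{and since } f(T+1) \text{ is an integer, } f(T+1) \ge T + 1. \\
  &\text{Hence } f(T) \ge T \text{ for all } T \in \mathbb{N}, \text{ so that }
     T \ge M_k \;\Longrightarrow\; f_k(T) \ge T \ge M_k .
\end{align*}
```

This is why a bound that holds along the full sequence for all T ≥ M_k continues to hold along any subsequence indexed by T_k = f_k(T).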
(a) Let G^P_1 = {{ρ_T} : ρ_T = 1 for all T sufficiently large}, and set N = N(T) = (τT)^{1/κ} and C_{γ,α,N,T} = C_{γ,α,N(T),T} = C_{γ,α,T}. Then, for {ρ_T} ∈ G^P_1, √NT q(T), and set N = N(T) = (τT)^{1/κ} and

11 The reason for using the notation G^M_{s_k}, as opposed to G^M_k, is so that we can refer to a particular collection of sequences amongst {G^M_j : j = 1, 2, ..., 5} without G^M_{s_1} necessarily being G^M_1, for example.
(m) Let G P
The proof of Lemma A1 is given in Appendix SB of the technical supplement.
Proof of Theorem 3. In the pathwise asymptotics considered here, N grows as a monotonically increasing function of T, so that the asymptotics can be taken to be single-indexed with T → ∞. Hence, we can set N = (τT)^{1/κ} and simplify notation by writing C_{γ,α,N,T} = C_{γ,α,N(T),T} = C_{γ,α,T}. To proceed, note that, by the property of a supremum, there exists a sequence {ρ_T ∈ (−1, 1] : T ≥ 1} along which Pr(ρ_T ∉ C_{γ,α,T} | ρ_T) comes arbitrarily close to sup_{ρ ∈ (−1, 1]} Pr(ρ ∉ C_{γ,α,T} | ρ). Hence, to show that lim sup_{T→∞} sup_{ρ ∈ (−1, 1]} Pr(ρ ∉ C_{γ,α,T} | ρ) ≤ α, it suffices to show that lim sup_{T→∞} Pr(ρ_T ∉ C_{γ,α,T} | ρ_T) ≤ α for every sequence {ρ_T ∈ (−1, 1] : T ≥ 1}.

To proceed, let {G^P_j : j = 1, 2, ..., 23} be the collections of parameter sequences defined in Lemmas A1 and A2 given above. Moreover, let {ρ_{k,T}} ∈ G^P_{s_k} (for k = 1, ..., 23), i.e., {ρ_{k,T}} is a sequence belonging to the collection G^P_{s_k}. Define T_k = f_k(T) (k = 1, ..., d), with d ≤ 23, where f_k(·) : ℕ → ℕ is an increasing function of its argument, and let {ρ_{k,T_k}} denote a subsequence of {ρ_{k,T}}. Note that every parameter sequence {ρ_T} with ρ_T ∈ (−1, 1] can be represented as {ρ_T} = ∪_{j=1}^{d} {ρ_{j,T_j}}, where {ρ_{1,T_1}} ∈ G^P_{s_1}, ..., {ρ_{d,T_d}} ∈ G^P_{s_d}, with G^P_{s_k} ≠ G^P_{s_ℓ} for k ≠ ℓ, and where

ℕ = ∪_{k=1}^{d} {T_k = f_k(T) : T ∈ ℕ}   (A1)

with ℕ denoting the set of natural numbers {1, 2, ...}. Moreover, define υ_{k,T} = sup_{m≥T} Pr(ρ_{k,m} ∉ C_{γ,α,m} | {ρ_{k,m}} ∈ G^P_{s_k}) and p_k = lim sup_{T→∞} Pr(ρ_{k,T} ∉ C_{γ,α,T} | {ρ_{k,T}} ∈ G^P_{s_k}). It is clear from the definition of υ_{k,T} and p_k that lim_{T→∞} υ_{k,T} = p_k for each k ∈ {1, 2, ..., d}; or, more formally, for any ε > 0, there exists a positive integer L_k such that, for all T ≥ L_k, |υ_{k,T} − p_k| < ε. It follows, using the results of Lemma A1, that, for any ε > 0 and for each k ∈ {1, 2, ..., d}, there exists a positive integer L_k such that, for all T ≥ L_k, υ_{k,T} < p_k + ε ≤ α + ε.

Now, for any k ∈ {1, ..., d} and for any positive integer T ≥ L_k, we have, by Lemma SD-33 given in Appendix SD of the technical supplement to this paper, that T_k = f_k(T) ≥ L_k, so that |υ_{k,T_k} − p_k| < ε for any subsequence {υ_{k,T_k}} of {υ_{k,T}}, from which we further deduce that υ_{k,T_k} = sup_{m≥T_k} Pr(ρ_{k,m} ∉ C_{γ,α,m} | {ρ_{k,m}} ∈ G^P_{s_k}) < p_k + ε ≤ α + ε.
Next, let L_max = max{f_1(L_1), ..., f_d(L_d)}. Consider any positive integer T ≥ L_max; then, (A1) implies that T = f_k(T*) for some k = 1, ..., d and for some T* ∈ ℕ. By the fact that f_k(·) is an increasing function of its argument, we have that T = f_k(T*) ≥ L_max ≥ f_k(L_k) ≥ L_k, from which it follows that, for every positive integer T ≥ L_max, Pr(ρ_m ∉ C_{γ,α,m} | ρ_m) < α + ε.