Jackknife Bias Reduction in the Presence of a Near-Unit Root

This paper considers the specification and performance of jackknife estimators of the autoregressive coefficient in a model with a near-unit root. The limit distributions of sub-sample estimators that are used in the construction of the jackknife estimator are derived, and the joint moment generating function (MGF) of two components of these distributions is obtained and its properties explored. The MGF can be used to derive the weights for an optimal jackknife estimator that removes fully the first-order finite sample bias from the estimator. The resulting jackknife estimator is shown to perform well in finite samples and, with a suitable choice of the number of sub-samples, is shown to reduce the overall finite sample root mean squared error, as well as bias. However, the optimal jackknife weights rely on knowledge of the near-unit root parameter and a quantity that is related to the long-run variance of the disturbance process, which are typically unknown in practice, and so, this dependence is characterised fully and a discussion provided of the issues that arise in practice in the most general settings.


Introduction
Throughout his career, Peter Phillips has made important contributions to knowledge across the broad spectrum of econometrics and statistics, providing inspiration to many other researchers along the way. This paper builds on two of the strands of Peter's research, namely jackknife bias reduction and the analysis of nonstationary time series. Indeed, our own work on the jackknife (Chambers 2013, 2015; Chambers and Kyriacou 2013) was inspired by Peter's work on this topic with Jun Yu, published as Phillips and Yu (2005), and the current contribution also extends the results on moment generating functions (MGFs) contained in Phillips (1987a).
The jackknife has been proven to be an easy-to-implement method of eliminating first-order estimation bias in a wide variety of applications in statistics and econometrics. Its genesis can be traced to Quenouille (1956) and Tukey (1958) in the case of independently and identically distributed (iid) samples, while it has been adapted more recently to accommodate more general time series settings. Within the class of stationary autoregressive time series models, Phillips and Yu (2005) show that the jackknife can effectively reduce bias in the pricing of bond options in finance, while Chambers (2013) analyses the performance of jackknife methods based on a variety of sub-sampling procedures. In subsequent work, Chambers and Kyriacou (2013) demonstrate that the usual jackknife construction in the time series case has to be amended when a unit root is present, while Chen and Yu (2015) show that a variance-minimising jackknife can be constructed in a unit root setting that also retains its bias reduction properties. In addition, Kruse and Kaufmann (2015) compare bootstrap, jackknife and indirect inference estimators in mildly explosive autoregressions, finding that the indirect inference estimator dominates in terms of root mean squared error, but that the jackknife excels for bias reduction in stationary and unit root situations.
The usual motivation for a jackknife estimator relies on the existence of a Nagar-type expansion of the original full-sample estimator's bias. Its construction proceeds by finding a set of weights that, when applied to a full-sample estimator and a set of sub-sample estimators, is able to eliminate fully the first-order term in the resulting jackknife estimator's bias expansion. In stationary time series settings, the bias expansions are common to both the full-sample and sub-sample estimators, but Chambers and Kyriacou (2013) pointed out that this property no longer holds in the case of a unit root. This is because the initial values in the sub-samples are no longer negligible in the asymptotics and have a resulting effect on the bias expansions, thereby affecting the optimal weights. Construction of a fully-effective jackknife estimator relies, therefore, on knowledge of the presence (or otherwise) of a unit root.
In this paper, we explore the construction of jackknife estimators that eliminate fully the first-order bias in the near-unit root setting. Near-unit root models have attracted a great deal of interest in time series owing, amongst other things, to their ability to capture better the effects of sample size in the vicinity of a unit root, to explore analytically the power properties of unit root tests and to allow the development of an integrated asymptotic theory for both stationary and non-stationary autoregressions; see Phillips (1987a) and Chan and Wei (1987) for details. We find that jackknife estimators can be constructed in the presence of a near-unit root that achieve this aim of bias reduction. Jackknife estimators have the advantage of incurring only a very slight additional computational burden, unlike alternative resampling and simulation-based methods such as the bootstrap and indirect inference. Furthermore, they are applicable in a wide variety of estimation frameworks and work well in finite sample situations in which the prime objective is bias reduction. Although the bootstrap is often a viable candidate for bias reduction, it was shown by Park (2006) that the bootstrap is inconsistent in the presence of a near-unit root, and hence, jackknife methods offer a useful alternative in these circumstances.
The development of a jackknife estimator that achieves bias reduction in the near-unit root case is not simply a straightforward application of previous results. While Chambers and Kyriacou (2013) first pointed out that under unit root non-stationarity, the effect of the sub-sample initial conditions does not vanish asymptotically, thereby affecting asymptotic expansions of sub-sample estimator bias and the resulting jackknife weights as compared to the stationary case, the extension of these results to a local-to-unity setting is not obvious. With a near-unit root, the autoregressive parameter plays an important role, and it is therefore necessary to derive the appropriate asymptotic expansion of sub-sample estimator bias for this more general case, as well as the MGFs of the relevant limiting distributions that can be used to construct the appropriate jackknife weights. The derivation of such results is challenging in itself and is a major reason why we focus on the bias-minimising jackknife, rather than attempting to derive results for the variance-minimising jackknife of Chen and Yu (2015).
The paper is organised as follows. Section 2 defines the near-unit root model of interest and focuses on the limit distributions of sub-sample estimators, demonstrating that these limit distributions are sub-sample dependent. An asymptotic expansion of these limit distributions demonstrates the source of the failure of the standard jackknife weights in a near-unit root setting by showing that the bias expansion is also sub-sample dependent. In order to define a successful jackknife estimator, it is necessary to compute the mean of these limit distributions, and so, Section 3 derives the moment generating function of two random variables that determine the limit distributions over an arbitrary sub-interval of the unit interval. Expressions for the computation of the mean of the ratio of the two random variables are derived using the MGF. Various properties of the MGF are established, and it is shown that results obtained in Phillips (1987a) arise as a special case, including those that emerge as the near-unit root parameter tends to minus infinity.
Based on the results in Sections 2 and 3, the optimal weights for the jackknife estimator are defined in Section 4, which then goes on to explore, via simulations, the performance of the proposed estimator in finite samples. Consideration is given to the choice of the appropriate number of sub-samples to use when either bias reduction or root mean squared error (RMSE) minimisation is the objective. It is found that greatest bias reduction can be achieved using just two sub-samples, while minimisation of RMSE, which, it should be stressed, is not the objective of the jackknife estimator, requires a larger number of sub-samples, which increases with sample size. Section 5 contains some concluding comments, and all proofs are contained in the Appendix A.
The following notation will be used throughout the paper. The symbol =_d denotes equality in distribution; →_d denotes convergence in distribution; →_p denotes convergence in probability; ⇒ denotes weak convergence of the relevant probability measures; W(r) denotes a Wiener process on C[0, 1], the space of continuous real-valued functions on the unit interval; and J_c(r) = ∫_0^r e^{(r−s)c} dW(s) denotes the Ornstein-Uhlenbeck process, which satisfies dJ_c(r) = cJ_c(r)dr + dW(r) for some constant parameter c. Functionals of W(r) and J_c(r), such as ∫_0^1 J_c(r)² dr, are denoted ∫_0^1 J_c² for notational convenience where appropriate, and in stochastic integrals of the form ∫ e^{cr} J_c, it is to be understood that integration is carried out with respect to r. Finally, L denotes the lag operator such that L^j y_t = y_{t−j} for a random variable y_t.
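As a concrete illustration of this notation, the Ornstein-Uhlenbeck process J_c can be simulated by an Euler discretisation of dJ_c(r) = cJ_c(r)dr + dW(r). The sketch below (in Python; the step and replication counts are our own illustrative choices, not from the paper) checks the simulated terminal variance against the known value Var[J_c(1)] = (e^{2c} − 1)/(2c):

```python
import numpy as np

def simulate_Jc_terminal(c, n_steps, n_paths, rng):
    """Euler scheme for dJ_c(r) = c*J_c(r) dr + dW(r) on [0, 1] with J_c(0) = 0."""
    dr = 1.0 / n_steps
    J = np.zeros(n_paths)
    for _ in range(n_steps):
        J = J + c * J * dr + rng.normal(0.0, np.sqrt(dr), n_paths)
    return J

rng = np.random.default_rng(0)
J1 = simulate_Jc_terminal(c=-5.0, n_steps=400, n_paths=20000, rng=rng)
v = J1.var()
# Theory: Var[J_c(1)] = (e^{2c} - 1)/(2c); for c = -5 this is (1 - e^{-10})/10, about 0.1
print(v)
```

For c < 0 the process is mean-reverting, which is why the terminal variance settles well below the Wiener value of one.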

The Model and the Standard Jackknife Estimator
The model with a near-unit root is defined as follows.
Assumption 1. The sequence y_1, . . . , y_n satisfies:

y_t = ρ y_{t−1} + u_t, t = 1, . . . , n, (1)

where ρ = e^{c/n} = 1 + c/n + O(n^{−2}) for some constant c, y_0 is an observable O_p(1) random variable and u_t is the stationary linear process:

u_t = δ(L)ε_t = ∑_{j=0}^∞ δ_j ε_{t−j}, (2)

where ε_t ∼ iid(0, σ_ε²), E(ε_t⁴) < ∞, δ(z) = ∑_{j=0}^∞ δ_j z^j, δ_0 = 1 and ∑_{j=0}^∞ j|δ_j| < ∞.

The parameter c controls the extent to which the near-unit root deviates from unity; when c < 0, the process is (locally) stationary, whereas it is (locally) explosive when c > 0. Strictly speaking, the autoregressive parameter should be denoted ρ_n to emphasise its dependence on the sample size, n, but we use ρ for notational convenience. The linear process specification for the innovations is consistent with u_t being a stationary ARMA(p, q) process of the form φ(L)u_t = θ(L)ε_t, where φ(z) and θ(z) are finite-order polynomials and all roots of the equation φ(z) = 0 lie outside the unit circle. In this case, δ(z) = θ(z)/φ(z), but Assumption 1 also allows for more general forms of linear processes and is not restricted solely to the ARMA class. Under Assumption 1, u_t satisfies the functional central limit theorem:

n^{−1/2} ∑_{t=1}^{⌊nr⌋} u_t ⇒ σ W(r) (3)

on C[0, 1], where σ² = σ_ε² δ(1)² denotes the long-run variance. Equations of the form (1) have been used extensively in the literature on testing for an autoregressive unit root (corresponding to c = 0) and for examining the power properties of the resulting tests (by allowing c to deviate from zero). In economic and financial time series, they offer a flexible mechanism for modelling highly persistent series whose autoregressive roots are generally close, but not exactly equal, to unity. Ordinary least squares (OLS) regression on (1) yields:

ρ̂ = ∑_{t=1}^n y_{t−1} y_t / ∑_{t=1}^n y_{t−1}², (4)

where û_t = y_t − ρ̂ y_{t−1} denotes the regression residual, and it can be shown (see Phillips 1987a) that ρ̂ satisfies:

n(ρ̂ − ρ) ⇒ Z_c(η) = [∫_0^1 J_c dW + (1 − η)/2] / ∫_0^1 J_c², (5)

where η = σ_u²/σ², σ_u² = E(u_t²) = σ_ε² ∑_{j=0}^∞ δ_j² and the functional Z_c(η) is thereby defined.
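To make the finite sample bias of the OLS estimator concrete, the following Monte Carlo sketch (Python; the function names and simulation settings are our own, not the paper's) generates the model of Assumption 1 with white-noise innovations (so that η = 1) and y_0 = 0, and reports the average bias of ρ̂:

```python
import numpy as np

def ols_rho(y):
    # rho_hat = sum of y_{t-1}*y_t over sum of y_{t-1}^2 (the OLS estimator)
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

def mean_bias(c, n, reps, seed=0):
    rng = np.random.default_rng(seed)
    rho = np.exp(c / n)                 # near-unit root parameterisation
    total = 0.0
    for _ in range(reps):
        u = rng.standard_normal(n)      # white-noise u_t, so eta = 1
        y = np.zeros(n + 1)             # y_0 = 0
        for t in range(n):
            y[t + 1] = rho * y[t] + u[t]
        total += ols_rho(y) - rho
    return total / reps

bias = mean_bias(c=0.0, n=48, reps=20000)
# For c = 0 the bias is roughly -1.7814/n, i.e. about -0.037 at n = 48
print(bias)
```

The pronounced negative bias at moderate sample sizes is exactly what the jackknife constructions below are designed to remove.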
The limit distribution in (5) is skewed, and the estimator suffers from significant negative bias in finite samples; see Perron (1989) for properties of the limit distribution for the case where σ² = σ_u² and (hence) η = 1. The jackknife estimator offers a computationally simple method of bias reduction by combining the full-sample estimator, ρ̂, with a set of m sub-sample estimators, ρ̂_j (j = 1, . . . , m), the weights assigned to these components depending on the type of sub-sampling method employed. Phillips and Yu (2005) find the use of non-overlapping sub-samples to perform well in reducing bias in the estimation of stationary diffusions, while the analysis of Chambers (2013) supports this result in the setting of stationary autoregressions. In this approach, the full sample of n observations is divided into m sub-samples, each of length ℓ, so that n = m × ℓ. The generic form of jackknife estimator is given by:

ρ̂_J = w_1 ρ̂ + (w_2/m) ∑_{j=1}^m ρ̂_j, (6)

where the weights are determined so as to eliminate the first-order finite sample bias. Assuming that the full-sample estimator and each sub-sample estimator satisfy a (Nagar-type) bias expansion of the form:

E(ρ̂ − ρ) = a/n + O(n^{−2}), E(ρ̂_j − ρ) = a/ℓ + O(ℓ^{−2}), j = 1, . . . , m,

it can be shown that the appropriate weights are given by w_1 = m/(m − 1) and w_2 = −1/(m − 1), in which case:

E(ρ̂_J − ρ) = (w_1 a)/n + (w_2 a)/ℓ + O(n^{−2}) = O(n^{−2}),

using the fact that m/n = 1/ℓ. Under such circumstances, the jackknife estimator is capable of completely eliminating the O(1/n) bias term as compared to ρ̂. However, in the pure unit root setting (c = 0), Chambers and Kyriacou (2013) demonstrated that the sub-sample estimators do not share the same limit distribution as the full-sample estimator, which means that the expansions for the bias of the sub-sample estimators are incorrect, and hence, the weights defined above do not eliminate fully the first-order bias. It is therefore important to investigate this issue in the more general setting of a near-unit root with a view toward deriving the appropriate weights for eliminating the first-order bias term.
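The non-overlapping sub-sample construction just described can be sketched as follows (Python; a minimal illustration with helper names of our own, using the standard weights w_1 = m/(m − 1), w_2 = −1/(m − 1), which are valid in the stationary case):

```python
import numpy as np

def ols_rho(y):
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

def jackknife_rho(y, m):
    """Standard jackknife: w1*rho_hat + (w2/m)*sum_j rho_hat_j over m
    non-overlapping sub-samples of length ell = n/m."""
    n = len(y) - 1                       # y holds y_0, ..., y_n
    ell = n // m
    w1 = m / (m - 1)
    w2 = -1.0 / (m - 1)
    full = ols_rho(y)
    subs = [ols_rho(y[j * ell:(j + 1) * ell + 1]) for j in range(m)]
    return w1 * full + (w2 / m) * sum(subs)

# Quick check in a stationary AR(1), where the standard weights apply:
rng = np.random.default_rng(1)
rho, n, reps = 0.9, 48, 20000
b_ols = b_jack = 0.0
for _ in range(reps):
    y = np.zeros(n + 1)
    for t in range(n):
        y[t + 1] = rho * y[t] + rng.standard_normal()
    b_ols += ols_rho(y) - rho
    b_jack += jackknife_rho(y, m=2) - rho
print(b_ols / reps, b_jack / reps)   # jackknife bias is much closer to zero
```

Note that each sub-sample regression retains the last observation of the preceding block as its initial lagged value; it is precisely these sub-sample initial conditions that cease to be negligible in the (near-)unit root case.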

Sub-Sample Properties
In order to explore the sub-sample properties, let:

τ_j = {(j − 1)ℓ + 1, . . . , jℓ}, j = 1, . . . , m,

denote the set of integers indexing the observations in each sub-sample. The sub-sample estimators can then be written, in view of (4), as:

ρ̂_j = ∑_{t∈τ_j} y_{t−1} y_t / ∑_{t∈τ_j} y_{t−1}², j = 1, . . . , m. (7)

Theorem 1 (below) determines the limiting properties of the quantities appearing in (7), as well as the limit distribution of n(ρ̂_j − ρ) itself.
The limit distribution in Part (c) of Theorem 1 is of the same form as that of the full-sample estimator in (5), except that the integrals are over the subset [(j − 1)/m, j/m] of [0, 1] rather than the unit interval itself. Note, too, that the first component of the numerator of Z_c,j(η) also has the representation:

∫_{(j−1)/m}^{j/m} J_c dW = (1/2)[J_c(j/m)² − J_c((j − 1)/m)² − 1/m] − c ∫_{(j−1)/m}^{j/m} J_c²,

which follows from Itô's formula and dJ_c(r) = cJ_c(r)dr + dW(r). The fact that the distributions Z_c,j(η) in Theorem 1 depend on j implies that the expansions for E(ρ̂_j − ρ) that are used to derive the jackknife weights defined following (6) may not be correct under a near-unit root. When the process (1) has a near-unit root, we can expect the expansions for E(ρ̂_j − ρ) to be of the form:

E(ρ̂_j − ρ) = μ_c,j/ℓ + o(ℓ^{−1}), j = 1, . . . , m; (8)

indeed, we later justify this expansion and characterise μ_c,j precisely. Such expansions have been shown to hold in the unit root (c = 0) case, as well as more generally when c ≠ 0. For example, Phillips (1987b, Theorem 7.1) considered the Gaussian random walk (corresponding to (1) with c = 0, δ(z) = 1, y_0 = 0 and u_t Gaussian) and demonstrated the validity of an asymptotic expansion for the normalised coefficient estimator, n(ρ̂ − 1), whose leading term is ∫_0^1 W dW / ∫_0^1 W² and whose O_p(n^{−1/2}) correction term involves a standard normal random variable ξ distributed independently of W. Taking expectations in this expansion, using the independence of ξ and W and noting that the expected value of the leading term is −1.7814 (see, for example, Table 7.1 of Tanaka 1996), the bias satisfies:

E(ρ̂ − 1) = −1.7814/n + o(n^{−1}); (10)

see, also, Phillips (2012, 2014). In the more general setting of the model in Assumption 1, and assuming that u_t is Gaussian, Theorem 1 of Perron (1996) established an analogous expansion whose correction term involves v_f² = 2π f_{u²}(0), where f_{u²}(0) denotes the spectral density of u_t² − σ_u² at the origin. The following result extends this type of expansion to the sub-sample estimators.
Theorem 2. Let y_1, . . . , y_n satisfy Assumption 1 and, in addition, assume that u_t is Gaussian. Then, for each j = 1, . . . , m, an asymptotic expansion of the same form holds for the sub-sample estimator ρ̂_j, in which the random variables ξ_1j ∼ N(0, s_j²) and ξ_2j ∼ N(0, 1) appear in the correction terms.

The form of the expansion for (ρ̂_j − ρ) in Theorem 2 is similar to that for the full-sample estimator, but also depends on m and j. Use of these expansions to derive expressions for the biases of ρ̂ and ρ̂_j would be complicated due to the dependence on y_0. We therefore take y_0 = 0, which results in the following expectations:

E(ρ̂ − ρ) = μ_c(η)/n + o(n^{−1}), E(ρ̂_j − ρ) = μ_c,j(η)/ℓ + o(ℓ^{−1}), j = 1, . . . , m,

these results utilising the independence of the normally distributed random variables (ξ_1j and ξ_2j) and the Wiener process W. (If y_0 ≠ 0, then additional random variables appear in the numerator and denominator of the bias, thereby complicating the derivation of the required expectation. In the case of c = 0, the simulation results reported in Table 2.3 of Kyriacou (2011) indicate that the bias of ρ̂ increases with y_0/σ, but that the jackknife continues to be an effective method of bias reduction.) The next section provides the form of the moment generating function that enables expectations of the functionals Z_c(η) and Z_c,j(η) to be computed, thereby enabling the construction of a jackknife estimator that eliminates the first-order bias.

A Moment Generating Function and Its Properties
The following result provides the joint moment generating function (MGF) of two relevant functionals of J_c defined over a sub-interval [a, b], namely N_c = ∫_a^b J_c dW and D_c = ∫_a^b J_c². Although our focus is on sub-intervals of [0, 1], we leave b unconstrained for greater generality than is required for our specific purposes because the results may have more widespread use beyond our particular application.
Theorem 3 comprises three parts: Part (a) gives the joint MGF of N_c and D_c; Part (b) gives the individual MGFs for N_c and D_c, which are expressed in terms of λ_1 = (c² + 2cθ_1)^{1/2} and λ_2 = (c² − 2θ_2)^{1/2}, respectively; and Part (c) provides an expression for the expectation of N_c/D_c. The MGFs for the two functionals in Theorem 3 have potential applications in a wide range of sub-sampling problems with near-unit root processes. A potential application of the joint MGF in Part (a) of Theorem 3 is in the computation of the cumulative and probability density functions of the distributions Z_c,j(η) when setting a = (j − 1)/m and b = j/m; for example, the probability density function of mZ_c,j(1) can be obtained by numerical inversion of the joint MGF (with i² = −1); see, for example, Perron (1991, p. 221), who performs this type of calculation for the distribution Z_c, while Abadir (1993) derives a representation for the density function of Z_c in terms of a parabolic cylinder function. The result in Part (c) of Theorem 3 is obtained by differentiating the joint MGF and constructing the appropriate integrals. When c = 0, the usual (full-sample) result, where a = 0 and b = 1, can be obtained as a special case; noting that v² = 0 in this case and making the substitution w = (c² + 2θ_2)^{1/2} results in expressions that can be found in, for example, Gonzalo and Pitarakis (1998, Lemma 3.1). Some further special cases of interest that follow from Theorem 3 are presented below.
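As a numerical cross-check on the expectations that the MGF route delivers, the unit root mean E(N_0/D_0) = E[∫_0^1 W dW / ∫_0^1 W²] ≈ −1.7814 (with c = 0, a = 0, b = 1) can be approximated directly by simulating discretised Wiener paths; the sketch below (Python, with discretisation settings of our own choosing) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(42)
n_steps, reps = 500, 40000
vals = np.empty(reps)
for r in range(reps):
    dW = rng.normal(0.0, np.sqrt(1.0 / n_steps), n_steps)
    W = np.concatenate(([0.0], np.cumsum(dW)))
    num = 0.5 * (W[-1] ** 2 - 1.0)          # Ito: int_0^1 W dW = (W(1)^2 - 1)/2
    den = np.sum(W[:-1] ** 2) / n_steps     # Riemann sum for int_0^1 W^2
    vals[r] = num / den
mu_hat = vals.mean()
print(mu_hat)   # should lie close to -1.7814
```

Such simulation checks are a useful complement to the analytical expectations obtained from Part (c) of Theorem 3, albeit far less precise than numerical quadrature of the MGF-based expressions.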
Part (a) of the corollary gives the full-sample (a = 0, b = 1) forms of the MGFs, together with their limits as c → 0; Part (b) gives the corresponding sub-sample forms and their limits as c → 0. The results in Part (a) of the corollary are relevant in the full-sample case, and the result for M_0(θ_1, θ_2) goes back to White (1958). The results in Part (b) of the corollary are pertinent to the sub-sampling issues being investigated here in the case of a near-unit root, with the unit root (c = 0) result for M_0(θ_1, θ_2) having been first derived by Chambers and Kyriacou (2013).
It is also possible to use the above results to explore the relationship between the sub-sample distributions and the full-sample distribution. For example, it is possible to show that M_{N_{c/m}}(θ_1/m) on [0, 1] is equal to M_{N_c}(θ_1) for j = 1 in the sub-samples, while M_{D_{c/m}}(θ_2/m²) on [0, 1] is equal to M_{D_c}(θ_2) for j = 1 in the sub-samples; an implication of this is that:

Z_c,1(1) =_d m Z_{c/m}(1).

Furthermore, this implies that the limit distribution of the first sub-sample estimator, ℓ(ρ̂_1 − ρ), when ρ = e^{c/n} = e^{(c/m)/ℓ}, is the same as that of the full-sample estimator, n(ρ̂ − ρ), when ρ = e^{(c/m)/n}.
The sub-sample results with a near-unit root can be related to the full-sample results of Phillips (1987a). For example, the MGF in Theorem 3 has an equivalent representation, (14), in which λ and v² are as defined in the theorem, z = (b − a)λ and δ = θ_1 + c − λ. When a = 0 and b = 1, it follows that v² = 0, and the representation nests the MGF in Phillips (1987a); this follows straightforwardly from (14). It is also of interest to examine what happens when the local-to-unity parameter c → −∞, as in Phillips (1987a) and other recent work on autoregression, e.g., Phillips (2012). We present the results in Theorem 4 below.
The functional K(c) in Theorem 4 represents the limit distribution of the normalised estimator g(c)^{1/2}(ρ̂_{a,b} − ρ), where ℓ denotes the number of observations in the sub-sample [bn − ℓ + 1, bn] (so that a = b − (1/m) in this case) and ρ̂_{a,b} is the corresponding estimator. However, as pointed out by Phillips (1987a), the sequential limits (large sample for fixed c, followed by c → −∞) are only indicative of the results one might expect in the stationary case and do not constitute a rigorous demonstration. The results in Theorem 4 also encompass the related results in Phillips (1987a) obtained when a = 0 and b = 1.

An Optimal Jackknife Estimator
The discussion following Theorem 2 indicates that the weights defining an optimal jackknife estimator, which removes first-order bias in the local-to-unity setting, depend on the quantities μ_c(η) and μ_c,j(η) (j = 1, . . . , m), in which λ_1(η) = (1 − η)/2 and λ_2(η, m) = λ_1(η)/m² appear. In particular, Part (c) of Theorem 3 can be used to evaluate these quantities via the expectations E(N_c/D_c) taken over the appropriate sub-intervals. The relevant MGFs for evaluating μ_c(1) and μ_c,j(1) are given in the corollary to Theorem 3. Table 1 reports values of μ_c,j(1) (which also determine μ_c(1) in view of the distributional equivalence of Z_c(1) and Z_c,1(1) discussed following the corollary). For a given combination of j and m, it can be seen that the expectations increase as c increases, while for given c and j, the expectations increase with m. A simple explanation for the different properties of the sub-samples beyond j = 1 is that the initial values are of the same order of magnitude as the partial sums of the innovations. The values of the sub-sample expectations when c = 0 are seen from Table 1 to be independent of m and to increase with j. Note that μ_0,1(1) = −1.7814 corresponds to the expected value of the limit distribution of the full-sample estimator ρ̂ under a unit root; see, for example, (10) and the associated commentary. The values of μ_0,j(η) can be used to define jackknife weights under a unit root for different values of m; see, for example, Chambers and Kyriacou (2013). More generally, the values of μ_c,j(η) can be used to define optimal weights for the jackknife estimator that achieve the aim of first-order bias removal in the presence of a near-unit root. The result is presented in Theorem 5.
Theorem 5 shows the form of the optimal weights for the jackknife estimator when the process (1) has a near-unit root. It can be seen that the weights depend not only on the value of c, but also on the value of η, both of which are unknown in practice. Chambers and Kyriacou (2013) and Chen and Yu (2015) have emphasised the case c = 0 and η = 1 and have reported simulation results highlighting the good bias-reduction properties of appropriate jackknife estimators in that case. When η = 1, the optimal weights in Theorem 5 simplify considerably, depending only on c, m and the quantities μ_c,j(1). The values of μ_c,j(1) in Table 1 can be utilised to derive the optimal weights for the jackknife estimator in this case;² these are reported in Table 2 for the values of m and c used in Table 1, along with the values of the standard weights that are applicable in stationary autoregressions. The entries in Table 2 show that the optimal weights are larger in (absolute) value than the standard weights that would apply if all the sub-sample distributions were the same and that they increase with c for given m. The optimal weights also converge towards the standard weights as c becomes more negative; this could presumably be demonstrated analytically using the properties of the MGF in constructing the μ_c,j(1) by examining the appropriate limits as c → −∞, although we do not pursue such an investigation here.

² The integrals were computed numerically using an adaptive quadrature method in the integrate1d routine in Gauss 17.
The relationship between the optimal weights when η = 1 and when η ≠ 1 is not straightforward. Noting that μ_c(η) = μ_c(1) + λ_1(η)E(D_c^{−1}), and that a similar relation involving λ_2(η, m) holds for μ_c,j(η), an expression, (15), can be derived that writes w*_{1,c}(η) explicitly in terms of w*_{1,c}(1). The second weight is obtained simply as w*_{2,c}(η) = 1 − w*_{1,c}(η). In situations where η ≠ 1, which essentially reflects cases where u_t does not have a white noise structure, the optimal weights can be obtained from the entries in Table 2 (at least for the relevant values of c) using (15), but knowledge is still required not only of c and η, but also of the expectations of the inverses of D_c and D_c,j. The latter can be computed numerically; Equation (2.3) of Meng (2005) shows that:

E(D_c^{−1}) = ∫_0^∞ M_{D_c}(−θ_2) dθ_2, E(D_c,j^{−1}) = ∫_0^∞ M_{D_c,j}(−θ_2) dθ_2,

where M_{D_c}(θ_2) and M_{D_c,j}(θ_2) are the MGFs of D_c and D_c,j, respectively, which can be obtained from the corollary to Theorem 3. In practice, however, the values of c and η are still required in order to construct the optimal estimator. Although the localisation parameter c from the model defined under Assumption 1 is identifiable, it is not possible to estimate it consistently, and attempts to do so require a completely different formulation of the model; see, for example, Phillips et al. (2001), who propose a block local-to-unity framework to consistently estimate c, although this approach does not appear to have been pursued subsequently. Furthermore, η depends on an estimator of the long-run variance σ², which is a notoriously difficult quantity to estimate in finite samples. In view of these unresolved challenges, and following earlier work on jackknife estimation of autoregressive models with a (near-)unit root, we focus on the case η = 1 but allow c ≠ 0, with particular attention paid to unit root and locally-stationary processes, i.e., c ≤ 0. Our simulations examine the performance of five estimators³ of the parameter ρ = e^{c/n}.
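To illustrate how the Meng (2005) identity is applied in practice, consider the full-sample unit root case, where D_0 = ∫_0^1 W² has the classical MGF E[e^{−θD_0}] = (cosh √(2θ))^{−1/2} (the White (1958) result noted earlier). The sketch below (Python; the quadrature grid and Monte Carlo settings are our own) evaluates E(D_0^{−1}) = ∫_0^∞ M_{D_0}(−θ) dθ and cross-checks it by simulation:

```python
import numpy as np

# Meng's identity: E(1/D) = integral_0^inf M_D(-theta) d theta for D > 0.
# With u = sqrt(2*theta) (so d theta = u du) the integral for D_0 = int_0^1 W^2
# becomes integral_0^inf u * cosh(u)^{-1/2} du.
u = np.linspace(0.0, 60.0, 600001)
f = u / np.sqrt(np.cosh(u))
quad_val = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(u))   # trapezoidal rule

# Monte Carlo cross-check of E[1 / int_0^1 W^2] from discretised Wiener paths.
rng = np.random.default_rng(7)
n_steps, reps = 500, 40000
inv = np.empty(reps)
for r in range(reps):
    W = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n_steps), n_steps))
    inv[r] = n_steps / np.sum(W ** 2)
print(quad_val, inv.mean())   # the two estimates should roughly agree
```

The quadrature route is both faster and more accurate than simulation, which is why the MGF-based expressions are the natural tool for constructing the weights.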
The baseline estimator is the OLS estimator in (4), the bias of which the jackknife estimators aim to reduce. Three jackknife estimators with the generic form (6) are also considered, each differing in the choice of weights w_1; in all cases, w_2 = 1 − w_1. The standard jackknife sets w_1 = m/(m − 1); the optimal jackknife sets w_1 = w*_{1,c}; and the unit root jackknife sets w_1 = w*_{1,0}. The standard jackknife removes fully the first-order bias in stationary autoregressions, but does not do so in the near-unit root framework, in which the optimal estimator achieves this goal. However, the optimal estimator is infeasible because it relies on the unknown parameter c.⁴

³ The Gauss codes used for the jackknife estimators are available from the authors on request.
⁴ In the simulations, we are taking η = 1 as known.
We therefore also consider the feasible unit root jackknife obtained by setting c = 0. In addition, we consider the jackknife estimator of Chen and Yu (2015), which is of the form:

ρ̂_CY = w_1 ρ̂ − ∑_{j=1}^m w_2,j ρ̂_j.

The weights are chosen so as to minimise the variance of the estimator in addition to providing bias reduction in the case c = 0. Because the choice of weights is a more complex problem for this type of jackknife estimator, Chen and Yu only provide results for the cases m = 2 and m = 3, in which the weights are w_1 = 2.8390, w_2,1 = 0.6771, w_2,2 = 1.1619 and w_1 = 2.0260, w_2,1 = 0.2087, w_2,2 = 0.3376, w_2,3 = 0.4797, respectively; see Table 1 of Chen and Yu (2015). Table 3 reports the bias of the five estimators obtained from 100,000 replications of the model in Assumption 1 with u_t ∼ iid N(0, 1) and y_0 = 0, using m = 2 for each of the jackknife estimators; this value has been found to provide particularly good bias reduction in a number of studies, including Phillips and Yu (2005), Chambers (2013), Chambers and Kyriacou (2013) and Chen and Yu (2015). The particular values of c are c ∈ {−10, −5, −1, 0}, which cover the pure unit root case as well as locally stationary processes, and four sample sizes are considered, namely n = 24, 48, 96 and 192. The value of the bias of the estimator producing the minimum (absolute) bias for each c and n is highlighted in bold in Table 3. The results show the substantial reduction in bias that can be achieved with jackknife estimators, the superiority of the optimal estimator being apparent as c becomes more negative, although the unit root jackknife also performs well in terms of bias reduction. Table 4 contains the corresponding RMSE values for the jackknife estimators using m = 2, as well as the RMSE corresponding to the RMSE-minimising values of m, which are typically larger than m = 2 and are also reported in the table. The RMSE value of the estimator producing the minimum RMSE for each c and n is highlighted in bold.
In fact, the optimal jackknife estimator, although constructed to eliminate first-order bias, manages to reduce the OLS estimator's RMSE and outperforms the Chen and Yu (2015) jackknife estimator in both bias and RMSE reduction, although the latter occurs at a larger number of sub-samples. The results show that use of larger values of m tends to produce smaller RMSE than when m = 2, and again, the optimal jackknife performs particularly well when c becomes more negative. The performance of the unit root jackknife is also impressive, suggesting that it is a feasible alternative to the optimal estimator when the value of c is unknown.
Although important in itself, bias is not the only feature of a distribution that is of interest, and hence, the RMSE values in Table 4 should also be taken into account when assessing the performance of the estimators. The substantial bias reductions obtained with the bias-minimising value of m = 2 come at the cost of a larger variance that ultimately feeds through into a larger RMSE compared with the OLS estimator ρ̂. This can be offset, however, by using the larger RMSE-minimising values of m that, despite having a larger bias than when m = 2, are nevertheless able to reduce the variance sufficiently to result in a smaller RMSE than ρ̂. In order to assess the robustness of the jackknife estimators, some additional bias results are presented in Table 5 that correspond to values of η < 1, while the estimators are based on the assumption that η = 1, as in the preceding simulations. The results correspond to two different specifications for u_t that enable data to be generated that are consistent with different values of η. The first specifies u_t to be a first-order moving average (MA(1)) process, so that u_t = ε_t + θε_{t−1} where ε_t ∼ iid N(0, 1); in this case, η = (1 + θ²)/(1 + θ)². The second specification is a first-order autoregressive (AR(1)) process of the form u_t = φu_{t−1} + ε_t, in which case η = (1 − φ)²/(1 − φ²). In the MA(1) case, we have chosen θ = 0.5 in order to give an intermediate value of η = 0.5556, while in the AR(1) case, we have chosen φ = 0.9 to give a small value of η = 0.0526. As in Table 3, the value of the bias of the estimator producing the minimum (absolute) bias for each c and n is highlighted in bold. Table 5 shows, in the MA case, that the jackknife estimators are able to reduce bias when c = 0, but none of them is able to do so when c = −1 or c = −5.
In the AR case, with a smaller value of η, the jackknife estimators are still able to deliver bias reduction, albeit to a lesser extent than when η = 1, and it is the unit root jackknife of Chambers and Kyriacou (2013) that achieves the greatest bias reduction in this case. These results are indicative of the importance of knowing η and suggest that developing methods to allow for η ≠ 1 is important from an empirical viewpoint.
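The η values used in these robustness experiments are easy to verify; a short check (Python) of the MA(1) and AR(1) formulas under the chosen parameter values:

```python
theta, phi = 0.5, 0.9
eta_ma = (1 + theta ** 2) / (1 + theta) ** 2   # MA(1): u_t = e_t + theta*e_{t-1}
eta_ar = (1 - phi) ** 2 / (1 - phi ** 2)       # AR(1): u_t = phi*u_{t-1} + e_t
print(round(eta_ma, 4), round(eta_ar, 4))      # 0.5556 0.0526
```

Note that in the AR(1) case the ratio simplifies to (1 − φ)/(1 + φ), which makes clear why η shrinks rapidly as φ approaches one.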

Conclusions
This paper has analysed the specification and performance of jackknife estimators of the autoregressive coefficient in a model with a near-unit root. The limit distributions of sub-sample estimators that are used in the construction of the jackknife estimator are derived, and the joint MGF of two components of that distribution is obtained and its properties explored. The MGF can then be used to derive the weights for an optimal jackknife estimator that removes fully the first-order finite sample bias from the OLS estimator. The resulting jackknife estimator is shown to perform well at finite sample bias reduction in a simulation study and, with a suitable choice of the number of sub-samples, is shown to be able to reduce the overall finite sample RMSE, as well.
The theoretical findings in Sections 3 and 4 show how first-order approximations to sub-sample estimators can be combined with the well-known full-sample results of Phillips (1987a) to obtain finite-sample refinements. The jackknife uses analytical (rather than simulation-based) results to achieve bias reduction at minimal computational cost, along the same lines as the indirect inference methods based on analytical approximations in Phillips (2012) and Kyriacou et al. (2017). Apart from computational simplicity, an evident advantage of analytical methods over simulation-based alternatives, such as the bootstrap or traditional (simulation-based) indirect inference, is that they require no distributional assumptions on the error term.
Despite its success in achieving substantial bias reduction in finite samples, as shown in the simulations, a shortcoming of the jackknife estimator, and an impediment to its use in practice, is the dependence of the optimal weights on the unknown near-unit root parameter, as well as on a quantity related to the long-run variance of the disturbances. However, our theoretical results in Sections 3 and 4 reveal precisely how these quantities affect the optimal weights and can therefore, in principle, be used to guide further research into the development of a feasible, data-driven version of the jackknife within this framework. Such further work is potentially useful in view of the simulations in Tables 3 and 4, which highlight that (feasible) jackknife estimators are an effective bias and RMSE reduction tool in a local unit root setting, even if they do not fully remove first-order bias. Moreover, the results obtained in Theorems 1-4 can be utilised in a wide range of sub-sampling situations beyond jackknife estimation itself.
The results in this paper could be utilised and extended in a number of directions. An obvious application would be the use of jackknife estimators as the basis for developing unit root test statistics, the local-to-unity framework being particularly well suited to the analysis of the power functions of such tests. It would also be possible to develop fully a variance-minimising jackknife estimator along the lines of Chen and Yu (2015), who derived analytic results for c = 0 and m = 2 or 3, although extending their approach to arbitrary c and m represents a challenging task. However, considerable progress has been made in this direction by Stoykov (2017), who builds upon our results and also proposes a two-step jackknife estimator that incorporates an estimate of c to determine the jackknife weights. The estimation model could also be extended to include an intercept and/or a time trend. The presence of an intercept will affect the limit distributions by replacing the Ornstein-Uhlenbeck processes with demeaned versions thereof, which will also have an effect on the finite sample biases. Such effects have been investigated by Stoykov (2017), who shows that substantial reductions in bias can still be achieved by jackknife methods. Applications of jackknife methods in multivariate time series settings are also possible, a recent example being Chambers (2015) in the case of a cointegrated system, but other multivariate possibilities could be envisaged.
The normalised partial sums of u_t, S_t = ∑_{j=1}^t u_j, are also important, as is the functional X_n(r) = n^{−1/2} S_{⌊nr⌋} (r ∈ [0, 1]). Under the conditions on u_t, it follows that X_n(r) ⇒ σW(r) as n → ∞.

Proof of Theorem 1. Taking each part in turn:

(a) In view of (A1) and (A2), the object of interest can be written: y_t = ∑_{j=1}^t e^{c(t−j)/n} u_j + e^{ct/n} y_0, where, in the penultimate line, we note that jℓ/n = j/m and (j − 1)ℓ/n = (j − 1)/m (with ℓ = n/m the sub-sample length) to give the limits of the outer integral.
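Part (a) relies on the explicit solution of the near-unit-root recursion. Assuming the autoregressive parameter takes the form e^{c/n} (consistent with the exponential weights in the solution), the identity can be checked numerically for arbitrary c, n and y_0:

```python
import math
import random

random.seed(1)
c, n, y0 = -1.0, 50, 0.5
u = [random.gauss(0.0, 1.0) for _ in range(n)]   # u_1, ..., u_n
rho = math.exp(c / n)

# Recursive construction: y_t = e^{c/n} y_{t-1} + u_t
y_rec = [y0]
for t in range(1, n + 1):
    y_rec.append(rho * y_rec[t - 1] + u[t - 1])

# Closed form: y_t = sum_{j=1}^t e^{c(t-j)/n} u_j + e^{ct/n} y_0
def y_closed(t):
    s = sum(math.exp(c * (t - j) / n) * u[j - 1] for j in range(1, t + 1))
    return s + math.exp(c * t / n) * y0

assert all(abs(y_rec[t] - y_closed(t)) < 1e-10 for t in range(n + 1))
print("closed form matches recursion")
```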
(b) Squaring the difference equation for y_t, summing over t ∈ τ_j and noting that e^{2c/n} = 1 + (2c/n) + O(n^{−2}), we obtain: Solving for the quantity of interest yields: noting that jℓ = (j/m)n and (j − 1)ℓ = ((j − 1)/m)n. It follows that: Using the Itô calculus (see, for example, Tanaka 1996, p. 58), we obtain the following stochastic differential equation for J_c(t)²: d(J_c(t)²) = 2J_c(t) dJ_c(t) + dt; substituting dJ_c(t) = cJ_c(t) dt + dW(t) then yields: d(J_c(t)²) = (1 + 2cJ_c(t)²) dt + 2J_c(t) dW(t). Integrating the above over [(j − 1)/m, j/m], we find that: J_c(j/m)² − J_c((j − 1)/m)² = 1/m + 2c ∫_{(j−1)/m}^{j/m} J_c(t)² dt + 2 ∫_{(j−1)/m}^{j/m} J_c(t) dW(t), and hence, we obtain: as required.
(c) The result follows immediately from Parts (a) and (b) in view of (7).
Proof of Theorem 2. Proceeding as in the proof of Theorem 1, but retaining higher-order terms, we find that: Next, as before, we have: a similar result holds for (y_{(j−1)ℓ}/√ℓ)². Furthermore, where ξ_{2j} ∼ N(0, 1) (j = 1, …, m). Combining with the result for (1/ℓ²) ∑_{t∈τ_j} y_t², we find that: where ξ_{1j} (j = 1, …, m) is defined by: The stated distribution of ξ_{1j} then follows using the property that E[J_c(r)J_c(s)] = [e^{c(r+s)} − e^{c(max(r,s)−min(r,s))}]/(2c) to calculate the variances and covariances; see Perron (1991, p. 234). In particular: Hence, defining δ = θ₁ + c − λ, As the parameter λ is arbitrary, it is convenient to set λ = (c² + 2cθ₁ − 2θ₂)^{1/2} so as to eliminate the term involving ∫_a^b Y². We shall then proceed in two steps: (i) take the expectation in M_c(θ₁, θ₂) conditional on F_0^a, the sigma-field generated by W on [0, a]; (ii) introduce another O-U process V and apply Girsanov's theorem again to take the expectation with respect to F_0^a.
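The covariance property of the O-U process quoted above can be verified from the representation J_c(t) = ∫_0^t e^{c(t−u)} dW(u), which gives E[J_c(r)J_c(s)] = ∫_0^{min(r,s)} e^{c(r−u)} e^{c(s−u)} du. A numerical sketch (illustrative parameter values; note that max(r, s) − min(r, s) = |r − s|):

```python
import math

def ou_cov_closed(c, r, s):
    """Closed form: (e^{c(r+s)} - e^{c|r-s|}) / (2c)."""
    return (math.exp(c * (r + s)) - math.exp(c * abs(r - s))) / (2 * c)

def ou_cov_quad(c, r, s, k=200000):
    """Midpoint-rule evaluation of int_0^{min(r,s)} e^{c(r-u)} e^{c(s-u)} du."""
    m = min(r, s)
    h = m / k
    return sum(math.exp(c * (r - (i + 0.5) * h)) *
               math.exp(c * (s - (i + 0.5) * h)) * h for i in range(k))

for c, r, s in [(-1.0, 0.3, 0.7), (-5.0, 0.5, 0.5), (2.0, 0.2, 0.9)]:
    assert abs(ou_cov_closed(c, r, s) - ou_cov_quad(c, r, s)) < 1e-6
print("closed-form covariance matches numerical integral")
```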