Fixed-b Inference for Testing Structural Change in a Time Series Regression

This paper addresses tests for structural change in a weakly dependent time series regression. The cases of full structural change and partial structural change are considered. Heteroskedasticity-autocorrelation (HAC) robust Wald tests based on nonparametric covariance matrix estimators are explored. Fixed-b theory is developed for the HAC estimators which allows fixed-b approximations for the test statistics. For the case of the break date being known, the fixed-b limits of the statistics depend on the break fraction and the bandwidth tuning parameter as well as on the kernel. When the break date is unknown, supremum, mean and exponential Wald statistics are commonly used for testing the presence of the structural break. Fixed-b limits of these statistics are obtained and critical values are tabulated. A simulation study compares the finite sample properties of existing tests and proposed tests.


Introduction
This paper focuses on fixed-b inference of heteroskedasticity and autocorrelation (HAC) robust Wald statistics for testing for a structural break in a time series regression. We focus on kernel-based nonparametric HAC estimators, which are commonly used to estimate the asymptotic variance. HAC estimators allow for arbitrary structure of the serial correlation and heteroskedasticity of weakly dependent time series and are consistent estimators of the long run variance under the assumption that the bandwidth (M) grows at a certain rate slower than the sample size (T). Under consistency assumptions, the Wald statistics converge to the usual chi-square distributions. However, because the critical values from the chi-square distribution are based on a consistency approximation for the HAC estimator, the chi-square limit does not reflect the often substantial finite sample randomness of the HAC estimator. Furthermore, the chi-square approximation does not capture the impact of the choice of the kernel or the bandwidth on the Wald statistics. The sensitivity of the statistics to the finite sample bias and variability of the HAC estimator is well known in the literature; Kiefer and Vogelsang (2005) [1], among others, have illustrated by simulation that traditional inference with a HAC estimator can have poor finite sample properties.
Departing from the traditional approach, Kiefer and Vogelsang [1][2][3] obtain an alternative asymptotic approximation by assuming that the ratio of the bandwidth to the sample size, b = M/T, is held constant as the sample size increases. Under this alternative nesting of the bandwidth, they obtain pivotal asymptotic distributions for the test statistics which depend on the choice of kernel and bandwidth tuning parameter. Simulation results indicate that the resulting fixed-b approximation has smaller size distortions in finite samples than the traditional approach, especially when the bandwidth is not small.
Theoretical explanations for the finite sample properties of the fixed-b approach include the studies by Hashimzade and Vogelsang (2008) [4], Jansson (2004) [5], Sun, Phillips and Jin (2008, hereafter SPJ) [6], Gonçalves and Vogelsang (2011) [7] and Sun (2013) [8]. Hashimzade and Vogelsang (2008) [4] provide an explanation for the better performance of the fixed-b asymptotics by analyzing the bias and variance of the HAC estimator. Gonçalves and Vogelsang (2011) [7] provide a theoretical treatment of the asymptotic equivalence between the naive bootstrap distribution and the fixed-b limit. Higher order theory is used by Jansson (2004) [5], SPJ (2008) [6] and Sun (2013) [8] to show that the error in rejection probability under the fixed-b approximation is smaller than under the traditional approximation. In a Gaussian location model, Jansson (2004) [5] proves that for the Bartlett kernel with bandwidth equal to the sample size (i.e., b = 1), the error in rejection probability of fixed-b inference is $O(T^{-1}\log T)$, which is smaller than the usual rate of $O(T^{-1/2})$. The results in SPJ (2008) [6] complement Jansson's result by extending the analysis to a larger class of kernels and focusing on smaller values of the bandwidth ratio b. In particular, they find that the error in rejection probability of the fixed-b approximation is $O(T^{-1})$ around b = 0. They also show that for positively autocorrelated series, which are typical for economic time series, the fixed-b approximation has smaller error than the chi-square or standard normal approximation, even when b is assumed to decrease to zero, although the stochastic orders are the same.
In this paper, fixed-b asymptotics is applied to testing for structural change in a weakly dependent time series regression. The structural change literature is now enormous and no attempt will be made here to summarize the relevant literature. Some key references include Andrews (1993) [9], Andrews and Ploberger (1994) [10], and Bai and Perron (1998) [11]. Andrews (1993) [9] treats the issue of testing for a structural break in the generalized method of moments framework when the one-time break date is unknown, and Andrews and Ploberger (1994) [10] derive asymptotically optimal tests. Bai and Perron (1998) [11] consider multiple structural changes occurring at unknown dates and cover the issues of estimation of break dates, testing for the presence of structural change and testing for the number of breaks. For a comprehensive survey of the recent structural break literature see Perron (2006) [12], Banerjee and Urga (2005) [13], and Aue and Horváth (2013) [14]. The fixed-b analysis can be extended to the case of multiple breaks but the simulation of critical values will be computationally intensive. Therefore, we leave the case of multiple breaks for future research and consider the case of a single break in this paper.
For testing for the presence of a break, the robust version of the Wald statistic is considered in this paper and a HAC estimator is used to construct the test statistic. The ways of constructing HAC estimators in the context of structural change tests are well described in Bai and Perron (2003) [15] and Bai and Perron (1998) [11]. We focus mainly on the HAC estimator documented in Bai and Perron (2003) (Section 4.1, [15]) in which the usual "Newey-West-Andrews" approach is applied directly to the regression with regime dummies. Under the assumption of a fixed bandwidth ratio (fixed-b assumption), the asymptotic limit of the test statistic is a nonstandard distribution but it is pivotal. As in standard fixed-b theory, the impact of the choice of bandwidth on the limiting distribution is substantial. In particular, the bandwidth interacts with the hypothesized break fraction so that the limit of the test statistic depends on both of them. For the unknown break date case, three existing test statistics (Sup-, Mean-, Exp-Wald) are considered and their fixed-b critical values are tabulated. The finite sample performance is examined by simulation experiments with comparisons made to existing tests. For practitioners, we include results using a data-dependent bandwidth rule based on Andrews (1991) [16]. This data-dependent bandwidth is calculated from the regression using the break fraction that yields the minimum sum of squared residuals (Bai and Perron, 1998 [11]). One can calculate a bandwidth ratio b* = M*/T with this data-dependent bandwidth (M*) and proceed to apply the fixed-b critical values corresponding to this specific value of b*.

The remainder of this paper is organized as follows. In Section 2, the basic setup of the full/partial structural-change model is presented and preliminary results are provided. Section 3 derives the fixed-b limit of the Wald statistic, and the fixed-b critical values for the case of unknown break dates are tabulated in Section 4. Section 5 compares empirical null rejection probabilities and provides the size-adjusted power for tests based on the b* data-dependent bandwidth ratio. Section 6 concludes. Proofs and definitions are collected in Appendix A.

Setup and Preliminary Results
Consider a weakly dependent time series regression model with a structural break given by

$$y_t = w_t'\beta + u_t, \qquad w_t = (x_{1t}', x_{2t}')', \quad x_{1t} = x_t \cdot 1(t \le [\lambda T]), \quad x_{2t} = x_t \cdot 1(t > [\lambda T]), \quad \beta = (\beta_1', \beta_2')', \tag{1}$$

where $x_t$ is a $p \times 1$ regressor vector, $\lambda \in (0, 1)$ is a break point, and $1(\cdot)$ is the indicator function. Define $\nu_t = x_t u_t$ and $v_t = w_t u_t$. Recalling that $[x]$ denotes the integer part of a real number $x$, notice that $x_{2t} = 0$ for $t = 1, 2, \ldots, [\lambda T]$ and $x_{1t} = 0$ for $t = [\lambda T] + 1, \ldots, T$. For the time being, the potential break point (fraction) $\lambda$ is assumed to be known in order to develop the asymptotic theory for a test statistic and characterize its asymptotic limit. We later relax this assumption to deal with the empirically relevant case of an unknown break date. The regression model (1) implies that the coefficients of all explanatory variables are subject to potential structural change, and this model is labeled the 'full' structural change model.
We are interested in testing for the presence of a structural change in the regression parameters. Consider the null hypothesis of the form

$$H_0: R\beta = r, \tag{2}$$

where

$$R = (R_1, -R_1), \qquad r = 0,$$

and $R_1$ is an $l \times p$ matrix with $l \le p$. Under the null hypothesis, we are testing that one or more linear relationships on the regression parameter(s) do not experience structural change before and after the break point. Tests of the null hypothesis of no structural change about a subset of the slope parameters are special cases. For example, we can test the null hypothesis that the slope parameter on the first regressor did not change by setting $R_1 = (1, 0, \ldots, 0)$. We can test the null hypothesis that none of the regression parameters have structural change by setting $R_1 = I_p$. We focus on the OLS estimator of $\beta$ given by

$$\hat\beta = (\hat\beta_1', \hat\beta_2')' = \left(\sum_{t=1}^{T} w_t w_t'\right)^{-1} \sum_{t=1}^{T} w_t y_t.$$

In order to establish the asymptotic limits of the HAC estimators and the Wald statistics, two assumptions are sufficient.

Assumption 1. $T^{-1/2} \sum_{t=1}^{[rT]} x_t u_t \Rightarrow \Lambda W_p(r)$ for $r \in [0, 1]$, where $W_p(r)$ is a $p \times 1$ standard Wiener process.

Assumption 2. $\operatorname{plim} T^{-1} \sum_{t=1}^{[rT]} x_t x_t' = rQ$ uniformly in $r \in [0, 1]$, and $Q^{-1}$ exists.

These assumptions imply that there is no heterogeneity in the regressors across the segments and that the covariance structure of the errors is the same across segments as well.
For later use, we define an $l \times l$ nonsingular matrix $A$ such that $R_1 Q^{-1} \Lambda\Lambda' Q^{-1} R_1' = AA'$ and $R_1 Q^{-1} \Lambda W_p(r) \overset{d}{=} A W_l(r)$, where $W_l(r)$ is an $l \times 1$ standard Wiener process. For a more detailed discussion of the regularity conditions under which Assumptions 1 and 2 hold, refer to Kiefer and Vogelsang (2002) [3]; see also Davidson (1994) [17], Phillips and Durlauf (1986) [18], Phillips and Solo (1992) [19], and Wooldridge and White (1988) [20].
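The matrix $Q$ is the second moment matrix of $x_t$ and is typically estimated using the quantity $\hat Q = T^{-1} \sum_{t=1}^{T} x_t x_t'$. The matrix $\Sigma \equiv \Lambda\Lambda'$ is the asymptotic variance of $T^{-1/2} \sum_{t=1}^{T} \nu_t$, which is, for a covariance stationary series, given by

$$\Sigma = \Gamma_0 + \sum_{j=1}^{\infty} (\Gamma_j + \Gamma_j'), \qquad \Gamma_j = E(\nu_t \nu_{t-j}').$$

Consider the non-structural change regression equation

$$y_t = x_t'\beta + u_t,$$

where $\beta_1 = \beta_2 \equiv \beta$ and this coefficient parameter is estimated by OLS ($\hat\beta$). In this particular setup, the long run variance, $\Sigma$, is commonly estimated by the kernel-based nonparametric HAC estimator given by

$$\hat\Sigma = \hat\Gamma_0 + \sum_{j=1}^{T-1} K\!\left(\frac{j}{M}\right) (\hat\Gamma_j + \hat\Gamma_j'),$$

where $\hat\Gamma_j = T^{-1} \sum_{t=j+1}^{T} \hat\nu_t \hat\nu_{t-j}'$, $\hat\nu_t = x_t \hat u_t = x_t (y_t - x_t'\hat\beta)$, $K(\cdot)$ is a kernel function and $M$ is the bandwidth. Under some regularity conditions (see Andrews (1991) [16], DeJong and Davidson (2000) [21], Hansen (1992) [22], Jansson (2002) [23] or Newey and West (1987) [24]), $\hat\Sigma$ is a consistent estimator of $\Sigma$, i.e., $\hat\Sigma \overset{p}{\to} \Sigma$. These regularity conditions include the necessary condition that $M/T \to 0$ as $M, T \to \infty$. We refer to this as "traditional" asymptotics throughout this paper.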
In contrast to the traditional approach, fixed-b asymptotics assumes $M = bT$ where $b$ is held constant as $T$ increases. Assumptions 1 and 2 are the only regularity conditions required to obtain a fixed-b limit for $\hat\Sigma$. Under the fixed-b approach, for $b \in (0, 1]$, Kiefer and Vogelsang (2005) [1] show that

$$\hat\Sigma \Rightarrow \Lambda\, P(b, \tilde W_p)\, \Lambda', \tag{3}$$

where $\tilde W_p(r) = W_p(r) - r W_p(1)$ is a $p$-vector of standard Brownian bridges and the form of the random matrix $P(b, \tilde W_p)$ depends on the kernel. Following Kiefer and Vogelsang (2005) [1], we consider three classes of kernels which give three forms of $P$. Let $H_p(r)$ denote a generic vector of stochastic processes and let $H_p(r)'$ denote its transpose. $P(b, H_p)$ is defined in Appendix A.
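As a rough illustration of the kernel estimator with the fixed-b bandwidth rule M = bT, the following sketch computes a Bartlett-kernel HAC estimate. The function name `bartlett_hac` and the choice of the Bartlett kernel are assumptions made for this example only; the paper considers three classes of kernels, and this is not a reproduction of the authors' code.

```python
import numpy as np

def bartlett_hac(v, b):
    """Bartlett-kernel HAC estimate of the long run variance of the rows of v.

    v : (T, p) array whose t-th row plays the role of x_t * u_hat_t
    b : bandwidth ratio, so that the bandwidth is M = b * T
    """
    v = np.asarray(v, dtype=float)
    T = v.shape[0]
    M = b * T
    sigma = v.T @ v / T                      # Gamma_0
    for j in range(1, T):
        weight = 1.0 - j / M                 # Bartlett kernel K(j / M)
        if weight <= 0.0:
            break                            # kernel is zero for j >= M
        gamma_j = v[j:].T @ v[:-j] / T       # Gamma_j = T^{-1} sum_t v_t v_{t-j}'
        sigma += weight * (gamma_j + gamma_j.T)
    return sigma
```

With b = 1 every sample autocovariance receives a non-zero weight, which is the setting where the consistency approximation is least appropriate and the fixed-b limit differs most from it.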
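Returning to our structural change regression model, fixed-b results depend on the limiting behavior of the partial sum process $\hat S_t = \sum_{j=1}^{t} w_j \hat u_j$, with $\hat u_t = y_t - w_t'\hat\beta$, which satisfies

$$T^{-1/2} \hat S_{[rT]} = T^{-1/2} \sum_{t=1}^{[rT]} w_t u_t - \left(T^{-1} \sum_{t=1}^{[rT]} w_t w_t'\right) \sqrt{T} (\hat\beta - \beta). \tag{4}$$

Under Assumptions 1 and 2, the limiting behavior of $\hat\beta$ and the partial sum process $\hat S_t$ are given as follows.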
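Proposition 1. Let $\lambda \in (0, 1)$ be given. Suppose the data generation process is given by (1) and let $[rT]$ denote the integer part of $rT$ where $r \in [0, 1]$. Then, under Assumptions 1 and 2, as $T \to \infty$,

$$\sqrt{T} (\hat\beta - \beta) \Rightarrow \begin{pmatrix} (\lambda Q)^{-1} \Lambda W_p(\lambda) \\ ((1-\lambda) Q)^{-1} \Lambda \left(W_p(1) - W_p(\lambda)\right) \end{pmatrix}$$

and

$$T^{-1/2} \hat S_{[rT]} \Rightarrow \begin{pmatrix} \Lambda F_p^{(1)}(r, \lambda) \\ \Lambda F_p^{(2)}(r, \lambda) \end{pmatrix},$$

where

$$F_p^{(1)}(r, \lambda) = \left(W_p(r) - \frac{r}{\lambda} W_p(\lambda)\right) \cdot 1(r \le \lambda)$$

and

$$F_p^{(2)}(r, \lambda) = \left(W_p(r) - W_p(\lambda) - \frac{r - \lambda}{1 - \lambda} \left(W_p(1) - W_p(\lambda)\right)\right) \cdot 1(r > \lambda).$$

See Appendix A for the proof.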
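It is easily seen that the asymptotic distributions of $\hat\beta_1$ and $\hat\beta_2$ are Gaussian and are independent of each other. Hence the asymptotic covariance of $\hat\beta_1$ and $\hat\beta_2$ is zero. The asymptotic variance of $\sqrt{T}(R\hat\beta - r)$ is given by

$$R\, Q_\lambda^{-1}\, \Omega\, Q_\lambda^{-1}\, R',$$

where

$$Q_\lambda = \begin{pmatrix} \lambda Q & 0 \\ 0 & (1-\lambda) Q \end{pmatrix}, \qquad \Omega = \begin{pmatrix} \lambda \Sigma & 0 \\ 0 & (1-\lambda) \Sigma \end{pmatrix}.$$

In order to test the null hypothesis (2), HAC robust Wald statistics are considered. These statistics are robust to heteroskedasticity and autocorrelation in the vector process $\nu_t = x_t u_t$. The generic form of the robust Wald statistic is given by

$$Wald = T (R\hat\beta - r)' \left(R\, \hat Q_\lambda^{-1}\, \hat\Omega\, \hat Q_\lambda^{-1}\, R'\right)^{-1} (R\hat\beta - r), \tag{6}$$

where $\hat Q_\lambda = T^{-1} \sum_{t=1}^{T} w_t w_t'$ and $\hat\Omega$ is a HAC robust estimator of $\Omega$. We consider a particular way of constructing the HAC estimator. This estimator is the same one as in Bai and Perron (2003) [15]. Denoted by $\hat\Omega^{(F)}$, it is constructed using the residuals directly from the dummy regression (1):

$$\hat\Omega^{(F)} = T^{-1} \sum_{t=1}^{T} \sum_{s=1}^{T} K\!\left(\frac{|t-s|}{M}\right) \hat v_t \hat v_s',$$

where $\hat v_t = w_t \hat u_t = (x_{1t}' \hat u_t, x_{2t}' \hat u_t)'$ is $2p \times 1$. We denote the components of $\hat v_t$ as $\hat v_t^{(1)} = x_{1t} \hat u_t$ and $\hat v_t^{(2)} = x_{2t} \hat u_t$. $\hat\Omega^{(F)}$ is the variance estimator one would be using if the usual "Newey-West-Andrews" approach is applied directly to the dummy regression (1).

Using $\hat v_t^{(1)}$ and $\hat v_t^{(2)}$ we can write $\hat\Omega^{(F)}$ as

$$\hat\Omega^{(F)} = \begin{pmatrix} T^{-1} \sum_{t}\sum_{s} K\!\left(\frac{|t-s|}{M}\right) \hat v_t^{(1)} \hat v_s^{(1)\prime} & T^{-1} \sum_{t}\sum_{s} K\!\left(\frac{|t-s|}{M}\right) \hat v_t^{(1)} \hat v_s^{(2)\prime} \\ T^{-1} \sum_{t}\sum_{s} K\!\left(\frac{|t-s|}{M}\right) \hat v_t^{(2)} \hat v_s^{(1)\prime} & T^{-1} \sum_{t}\sum_{s} K\!\left(\frac{|t-s|}{M}\right) \hat v_t^{(2)} \hat v_s^{(2)\prime} \end{pmatrix}. \tag{8}$$

Three important observations are in order. First, the main components of the two diagonal blocks are within-regime HAC estimators of $\Sigma$, the long run variance of $\{\nu_t\}$. However, one should see that the "effective" bandwidth ratios being applied within the pre- and post-break regimes are $M/(\lambda T) = b/\lambda$ and $M/((1-\lambda) T) = b/(1-\lambda)$, both of which are bigger than $b$. As documented in the fixed-b literature (e.g., Kiefer and Vogelsang (2005) [1]), the bias in HAC estimators not accounted for by traditional inference increases as the bandwidth ratio gets bigger. So, when the HAC estimator is constructed as in (8), traditional inference might often be exposed to more size distortion than expected because of this mechanism of determining effective bandwidths. The second issue is that the above estimator has non-zero off-diagonal blocks. So, the methodology based on partial samples such as in Andrews (1993) [9] does not exactly cover this case because the off-diagonal blocks in Andrews (1993) [9] are assumed to be zero, matching the zero asymptotic covariance of the OLS estimators of the slope coefficients between the pre- and post-break regimes. The influence of the non-zero off-diagonal terms might be small, since the off-diagonal blocks converge to zero under the traditional assumption $M/T \to 0$ as the sample size grows (see a proof in Cho (2014) [25] for the Bartlett kernel), but it might still negatively affect the performance of tests in finite samples, and we need to develop an alternative asymptotic theory to explicitly reflect the presence of these components. Third, there is another issue when a researcher uses a data-dependent bandwidth as in Andrews (1991) [16]. For a given hypothesized break fraction, a data-dependent bandwidth can be calculated based on the pooled series of $\hat v_t^{(1)}$ and $\hat v_t^{(2)}$.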
This method would result in an optimal bandwidth which minimizes the MSE in estimating Σ, but the presence of non-zero off-diagonal terms is not taken into account in this procedure. Moreover, when the break date is treated as unknown, a sequence of data-dependent bandwidths across potential break dates will be generated. In this case, the fixed-b limits are not useful approximations because the sequence of data-dependent bandwidths is random by nature, so the limiting distributions of the corresponding test statistics cannot be characterized by a single particular value of b.
Denote by $Wald^{(F)}(T_b)$ the Wald statistic given by (6) using the break date $T_b$ with $\hat\Omega^{(F)}$ used for $\hat\Omega$. Tests for a potential structural break with an unknown break date are well studied in Andrews (1993) [9], Andrews and Ploberger (1994) [10], and Bai and Perron (1998) [11]. Andrews (1993) [9] considers several tests based on the supremum across breakpoints of Wald and Lagrange multiplier statistics and shows that they are asymptotically equivalent. Andrews and Ploberger (1994) [10] derive tests that maximize average power across potential breakpoints.
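For a known break date, the construction described above (dummy regression, HAC estimator from its residuals, quadratic form in the difference of the regime coefficients) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the Bartlett kernel is used for simplicity, and the normalization of the second moment matrix by T is an assumption about the generic Wald form.

```python
import numpy as np

def bartlett_hac(v, M):
    """Bartlett-kernel HAC estimator with bandwidth M applied to the rows of v."""
    T = v.shape[0]
    omega = v.T @ v / T
    for j in range(1, T):
        k = 1.0 - j / M
        if k <= 0.0:
            break
        g = v[j:].T @ v[:-j] / T
        omega += k * (g + g.T)
    return omega

def wald_full_break(y, x, Tb, b, R1=None):
    """HAC robust Wald statistic for H0: R1 beta_1 = R1 beta_2.

    The first Tb observations form the pre-break regime; b is the
    bandwidth ratio, so the bandwidth is M = b * T.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T, p = x.shape
    if R1 is None:
        R1 = np.eye(p)                        # test all coefficients jointly
    pre = (np.arange(T) < Tb)[:, None]
    w = np.hstack([x * pre, x * ~pre])        # w_t = (x_1t', x_2t')'
    beta = np.linalg.solve(w.T @ w, w.T @ y)  # OLS in the dummy regression
    u = y - w @ beta
    v = w * u[:, None]                        # v_t = w_t * u_hat_t
    omega = bartlett_hac(v, b * T)
    q_inv = np.linalg.inv(w.T @ w / T)
    R = np.hstack([R1, -R1])
    Rb = R @ beta
    V = R @ q_inv @ omega @ q_inv @ R.T
    return float(T * Rb @ np.linalg.solve(V, Rb))
```

Because a single bandwidth M is applied to the stacked series, the pre- and post-break blocks of the HAC estimator are each computed from fewer than T non-zero observations, which is the source of the larger effective bandwidth ratios discussed above.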
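As argued by Andrews (1993) [9] and Andrews and Ploberger (1994) [10], break dates close to the end points of the sample cannot be used and so some trimming is needed. To that end, define $\Xi^* = [\epsilon T, T - \epsilon T]$ with $0 < \epsilon < 1$ to be the set of admissible break dates. The tuning parameter, $\epsilon$, denotes the amount of trimming of potential break dates. We consider the three statistics following Andrews (1993) [9]¹ and Andrews and Ploberger (1994) [10]², defined as

$$SupW^{(F)} = \sup_{T_b \in \Xi^*} Wald^{(F)}(T_b), \qquad MeanW^{(F)} = \frac{1}{T} \sum_{T_b \in \Xi^*} Wald^{(F)}(T_b), \qquad ExpW^{(F)} = \log\left(\frac{1}{T} \sum_{T_b \in \Xi^*} \exp\left(\tfrac{1}{2} Wald^{(F)}(T_b)\right)\right).$$

The next section provides asymptotic results for the robust Wald statistics under the fixed-b asymptotics.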
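Given Wald statistics computed at each admissible break date, the supremum, mean and exponential functionals can be sketched as below. The function name is hypothetical, and the default divisor T for the mean and exponential statistics is an assumption; it is exposed as a parameter because the divisor convention differs across papers and affects which critical values apply.

```python
import numpy as np

def sup_mean_exp_wald(wald_by_date, T, eps, divisor=None):
    """Sup-, Mean- and Exp-type functionals of Wald statistics over trimmed break dates.

    wald_by_date : dict mapping candidate break dates T_b (ints) to Wald statistics
    T            : sample size
    eps          : trimming fraction; admissible dates satisfy eps*T <= T_b <= T - eps*T
    divisor      : divisor of the sums (assumed to be T when not supplied)
    """
    lo, hi = eps * T, T - eps * T
    w = np.array([stat for tb, stat in sorted(wald_by_date.items())
                  if lo <= tb <= hi], dtype=float)
    if w.size == 0:
        raise ValueError("no admissible break dates after trimming")
    d = float(T) if divisor is None else float(divisor)
    sup_w = float(w.max())
    mean_w = float(w.sum() / d)
    m = 0.5 * w.max()                         # subtract the max to avoid overflow in exp
    exp_w = float(m + np.log(np.exp(0.5 * w - m).sum() / d))
    return sup_w, mean_w, exp_w
```

Whatever divisor is chosen, the critical values used must be tabulated under the same convention.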

Asymptotic Results under the Fixed-b Approach
We now provide fixed-b limits for the HAC estimators and the test statistics in the full structural change model (1). The fixed-b limits presented in the next Lemma and Corollary approximate the diagonal blocks of $\hat\Omega^{(F)}$ by random matrices. Also, it is shown that the fixed-b approach gives a non-zero limit for the off-diagonal blocks, which further distinguishes fixed-b asymptotics from traditional asymptotics.

Lemma 1. Let $b \in (0, 1]$ be given and suppose $M = bT$. Then under Assumptions 1 and 2, as $T \to \infty$,

$$\hat\Omega^{(F)} \Rightarrow (I_2 \otimes \Lambda)\, P\left(b, F_p(r, \lambda)\right)\, (I_2 \otimes \Lambda)', \tag{12}$$

where

$$F_p(r, \lambda) = \left(F_p^{(1)}(r, \lambda)', F_p^{(2)}(r, \lambda)'\right)'$$

and $P\left(b, F_p(r, \lambda)\right)$ is defined by (A1)-(A3) with $H_p(r) = F_p(r, \lambda)$.
See Appendix A for the proof. Next, Corollary 1 presents alternative representations for $P\left(b, F_p(r, \lambda)\right)$ for three classes of kernels. The definitions of these classes of kernels (Classes 1, 2 and 3) are given in Appendix A. Three popular kernels (the Quadratic Spectral, Bartlett and Parzen kernels) belong to Classes 1, 2 and 3, respectively. See Cho (2014) [25] for the proof of this Corollary.

¹ We used the critical values provided in Andrews (2003) [26] for traditional inference.

² The definitions for the mean and exponential statistics are slightly different in the divisor of the summation. For traditional inference, we adjusted the critical values in Andrews and Ploberger (1994) [10] to our definitions of the statistics.

Corollary 1. P
p (r, λ) P b, F where C b, F for Classes 1,2 and 3 kernels respectively.
The expression for $P\left(b, F_p(r, \lambda)\right)$ in this Corollary makes it easier to compare the fixed-b limit of $\hat\Omega^{(F)}$ with the standard fixed-b limit (see (3)) appearing in a non-structural change setting. Since each diagonal block of $\hat\Omega^{(F)}$ is basically a HAC estimator (up to a scale factor; see (8)) based on either the pre-break or the post-break data, its limit should take the same form as (3), which is verified in this Corollary. So, each diagonal component of $P\left(b, F_p(r, \lambda)\right)$ serves to reflect the randomness and bandwidth/kernel-dependence of the associated HAC estimator. Second, unlike the traditional approach, the fixed-b limit of the off-diagonal component is non-zero. This implies that the fixed-b approach is able to take account of the covariance between $\hat\beta_1$ and $\hat\beta_2$, which is generally non-zero in finite samples. The limits of the Wald statistics can be derived by using Lemma 1 and the result is presented in the next Theorem.

Theorem 1. Let $b \in (0, 1]$ be given. Suppose $M = bT$. Then under Assumptions 1 and 2, as $T \to \infty$,

See Appendix A for the proof. The next Corollary provides an alternative representation for the limit given in (17). The proof for this Corollary is given in Cho (2014) [25].
Corollary 2. For a given value of λ ∈ (0, 1), the fixed-b limit of Wald (F) has the same distribution as where for Class-1 kernels, for Class-3 kernels, and W l (r) and W * l (r) are l × 1 Brownian Bridge processes which are independent of each other and of W l (1).
The limit in (18) shows how the components of Ω (F) affect the distribution of Wald (F) .
As mentioned earlier, the random matrix P b λ , W l (r) reflects the random nature of 11 which is part of the estimator of the asymptotic variance of β 1 .Notice that the effective bandwidth for Ω    ∞ (λ) is given in (17).Under the traditional assumption that the bandwidth ratio goes to zero as T grows, The asymptotic limits of Sup-, Mean-, and Exp-Wald statistics immediately follow from the continuous mapping theorem given by

Extension to the Partial Structural Change Model
This section derives the fixed-b limit of Wald (F) in the partial structural change model.The main result of this section is that the limit is the same as the limit for the full structural change model.The regression model with partial structural change is given by where x t is p × 1 and z t is q × 1 vector and The coefficients on the x t regressors are unrestricted in terms of a structural change whereas the coefficients on the z t regressors are assumed to not have structural change.Denote The parameters (α, β) are estimated by OLS and the OLS residual vector can be written as where y = (I − P Z ) y, X = (I − P Z ) X, and The residual for an individual observation is given by Also, note that The following assumptions replace Assumptions 1 and 2: , where Λ 1 is a p × (p + q) matrix, Λ 2 is a q × (p + q) matrix, and W p+q (r) is a (p + q) × 1 vector of independent Wiener process.
t=1 x t x t = rQ xx , and p lim 1 T ∑ [rT] We continue to focus on tests of the null hypothesis of no structural change in the x t slope parameters of the form Recall that the OLS estimator, β = β 1 , β 2 can be rewritten as Proposition 2. Under Assumptions 3 and 4, as , and where See Appendix A for the proof.
As seen from the above proposition, $\hat\beta_1$ and $\hat\beta_2$ are not asymptotically independent in the partial structural change regression model. This is true because we are projecting out the variation of the explanatory variables $z_t$, so that $\hat\beta_1$ and $\hat\beta_2$ depend on the entire series of $x_t$ and $z_t$. The dichotomy that $\hat\beta_1$ depends only on the pre-break data and that $\hat\beta_2$ depends only on the post-break data no longer holds in the partial structural change model. The dependence manifests itself in the common term in Proposition 2. However, this term cancels out in (23) when the restriction matrix takes the form of (21). As a result, and also as suggested by Equation (23), in principle we need to estimate only $\Lambda_1 \Lambda_1'$ for testing for partial structural change. Because $\hat\Omega^{(F)}$, extended for the case of partial structural change, does not impose any restrictions on the asymptotic correlation between $\hat\beta_1$ and $\hat\beta_2$, $Wald^{(F)}$ continues to allow asymptotically pivotal fixed-b tests for partial structural change.
While not obvious at first glance, Wald (F) has the same fixed-b limit in the partial structural change case as it does in the full structural change case.The Wald statistic for testing for partial structural change is given by where Q X X = T −1 ∑ T t=1 X t X t .For constructing Wald (F) , we use the HAC estimator Ω (F) which is computed using X t u t T t=1 : where ξ t = X t u t .By the Frisch-Waugh-Lovell Theorem, this is the straightforward extension of Wald (F)  to the case of partial structural change.The next Lemma provides the limit of the scaled partial sum process of ξ t premultiplied by an appropriate term.
p+q (r, λ) − p+q (r, λ) , where F (1) See Appendix A for the proof.As Lemma 2 shows, the partial sums of the inputs to Ω (F) are asymptotically proportional to the same nuisance parameters as √ T R β − r .This is the key condition for a pivotal fixed-b limit.
The next Theorem provides the fixed-b limit of Wald (F) .
Theorem 2. Let $b \in (0, 1]$ be given. Suppose $M = bT$. Then, under Assumptions 3 and 4, as $T \to \infty$, $Wald^{(F)}$ weakly converges to the same limit as in (17).

See Appendix A for the proof. According to Theorem 2, the limit of $Wald^{(F)}$ in the partial structural change model is the same as in the full structural change model.

Critical Values
While the fixed-b limiting distributions are nonstandard, asymptotic critical values are easily obtained via simulations. We approximate the Wiener processes in the limiting distributions using scaled partial sums of 1000 i.i.d. N(0, 1) random variables. Critical values are tabulated based on 50,000 replications.
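The approximation of Wiener processes by scaled partial sums can be sketched as follows. The function names and array layout are assumptions for this illustration; the sketch shows only the building blocks (Wiener paths and the Brownian bridge $W(r) - rW(1)$), not the full limiting functionals whose quantiles give the tabulated critical values.

```python
import numpy as np

def wiener_paths(n_steps, dim, n_reps, rng):
    """Approximate W(r) at r = 1/n, 2/n, ..., 1 by scaled partial sums of N(0, 1) draws.

    Returns an array of shape (n_reps, n_steps, dim).
    """
    shocks = rng.standard_normal((n_reps, n_steps, dim))
    return np.cumsum(shocks, axis=1) / np.sqrt(n_steps)

def brownian_bridge(w):
    """Map Wiener paths to Brownian bridges W(r) - r * W(1) on the same grid."""
    n_steps = w.shape[1]
    r = np.arange(1, n_steps + 1)[None, :, None] / n_steps
    return w - r * w[:, -1:, :]

rng = np.random.default_rng(12345)
w = wiener_paths(n_steps=1000, dim=2, n_reps=200, rng=rng)
bridge = brownian_bridge(w)   # each path ends at zero by construction
```

A critical value for a given functional would then be obtained by evaluating that functional on each simulated path and taking the relevant empirical quantile across replications.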
In Table 1, fixed-b critical values for $SupW^{(F)}$, $MeanW^{(F)}$, and $ExpW^{(F)}$ are provided for l = 2, ε = 0.05, 0.1, 0.2 and for b ∈ {0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, ..., 0.9, 1}. Critical values over the entire grid of 0.02-increments of b are available upon request. For the case of a known break date, the 95% critical values for l = 2 are available for selected values of b and λ in Cho and Vogelsang (2014) [27]. The critical values display two main patterns. First, for each given λ the critical values increase as the bandwidth gets bigger. This can be expected given the well known downward bias induced into HAC estimators from estimation error. The fixed-b approximation captures this downward bias and reflects it through larger critical values. Second, for a given value of the bandwidth, the critical values display a V-shaped pattern as a function of λ. As the break point moves closer to zero or one, the critical values increase, and the minimum critical values occur at λ = 0.5.

Finite Sample Properties
In this section, we report the results of a finite sample simulation study that illustrates the performance of fixed-b critical values relative to traditional critical values. The data generating process (DGP) is given by (1) with $x_t = [1, q_t]'$ where $q_t$ is a scalar time series, and $\beta_i = (\beta_i^c, \beta_i^s)'$ for $i = 1, 2$. We use the break point λ = 0.4. The regressor $q_t$ and the regression error $u_t$ are generated as $q_t = \theta q_{t-1} + \epsilon_t$ and $u_t = \rho u_{t-1} + \eta_t + \phi \eta_{t-1}$, where $\epsilon_t$ and $\eta_t$ are independent of each other with $\epsilon_t, \eta_t \sim$ i.i.d. N(0, 1). We use the parameter values θ ∈ {0.5, 0.8, 0.9} and (ρ, ϕ) ∈ {(0, 0), (0.5, 0.5), (0.9, 0.9)} (see Table 2). The value of θ measures the persistence of the time varying regressor $q_t$. The parameters ρ and ϕ jointly determine the serial correlation structure of the error term $u_t$. Bigger values of these three parameters lead to higher persistence of the series $\nu_{1t} \equiv q_t u_t$, except for specification A where bigger values of θ would not increase persistence in $\nu_{1t}$. We set $\beta_1^c = 0$, $\beta_1^s = 0$ and $\beta_2^c = \delta$, $\beta_2^s = \delta$. Under the null hypothesis of no structural change, δ = 0, whereas for δ ≠ 0 there is structural change in both the intercept and slope parameters. We report results for sample sizes T = 100, 200, 500, and 1000, and the number of replications is 2500. The nominal level of all tests is 5%. We compute the Sup/Mean/Exp-$W^{(F)}$ statistics for testing the joint null hypothesis of no structural change in both the intercept and slope parameters. The frequency of rejections for the case of δ = 0 measures the empirical type-I error.
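One draw from a data generating process of this form might be simulated as in the sketch below. The function name, the burn-in period and the zero initial conditions are assumptions made for the example, since the text does not state how the recursions are initialized.

```python
import numpy as np

def simulate_dgp(T, theta, rho, phi, delta, lam=0.4, burn=100, rng=None):
    """Simulate y_t = x_t' beta_1 1(t <= [lam T]) + x_t' beta_2 1(t > [lam T]) + u_t.

    x_t = (1, q_t)', q_t is AR(1) with coefficient theta, u_t is ARMA(1, 1)
    with parameters (rho, phi), beta_1 = (0, 0)' and beta_2 = (delta, delta)'.
    The burn-in length is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = T + burn
    eps = rng.standard_normal(n)
    eta = rng.standard_normal(n + 1)
    q = np.zeros(n)
    u = np.zeros(n)
    for t in range(1, n):
        q[t] = theta * q[t - 1] + eps[t]
        u[t] = rho * u[t - 1] + eta[t + 1] + phi * eta[t]
    q, u = q[burn:], u[burn:]                 # discard the burn-in draws
    x = np.column_stack([np.ones(T), q])
    Tb = int(np.floor(lam * T))               # last pre-break observation
    post = np.arange(1, T + 1) > Tb
    y = post * (x @ np.array([delta, delta])) + u
    return y, x, Tb
```

Setting delta = 0 gives data under the null hypothesis; repeating the draw many times and recording how often a test rejects yields the empirical null rejection frequency.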
We report empirical rejection frequencies for traditional inference and for fixed-b inference. In traditional inference, we select the bandwidth following Andrews (1991) [16] for each hypothesized break date using the AR(1) plug-in formula. For fixed-b inference, we report results for different values of b to show how the null rejection probability varies with the choice of b. We also give results for another test in which a single data-dependent bandwidth ratio, denoted by b*, is used across all hypothetical break dates and a fixed-b critical value is applied. The data-dependent bandwidth ratio, b*, is computed as follows. We find the break date which minimizes the sum of squared residuals; we use that break date to select the Andrews (1991) [16] data-dependent bandwidth (M*) with the AR(1) plug-in formula and calculate the implied bandwidth ratio (b* = M*/T); we implement the test using the fixed-b critical values for b*.
The rationale behind b* is as follows. If a different bandwidth is used for each potential break point within the trimming range, then the fixed-b limits of the sup/mean/exp statistics will be functions of those bandwidth ratios and tabulation of fixed-b critical values will be computationally prohibitive. To provide practitioners with a data-dependent bandwidth approach that can be implemented with fixed-b critical values, we need a single data-dependent bandwidth to be used for all potential break points, in which case the tabulated critical values can be used. Given the nice properties of the least squares estimator of the break point under the alternative of structural change (see Bai and Perron (1998) [11]), it is natural to use the least squares estimator of the break point to generate the residuals needed to implement the Andrews (1991) [16] plug-in formula. Under the null of no structural change, any break point, including the least squares break point, will generate useful residuals for the Andrews (1991) [16] plug-in formula. Crainiceanu and Vogelsang (2007) [28] also considered using the least squares estimator of the break point to deal with the nonmonotonic power of the CUSUM test.
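The steps for computing b* can be sketched as follows. Several details are assumptions of this sketch rather than statements about the authors' code: the function names are hypothetical, the plug-in formula is applied with unit weights to each component of the series $x_t \hat u_t$, the AR(1) coefficients are clipped at ±0.97 to keep the formula finite, and b* is capped at 1. The constants 1.1447 and 1.3221 are those of the Andrews (1991) AR(1) plug-in rules for the Bartlett and QS kernels.

```python
import numpy as np

def ssr_break_date(y, x, eps):
    """Break date minimizing the sum of squared residuals over trimmed candidates."""
    T = len(y)
    best = None
    for Tb in range(int(np.ceil(eps * T)), int(np.floor((1 - eps) * T)) + 1):
        pre = (np.arange(T) < Tb)[:, None]
        w = np.hstack([x * pre, x * ~pre])
        beta, *_ = np.linalg.lstsq(w, y, rcond=None)
        u = y - w @ beta
        ssr = float(u @ u)
        if best is None or ssr < best[0]:
            best = (ssr, Tb, u)
    return best[1], best[2]

def andrews_ar1_bandwidth(v, kernel="bartlett"):
    """AR(1) plug-in bandwidth with unit weights on each column of v."""
    T = v.shape[0]
    power = 6 if kernel == "bartlett" else 8
    num = den = 0.0
    for a in range(v.shape[1]):
        z, z_lag = v[1:, a], v[:-1, a]
        rho = float(z_lag @ z / (z_lag @ z_lag))
        rho = float(np.clip(rho, -0.97, 0.97))     # ad hoc guard, an assumption
        resid = z - rho * z_lag
        s4 = float(resid @ resid / len(resid)) ** 2
        extra = (1 + rho) ** 2 if kernel == "bartlett" else 1.0
        num += 4 * rho ** 2 * s4 / ((1 - rho) ** power * extra)
        den += s4 / (1 - rho) ** 4
    alpha = num / den
    if kernel == "bartlett":
        return 1.1447 * (alpha * T) ** (1 / 3)
    return 1.3221 * (alpha * T) ** (1 / 5)         # quadratic spectral kernel

def b_star(y, x, eps, kernel="bartlett"):
    """Single data-dependent bandwidth ratio b* = M*/T, capped at 1."""
    _, u = ssr_break_date(y, x, eps)
    m_star = andrews_ar1_bandwidth(x * u[:, None], kernel)
    return min(m_star / len(y), 1.0)
```

The resulting ratio would then be matched to the nearest tabulated value of b when looking up fixed-b critical values.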
Table 3 provides empirical null rejection frequencies for the traditional tests. For each hypothetical break date, the HAC estimator is constructed using the data-dependent bandwidth. For DGP A with zero persistence, all tests using ε = 0.05 are subject to severe size distortions when the sample size is 100. Having more data or using more trimming helps reduce the size distortions. The null rejections decrease towards the 5% nominal level for all statistics when T is 500 and ε = 0.2. Under DGP B, as the sample size increases from 100 to 500, the null rejection probabilities drop to 0.194 from 0.594 for the supremum test with ε = 0.2 and the QS kernel being used. The T = 500 rejection rate is still far from the nominal level. Size distortions get worse under more persistent data (DGP C). The mean test, which has the least size distortion of the three statistics, only attains a null rejection of 0.368 with the larger trimming value and T = 500. While traditional inference provides tests with reasonable size under DGPs with zero or mild persistence, as the DGP becomes more persistent, over-rejections can be substantial.

Tables 4-6 present simulation results for fixed-b inference. A single bandwidth ratio, b, is applied across all hypothetical break dates in constructing HAC estimators. We report results for b = 0.02, 0.1, 0.5, and 1. These tables also contain the null rejection probability when the traditional critical values in Andrews (1993) [9] or Andrews and Ploberger (1994) [10] are used. The traditional critical values are not designed to work well with relatively large bandwidths and this can be clearly seen in the tables. In general, as the bandwidth ratio gets bigger, the tendency to over-reject becomes more and more pronounced because using more lags generates a systematic downward bias in the HAC estimator and pushes up the value of the test statistic. The traditional critical values do not take this impact of lag choice into account. Because the effective bandwidths play important roles for the behavior of the HAC estimator (8), the impact of using large values of b is greater than for HAC estimators in non-structural change settings.
For fixed-b inference, several patterns stand out in Table 4 for the supremum test. Rejections using fixed-b critical values are similar to the rejections in traditional inference when a small bandwidth ratio is used. However, as the bandwidth increases, rejections using fixed-b critical values systematically decrease towards the nominal level of 0.05. Under DGP B, the null rejections decrease as 0.131 → 0.096 → 0.083 → 0.086 over the range of b with T = 500, the Bartlett kernel and ε = 0.2 being used. Even under DGP C, the null rejections approach the nominal level as b increases for all sample sizes when the QS kernel and the trimming value of 0.2 are used.
Table 7 gives null rejection probabilities when using the data-dependent bandwidth ratio b*. Columns on the left give rejections using fixed-b critical values whereas columns on the right give rejections using traditional critical values. Patterns in Table 7 are similar to patterns in Tables 4-6. Over-rejections are often large when traditional critical values are used. Over-rejections are systematically smaller when fixed-b critical values are used, and b* works reasonably well if the sample size is large enough relative to the strength of the persistence in the data. This is particularly true when the QS kernel is used with 0.2 trimming for the mean statistic and 0.05 trimming for the supremum and exponential statistics.

Table 4. Empirical Null Rejection Probabilities, $SupW^{(F)}$ test with 5% nominal size, M = bT, H0: No Structural Change (δ = 0), T = 100, 200, 500. Note: AP94 are critical values from Andrews and Ploberger (1994) [10] with an adjustment.
We now examine the power of the tests when using b*. We report size-adjusted power for T = 200 in Figures 1-6. Recall the break point under the alternative is λ = 0.4. Odd (even) numbered figures give results with 0.05 (0.2) trimming. Results are given for the three DGPs used for the tables. First, note that more trimming leads to higher power in all cases, as one would expect. Second, the mean statistic tends to have the highest power regardless of the DGP or kernel. This is not surprising given the power optimality properties of the mean statistic derived by Andrews and Ploberger (1994) [10] using traditional asymptotics. Third, for a given kernel, the supremum and exponential statistics have almost the same power across DGPs and trimming. This is somewhat surprising given that, under traditional asymptotics, the exponential statistic is in the class of power optimal tests but the supremum statistic is not. This finding could be driven by values of b* being far away from zero, in which case the traditional asymptotics might not accurately reflect finite sample power. Finally, the Bartlett kernel tends to give tests with higher power than the QS kernel; a similar finding was made by Kiefer and Vogelsang (2005) [1] in models without structural change.
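Size-adjusted power can be sketched in a few lines of Python. The sketch assumes the common convention of replacing the asymptotic critical value with the empirical (1 − level) quantile of the statistic simulated under the null; the text does not spell out the exact procedure, so this convention, the function name `size_adjusted_power`, and the chi-square draws used as stand-ins for simulated test statistics are all assumptions made for illustration.

```python
import numpy as np

def size_adjusted_power(null_stats, alt_stats, level=0.05):
    """Return (power, critical value) using an empirical null critical value.

    null_stats: statistics simulated under the null hypothesis
    alt_stats:  statistics simulated under the alternative
    """
    null_stats = np.asarray(null_stats, dtype=float)
    alt_stats = np.asarray(alt_stats, dtype=float)
    # Empirical critical value: by construction the test has size `level`
    # in the simulated null sample.
    crit = float(np.quantile(null_stats, 1.0 - level))
    power = float(np.mean(alt_stats > crit))
    return power, crit

# Placeholder draws standing in for simulated test statistics (hypothetical).
rng = np.random.default_rng(1)
null_draws = rng.chisquare(1, size=5000)
alt_draws = rng.noncentral_chisquare(1, 4.0, size=5000)
power, crit = size_adjusted_power(null_draws, alt_draws)
print(f"empirical 5% critical value = {crit:.3f}, size-adjusted power = {power:.3f}")
```

Using the same empirical critical value for every configuration removes differences in null rejection rates, so that power comparisons across kernels, trimming values and statistics are made at a common size.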
The size and power results for the statistics implemented with b* point to the typical size-power tradeoff when using HAC variance estimators. Configurations that give the least size distortions also tend to have low power. As long as the data is not too persistent relative to the sample size, a reasonable approach for practice that balances size distortions and power is to use the mean statistic with 0.2 trimming implemented with the QS kernel with b* and fixed-b critical values.
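For readers who want to see how the three statistics are formed from a sequence of Wald statistics over candidate break dates, here is a hedged Python sketch. The trimming convention (keeping break dates from floor(trim·T) + 1 to T − floor(trim·T)) and the normalization of the mean and exponential functionals by the number of retained dates are assumptions of this sketch; some implementations divide by T instead. The function name `break_test_functionals` and the placeholder input are hypothetical.

```python
import numpy as np

def break_test_functionals(wald, T, trim):
    """Sup, mean and exponential functionals of Wald statistics.

    wald[i] is the Wald statistic for candidate break date T_b = i + 1.
    Only dates with floor(trim*T) < T_b <= T - floor(trim*T) are kept.
    """
    wald = np.asarray(wald, dtype=float)
    dates = np.arange(1, wald.shape[0] + 1)
    lo = int(np.floor(trim * T))
    w = wald[(dates > lo) & (dates <= T - lo)]
    half = 0.5 * w
    m = half.max()                 # subtract the max for numerical stability
    exp_stat = m + np.log(np.mean(np.exp(half - m)))
    return {"sup": float(w.max()), "mean": float(w.mean()), "exp": float(exp_stat)}

# Placeholder Wald sequence for T = 100 (hypothetical values, not real output).
rng = np.random.default_rng(2)
T = 100
wald_seq = rng.chisquare(2, size=T - 1)
for trim in (0.05, 0.2):
    print(trim, break_test_functionals(wald_seq, T, trim))
```

Under this normalization the exponential statistic always lies between half the mean statistic and half the supremum statistic.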

Summary and Conclusions
In this paper, fixed-b asymptotics is applied to the problem of testing for the presence of a structural break in a weakly dependent time series regression. The Wald (F) statistic is the Wald statistic that one obtains when structural change is expressed in terms of dummy variables interacted with regressors as in Bai and Perron (1998, 2003) [11,15]. We derived the fixed-b limit of the statistic. In both the full structural change and partial structural change models, the Wald statistic has the same pivotal fixed-b limit. We tabulated fixed-b critical values for the Sup/Mean/Exp-Wald (F) statistics, which are commonly used for testing parameter instability when the break point is unknown. In a simulation study, we examined the finite sample properties of traditional and fixed-b inference. With persistent data, traditional inference suffers from substantial size distortions. Using fixed-b critical values markedly reduces the over-rejection problem. A reasonable approach for practice that balances size distortions and power is to use the mean statistic with 0.2 trimming implemented with the QS kernel, b* and fixed-b critical values.
Proof of Proposition 1. The limit of the fi follows immediately under Assumptions 1 and 2. Also, plugging the limits of β_1 and β_2 into Equation (4) yields, for r ≤ λ, and for r > λ, . Thus, we can rewrite this result by using indicator functions as , where F_p(r, λ) = (W_p(r) − (r/λ) W_p(λ)) · 1(r ≤ λ) and F
Proof of Lemma 1. Plugging the limit of the partial sum process in Proposition 1 into the HAC estimators in (A4)-(A6), the desired result in (12) follows from direct application of the continuous mapping theorem.
Proof of Theorem 1. Recall that Using R = (R_1, −R_1) it follows that Using Assumption 1 and Lemma 1, By writing P(b, F_p(r, λ)) in the form (A1)-(A3) using F_p(r, λ) = (F_p(r, λ), F_p(r, λ)), we obtain, after some algebra, the following expression for the above limit: Now apply the transformation: R_1 Q^{-1} Λ W_p(r)
The following lemma is used in the proof of Lemma 3.
Proof of Lemma 3. One can easily show The desired result comes from the identity Q_XX Q_XX^{-1} = I by substituting Equation (A9) for Q_XX^{-1}.
Proof of Lemma 2. First note that implicit in the proof of Proposition 2 is the result that p lim Q^{-1} . For R = (R_1, −R_1), it follows that using (A9). The scaled partial sum process is given by For 0 ≤ r < λ, the first term in (A11) satisfies Hence with R = (R_1, −R_1), from (A10) and (A12), it follows that For the first part of the second term in (A11), it follows that which combined with (A7) and Lemma 3 immediately yields = rR 1 1 (1).
Combining the results for the three terms gives p+q (r, λ) .

be b/λ not b. Thus, we implicitly use the bandwidth ratio b/λ for Ω^(F)_11 when we use a full sample bandwidth ratio b for constructing Ω^(F). The second component, P(b/(1−λ), W*_l(r)), is related to Ω^(F)_22 (and β_2) in exactly the same fashion. Finally, the third component, CP(λ, b), captures the impact of finite sample covariance between β_1 and β_2 on structural change inference. Now consider the unknown break date case and let Wald_∞(λ) denote the limit of Wald^(F)(T_b), where the form of Wald_∞(λ) depends on whether traditional or fixed-b asymptotic theory is being used. In the case of fixed-b theory, Wald
Table 2. Parameter values for simulations