Threshold Regression with Endogeneity for Short Panels

This paper considers the estimation of dynamic threshold regression models with fixed effects using short panel data. We examine a two-step method, where the threshold parameter is estimated nonparametrically at the N-rate and the remaining parameters are estimated by GMM at the √N-rate. We provide simulation results that illustrate advantages of the new method in comparison with pure GMM estimation. The simulations also highlight the importance of the choice of instruments in GMM estimation.


Introduction
Threshold regression models allow for shifts in economic relationships when the threshold variable crosses the threshold parameter. This paper combines two recent econometric advances in estimating threshold regression models with endogeneity using short panel data sets. Seo and Shin (2016) extended GMM estimation techniques for linear dynamic panel data models to threshold panel data models where both the regressors and the threshold variable may be endogenous. Their setup includes certain nonlinear dynamic panel data models such as the self-exciting threshold autoregressive (SETAR) model. We refer to this estimator as the pure GMM estimator. It has the usual properties, including √N-consistency and asymptotic normality, where N denotes the sample size. Yu and Phillips (2018) considered the estimation of threshold regression models with endogenous regressors and threshold variable using i.i.d. data. They developed a (nonparametric) integrated difference kernel (IDK) estimator of the threshold parameter. They showed that the IDK estimator is N-consistent. Other parameters in the model can be estimated at the usual √N-rate by GMM, taking the estimated threshold parameter as given. The distribution of the IDK estimator is nonstandard.
In this paper, we explain how the ideas of Yu and Phillips (2018) can be adapted to the panel data context with fixed effects to obtain an N-consistent estimator of the threshold parameter. Following Yu and Phillips, we estimate the threshold parameter using the IDK techniques and then the remaining parameters using standard GMM techniques, taking the estimated threshold parameter as given. The improvement in asymptotic efficiency of the threshold estimator spills over to the GMM estimators of the remaining parameters, since there is effectively one fewer parameter to estimate. The panel data context is different from the single structural equation with a single threshold variable considered by Yu and Phillips (2018). First, to avoid making assumptions about the fixed effects, we begin by eliminating them. This results in T − 2 first-differenced structural equations, and each equation involves two threshold variables, where T denotes the number of time periods. Second, to combine all the information available, we construct two estimators for each equation and then compute their overall average. The final step is to compute GMM estimates for the remaining parameters. Asymptotic theory for the IDK+GMM combination was provided by Yu and Phillips (2018) and no additional theoretical results are needed here.
We report results from a simulation study to illustrate advantages of the IDK+GMM combination over pure GMM estimation. The simulations confirm that the IDK+GMM estimator tends to have much smaller root mean square errors (RMSE) than the pure GMM estimator. For example, when N is equal to 800 the RMSE is 320% to 4630% higher for the pure GMM estimator of the threshold parameter. This reflects the fact that the IDK estimator is N-consistent while the pure GMM estimator is only √N-consistent.
We also investigated the importance of the choice of instruments. Even for estimating linear dynamic panel data models, the question of which moments to match remains largely unresolved (e.g., Ahn and Schmidt 1995; Arellano 2016). Seo and Shin (2016) and Yu and Phillips (2018) offered different ad hoc suggestions for threshold models. Our simulations show that large reductions in RMSE are available by adding nonlinear transformations of lagged outcomes to the standard set of instruments. For example, the RMSE in the baseline case is 100% to 730% higher than the RMSE for an estimator that adds a constant and two percentile indicators of lagged outcomes as instruments.

The SETAR Panel Data Model
For conciseness, we focus on the self-exciting threshold autoregressive (SETAR) model, which is widely used in the time series literature (e.g., Tong and Lim 1980; Teräsvirta et al. 2011). In the panel data terminology, the right-hand side variables in the SETAR model are predetermined rather than endogenous. Our results are easily extended to the case of endogenous regressors and an endogenous threshold variable, as we briefly discuss in the concluding remarks. For $i = 1, \ldots, N$ individuals and $t = 1, \ldots, T$ time periods, let $y_{it}$ be a scalar observed random variable. The observations are assumed to be independent across individuals, but not across time. The basic SETAR panel data model is given by Equation (1), where $c_i$ is a time-invariant individual-specific unobserved random variable, and $v_{it}$ is a time- and individual-specific unobserved random variable. The overall constant term is subsumed into $c_i$ as usual. The lowercase Greek letters denote unknown parameters, and superscripts $*$ indicate "true" values. The threshold parameter is $\gamma^*$. For simplicity, define $\xi = (\gamma, \alpha_1, \alpha_2, \alpha_3)$. The parameter space consists of all $\xi \in \mathbb{R}^4$. Assume that all random variables have finite means and variances and that assumption (2) holds. An additional smoothness assumption will be introduced in Section 4. Some authors assume $\alpha_3^* = 0$ from the outset (e.g., Hansen 1999; González et al. 2017). In the time series and cross-section literatures, $\alpha_3^*$ is estimated (e.g., Tong and Lim 1980; Tong 2011; Seo and Shin 2016; Yu and Phillips 2018).
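To make the setup concrete, the following minimal sketch simulates a short panel from one functional form that matches the verbal description above: a slope shift $\alpha_2$, an intercept shift $\alpha_3$ and a threshold $\gamma$ acting on the lagged outcome. The exact arrangement of terms, the normal distributions for $c_i$ and $v_{it}$, the burn-in length and the function name `simulate_setar_panel` are all assumptions made for illustration.

```python
import random


def simulate_setar_panel(n, t_periods, alpha1, alpha2, alpha3, gamma,
                         seed=0, burn_in=50):
    """Simulate y[i][t] from an ASSUMED SETAR panel model with fixed effects.

    Assumed form (an illustration, not a quotation of Equation (1)):
        y_it = alpha1 * y_{it-1}
               + (alpha2 * y_{it-1} + alpha3) * 1(y_{it-1} > gamma)
               + c_i + v_it
    with c_i and v_it drawn as independent standard normals (also assumed).
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)          # fixed effect c_i
        y_prev = c                        # arbitrary starting value
        series = []
        for step in range(burn_in + t_periods):
            shift = (alpha2 * y_prev + alpha3) if y_prev > gamma else 0.0
            y_now = alpha1 * y_prev + shift + c + rng.gauss(0.0, 1.0)
            if step >= burn_in:           # keep only the last t_periods values
                series.append(y_now)
            y_prev = y_now
        panel.append(series)
    return panel


# Example: 5 individuals observed for 6 periods.
panel = simulate_setar_panel(n=5, t_periods=6, alpha1=0.3, alpha2=-0.2,
                             alpha3=0.5, gamma=0.0)
```

The burn-in is there only so that the retained observations do not depend strongly on the arbitrary starting value.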

GMM Estimator
We begin with the pure GMM estimator. Assumption (2) implies a moment restriction for any function $f$, and therefore an abundance of moment restrictions that can be used to estimate the unknown parameters. Suppose a finite set has been selected and stacked in an $M$-vector, say $p_{is}(\xi)$. For example, Holtz-Eakin et al. (1988) and Arellano and Bond (1991) proposed a set of linear moment restrictions on the second moments of the data for the linear dynamic panel data model ($\alpha_2^* = 0$, $\alpha_3^* = 0$, and $p_{is}(\xi) = y_{is}$). Generalising their set to the present context gives the restrictions (4). Crepon et al. (1997), Andrews and Lu (2001), Han and Kim (2014) and Gørgens et al. (2016) pointed out that there are also useful restrictions on the first moments of the data, namely the restrictions (5). In addition, Ahn and Schmidt (1995) analysed quadratic restrictions on the second moments of the data. Note that $\Delta u_{it}$ and $u_{iT}$ are defined using the true parameter values, and that expectations are taken using the true parameter values.
Define $y_i = (y_{i1}, \ldots, y_{iT})$ and let $g(y_i, \xi)$ be a vector of random variables such that the stacked moment restrictions can be written as $E[g(y_i, \xi^*)] = 0$. A necessary condition for the chosen moment restrictions to identify $\xi^*$ is that $E[g(y_i, \xi)] = 0$ if and only if $\xi = \xi^*$. A GMM estimator of $\xi^*$ is defined as the global minimiser, $\hat{\xi}$, of the GMM objective function, a quadratic form in the sample moments in which $\hat{W}$ is a given weight matrix. The objective function attains its minimum on an interval of $\gamma$ values.
The ambiguity can be resolved by defining $\hat{\gamma}$ as the midpoint (e.g., Yu 2015). Note that in general, the weight matrix $\hat{W}$ may also be a function of the unknown parameters $\xi$ (e.g., Hansen et al. 1996). Despite nondifferentiability of the objective function with respect to $\gamma$, the asymptotic distribution of the GMM estimator is typically normal. Define the matrices $G = D_\xi E[g(y_i, \xi^*)]$ and $\Omega = E(g(y_i, \xi^*) g(y_i, \xi^*)')$, where $D_\xi$ denotes the partial derivative. Seo and Shin (2016) showed that, under a nonsingularity condition on these matrices and other technical regularity conditions, $\sqrt{N}(\hat{\xi} - \xi^*)$ is asymptotically normal. In particular, the GMM estimator is √N-consistent.
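As an illustration of how such moment restrictions translate into sample quantities, the sketch below computes sample analogues of moments of the form $E[y_{is} \Delta u_{it}(\xi)] = 0$ for $s \le t - 2$. It assumes that $u_{it}(\xi)$ is the residual of a model with slope shift $\alpha_2$ and intercept shift $\alpha_3$ above the threshold; that functional form and the helper names `regime_shift`, `delta_u` and `sample_moments` are hypothetical choices for this example.

```python
def regime_shift(y_lag, alpha2, alpha3, gamma):
    """Assumed regime term: (alpha2 * y + alpha3) * 1(y > gamma)."""
    return (alpha2 * y_lag + alpha3) if y_lag > gamma else 0.0


def delta_u(series, t, xi):
    """First-differenced residual at 0-based period t >= 2 for one individual."""
    gamma, alpha1, alpha2, alpha3 = xi
    dy = series[t] - series[t - 1]
    dy_lag = series[t - 1] - series[t - 2]
    d_shift = (regime_shift(series[t - 1], alpha2, alpha3, gamma)
               - regime_shift(series[t - 2], alpha2, alpha3, gamma))
    return dy - alpha1 * dy_lag - d_shift


def sample_moments(panel, xi):
    """Stack (1/N) * sum_i y_is * delta_u_it(xi) over all s <= t - 2."""
    n = len(panel)
    n_periods = len(panel[0])
    moments = []
    for t in range(2, n_periods):
        for s in range(0, t - 1):  # instruments dated t-2 and earlier
            total = sum(series[s] * delta_u(series, t, xi) for series in panel)
            moments.append(total / n)
    return moments


# Toy example with one individual and three periods: a single moment.
toy_panel = [[1.0, 2.0, 4.0]]
toy_moments = sample_moments(toy_panel, (0.0, 0.5, 0.0, 0.0))
```

With $T$ periods this construction yields $(T-1)(T-2)/2$ moments, which is the usual count for lagged-level instruments in first-differenced equations.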
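In standard textbook notation, a GMM objective function of this kind is a quadratic form in the sample moments. The display below is a sketch in that notation; the symbols $\hat{Q}_N$ and $\bar{g}_N$ are introduced here for illustration and need not match the original notation.

```latex
\hat{Q}_N(\xi) = \bar{g}_N(\xi)'\, \hat{W}\, \bar{g}_N(\xi),
\qquad
\bar{g}_N(\xi) = \frac{1}{N} \sum_{i=1}^{N} g(y_i, \xi).
```

When $\gamma$ enters $g$ only through indicator functions of the observed data, $\hat{Q}_N$ is a step function of $\gamma$ for fixed values of the other parameters, so its minimiser over $\gamma$ is an interval rather than a single point.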
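The midpoint convention can be sketched in a few lines. The function below takes candidate threshold values and the corresponding objective values and returns the midpoint of the minimising set; the function name and the numerical tolerance are assumptions, and the sketch presumes that the minimisers form a single interval.

```python
def midpoint_of_minimisers(gammas, values, tol=1e-12):
    """Return the midpoint of the gamma values that attain the minimum.

    Assumes the minimisers form one contiguous interval, as for a step
    function that is flat between neighbouring observed data points.
    """
    best = min(values)
    minimisers = [g for g, v in zip(gammas, values) if v <= best + tol]
    return 0.5 * (min(minimisers) + max(minimisers))


# Objective is flat and minimal for gamma in {1, 2, 3}; the midpoint is 2.
gamma_hat = midpoint_of_minimisers([0.0, 1.0, 2.0, 3.0, 4.0],
                                   [5.0, 2.0, 2.0, 2.0, 7.0])
```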

IDK Estimator
In this section we explain how the ideas of Yu and Phillips (2018) can be adapted to the panel data context with fixed effects to obtain an N-consistent estimator of the threshold parameter. We begin by eliminating the fixed effects by first-differencing the structural equation. Then we construct two estimators of the threshold parameter for each of the resulting T − 2 equations. Finally, we obtain an overall estimator by taking the simple average of the basic estimators.
After first-differencing the structural Equation (1) and taking the conditional expectation given $(y_{it-2}, y_{it-1})$, we get Equation (9). Because the indicator functions are discontinuous, the conditional expectation is discontinuous when $y_{it-1}$ or $y_{it-2}$ equals $\gamma^*$. If the conditional expectation is smooth everywhere else, then these discontinuities identify $\gamma^*$. The idea of the IDK estimator is to exploit the discontinuities for estimating $\gamma^*$. To rule out discontinuities occurring elsewhere, we impose the smoothness assumption (10) in addition to (2). To show that the discontinuities identify $\gamma^*$, let $\gamma^-$ and $\gamma^+$ indicate limits from the left and from the right, and define the functions $A_t$ and $B_t$ as the differences between the left and right limits of the conditional expectation function when $y_{it-1}$ and $y_{it-2}$, respectively, are near $\gamma^*$. Using assumption (10), we then obtain the characterisation (13) of $A_t$ and $B_t$. It follows that $\gamma^* = \arg\max_\gamma A_t(y, \gamma)^2$ and $\gamma^* = \arg\max_\gamma B_t(y, \gamma)^2$ for all $y \in \mathbb{R}$. Furthermore, $\gamma^* \alpha_2^* + \alpha_3^* \neq 0$ is a necessary condition for (13) to uniquely identify $\gamma^*$. While it is possible to base estimation of $\gamma^*$ on $A_t(y, \cdot)$ or $B_t(\cdot, y)$ with a fixed value of $y$, such an estimator will not have good properties. To achieve N-consistency, our estimators of $\gamma^*$ are based on density-weighted averages of $A_t$ and $B_t$. Let $r_t$ denote the joint density of $(y_{it-2}, y_{it-1})$ and let $p_t$ denote the marginal density of $y_{it}$. Define the objective functions $R_t^A$ and $R_t^B$ as such density-weighted averages based on $A_t$ and $B_t$, respectively. The discontinuity points of $R_t^A$ and $R_t^B$ are the same as those of $A_t(y, \cdot)$ and $B_t(\cdot, y)$ provided certain technical regularity conditions hold, including that $r_t$ is continuous and bounded away from 0 in an open neighbourhood where $y_{it-2} = \gamma^*$ or $y_{it-1} = \gamma^*$. That is, we generally have that $\gamma^* = \arg\max_\gamma R_t^A(\gamma)$ and $\gamma^* = \arg\max_\gamma R_t^B(\gamma)$. We define "basic" IDK estimators as the arg max of each of the sample analogues of $R_t^A$ and $R_t^B$ for $t = 3, \ldots, T$.
The estimators of $R_t^A$ and $R_t^B$ are implemented using generalised kernels. Let $k$ be a univariate kernel function with support $[-1, 1]$, and let $h$ denote the bandwidth. To keep the notation simple, we use the same bandwidth everywhere. The estimators $\hat{R}_t^A$ and $\hat{R}_t^B$ are then the corresponding kernel-based sample analogues of $R_t^A$ and $R_t^B$. Define the estimators $\hat{\gamma}_t^A = \arg\max_\gamma \hat{R}_t^A(\gamma)$ and $\hat{\gamma}_t^B = \arg\max_\gamma \hat{R}_t^B(\gamma)$ for $t = 3, \ldots, T$. Finally, we construct an overall estimator $\hat{\gamma}$ by taking the average of all $\hat{\gamma}_t^A$ and $\hat{\gamma}_t^B$. Having estimated $\gamma^*$, the $\alpha^*$s can be estimated in a second step at the √N-rate by GMM as described in Section 3 after redefining $\xi = (\alpha_1, \alpha_2, \alpha_3)$. Since $\hat{\gamma}$ converges at the N-rate, the asymptotic distribution is the same as if $\gamma^*$ were known.
The setup here differs somewhat from that of Yu and Phillips (2018), who considered a single structural equation with a single threshold variable. Here we have T − 2 first-differenced structural equations, and each equation involves two threshold variables. The latter means that it is necessary to condition on both $y_{it-2}$ and $y_{it-1}$ in (9), and gives rise to the two distinct estimators based on $A_t$ and $B_t$, respectively. Yu and Phillips (2018) proved that the basic IDK estimator is N-consistent under certain technical regularity conditions. The asymptotic distribution is nonstandard. Their results apply directly to each of our basic estimators, $\hat{\gamma}_t^A$ and $\hat{\gamma}_t^B$ for $t = 3, \ldots, T$. Taking the overall average does not affect the N-consistency and reduces the variance. Yu and Phillips (2018) did not provide standard errors in their empirical illustration. Arguably, in most empirical applications we are interested in making inferences about the regression function, not about individual parameters. The variance of the estimated regression function is dominated by the variance of the $\hat{\alpha}$s, while the variance of $\hat{\gamma}$ is negligible in comparison. Inference methods for the threshold parameter are developed by Liao et al. (2018).
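As a worked illustration, suppose Equation (1) takes the form $y_{it} = \alpha_1^* y_{it-1} + (\alpha_2^* y_{it-1} + \alpha_3^*) 1(y_{it-1} > \gamma^*) + c_i + v_{it}$. This form is an assumption chosen to match the verbal description of the parameters. First-differencing removes $c_i$, and conditioning then gives the expression below, where $a$ and $b$ are placeholder values for $y_{it-2}$ and $y_{it-1}$ and the symbol $J$ for the jump size is introduced here only for illustration.

```latex
\begin{aligned}
E[\Delta y_{it} \mid y_{it-2} = a,\ y_{it-1} = b]
  &= \alpha_1^* (b - a)
   + (\alpha_2^* b + \alpha_3^*)\, 1(b > \gamma^*)
   - (\alpha_2^* a + \alpha_3^*)\, 1(a > \gamma^*) \\
  &\quad + E[\Delta v_{it} \mid y_{it-2} = a,\ y_{it-1} = b], \\
J &= \alpha_2^* \gamma^* + \alpha_3^* .
\end{aligned}
```

If the last conditional expectation is continuous in $(a, b)$, the left limit minus the right limit equals $-J$ as $b$ crosses $\gamma^*$ and $+J$ as $a$ crosses $\gamma^*$. The squared jump is therefore $J^2$ in both directions, and it vanishes exactly when $J = 0$, in which case the discontinuity carries no information about $\gamma^*$.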
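The following sketch conveys the idea of locating the threshold from a jump in a conditional mean. It is deliberately cruder than the IDK estimator described above: it pools all periods, ignores the conditioning on $y_{it-2}$ and the density weighting, and uses a one-sided uniform window in place of a generalised kernel. The data-generating process, the bandwidth, the grid step and every function name are assumptions made for this example only.

```python
import random


def simulate_panel(n, t_periods, a1, a2, a3, gamma, seed=1, burn_in=50):
    """Assumed DGP: y = a1*y_lag + (a2*y_lag + a3)*1(y_lag > gamma) + c + v."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        c = rng.gauss(0.0, 0.5)
        y = c
        series = []
        for step in range(burn_in + t_periods):
            shift = (a2 * y + a3) if y > gamma else 0.0
            y = a1 * y + shift + c + rng.gauss(0.0, 1.0)
            if step >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def jump_threshold_estimate(panel, h=0.5, step=0.05, min_obs=50):
    """Grid-search the gamma that maximises the squared jump in mean(dy | y_lag)."""
    # Pool pairs (y_{it-1}, y_it - y_{it-1}); differencing removes c_i.
    pairs = [(s[t - 1], s[t] - s[t - 1]) for s in panel for t in range(2, len(s))]
    xs = sorted(x for x, _ in pairs)
    lo = xs[len(xs) // 10]            # search between the 10th and
    hi = xs[(9 * len(xs)) // 10]      # 90th percentiles of y_{it-1}
    best_gamma, best_obj = None, -1.0
    for j in range(int((hi - lo) / step) + 1):
        g = lo + j * step
        left = [d for x, d in pairs if g - h <= x < g]
        right = [d for x, d in pairs if g < x <= g + h]
        if len(left) < min_obs or len(right) < min_obs:
            continue                  # skip sparsely populated windows
        jump = sum(right) / len(right) - sum(left) / len(left)
        if jump * jump > best_obj:
            best_gamma, best_obj = g, jump * jump
    return best_gamma


data = simulate_panel(n=1500, t_periods=5, a1=0.3, a2=0.0, a3=-2.0, gamma=-0.5)
gamma_hat = jump_threshold_estimate(data)
```

In this design the jump is large relative to the noise, so the grid search should land close to the value $-0.5$ used in the simulation. With a smaller jump or fewer observations near the threshold, a stand-in this simple would be much less reliable.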

Simulation Results
To illustrate the advantage of the IDK+GMM estimator over pure GMM and to investigate the importance of the choice of instruments, we conducted a small simulation study for one of the designs used by Seo and Shin (2016). The DGP is defined in the table note. For simplicity, all results for the GMM estimators presented here are one-step estimators using the optimal weight matrix.
In the remainder of Table 1 we consider different sets of instruments. Panel B shows big reductions in RMSE for the pure GMM estimator when a constant term is also used as an instrument. Han and Kim (2014) and Gørgens et al. (2016) found similar improvements for the linear model. The improvements are relatively smaller for the IDK+GMM estimator.
Since the structural equation is nonlinear, one might expect that nonlinear transformations of lagged outcomes could be useful instruments. Based on the suggestion by Yu and Phillips (2018), we added $y_{is} 1(y_{is} > \gamma)$ to the set of instruments. Panel C in Table 1 shows that this does not improve the RMSE for the pure GMM estimator. On the contrary, the estimation noise in the instruments adds significantly to the RMSE. The results are more promising for the IDK+GMM estimator, where substantial reductions in RMSE are observed.
In panel D, we have added quadratic and cubic transformations of the lagged dependent variable, and in panel E we have added threshold functions where the threshold depends on percentiles of the data rather than the structural parameter. As shown in panel F, when N = 800 the RMSE for the pure GMM estimator drops by factors of 3.6-6.6, while the RMSE for the IDK+GMM estimator drops by factors of 2.7-3.4.
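For readers who wish to reproduce comparisons of this kind, the sketch below shows how an RMSE across simulation replications and a statement such as "X% higher" can be computed. The function names are hypothetical and the numbers in the example are made up for illustration; they are not results from the paper.

```python
import math


def rmse(estimates, truth):
    """Root mean square error of replicated estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def percent_higher(rmse_a, rmse_b):
    """By how many percent rmse_a exceeds rmse_b."""
    return 100.0 * (rmse_a / rmse_b - 1.0)


# Made-up replications of two estimators of a parameter whose true value is 2.
rmse_first = rmse([1.0, 3.0], truth=2.0)
rmse_second = rmse([1.5, 2.5], truth=2.0)
gap = percent_higher(rmse_first, rmse_second)
```

Under this convention, an RMSE that is 320% higher corresponds to a ratio of 4.2 between the two RMSEs.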
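The percentile-based instruments can be sketched as follows. The cut-offs are computed from the data and are therefore free of the estimation noise that affects instruments depending on $\gamma$. The choice of the 25th and 75th percentiles, the simple order-statistic rule and the function names are assumptions for illustration, since the exact percentiles are not stated here.

```python
def percentile_cutoffs(values, percents=(25, 75)):
    """Order-statistic cut-offs at the given integer percentiles (assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (p * n) // 100)] for p in percents]


def instrument_row(y_lag, cutoffs):
    """Instruments for one observation: constant, level, percentile indicators."""
    return [1.0, y_lag] + [1.0 if y_lag > q else 0.0 for q in cutoffs]


# Lagged outcomes for eight individuals in one period (made-up numbers).
lagged = [8.0, 1.0, 5.0, 3.0, 7.0, 2.0, 6.0, 4.0]
cutoffs = percentile_cutoffs(lagged)
rows = [instrument_row(y, cutoffs) for y in lagged]
```

Each row would then be interacted with the first-differenced residual of the relevant period to form the sample moments.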

Concluding Remarks
This paper has shown how the ideas of Yu and Phillips (2018) can be adapted to the panel data context with fixed effects. Theoretically, the advantage of the IDK+GMM combination is that the estimator of the threshold parameter is N-consistent, while the pure GMM estimator converges only at the √N-rate. In simulation exercises, we confirmed that the IDK+GMM combination offers a huge practical advantage over pure GMM estimation, even when the former is implemented relatively simply. We also investigated the importance of the choice of instruments and showed that adding fixed nonlinear transformations of the lagged dependent variable can be highly effective when estimating nonlinear equations.
We have focused on the SETAR model in this paper. A more general threshold regression panel data model replaces the lagged outcome with a vector $x_{it}$ of possibly endogenous variables and the threshold variable with a possibly endogenous scalar variable $q_{it}$, where $\alpha_1^*$, $\alpha_2^*$ and $\alpha_3^*$ are conformable parameter vectors. It is straightforward to construct an IDK+GMM estimator analogous to the SETAR case, and similar efficiency gains are available.
If it is known that $E(c_i \mid y_{it-1} = y)$ is a smooth function of $y$, then we can construct an estimator of $\gamma^*$ directly based on Equation (1), without first-differencing and without assumption (10). Since an extra time period is available for estimation and since we only need to smooth in one dimension ($y_{it-1}$) instead of two ($y_{it-1}$, $y_{it-2}$) when defining $R_t$, this estimator is expected to be more efficient.
For simplicity, we have constructed an overall estimator by taking a simple average of multiple estimators based on separate equations. It is a topic for future research to investigate how best to combine the information. One could consider weighted averages or, instead of averaging separate estimators, one could base an estimator on a (weighted) average over the objective functions. Which is better may depend on, e.g., the time pattern of $\mathrm{Var}(v_{it})$.
Finally, to illustrate the advantage of the IDK+GMM estimator over the pure GMM estimator, our simulations focused on the design considered by Seo and Shin (2016). To further investigate the properties of the IDK+GMM estimator in future research, it would be interesting to consider simulation designs where endogeneity is more severe (e.g., $c_i$ is correlated with $y_{i1}$) and where the number of time periods is smaller (i.e., $T$ is small). Also, in practice the optimal weight matrix is not known, and it would be useful to compare two-step estimation of the weight matrix and continuous updating (e.g., Hansen et al. 1996).
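The difference between a simple and a weighted average of basic estimators can be sketched as below. Inverse-variance weighting is only one possible scheme and is used here as an assumption; the variances would themselves have to be estimated in practice, and the function name is hypothetical.

```python
def combine_estimates(estimates, variances=None):
    """Simple average, or inverse-variance weighted average if variances given."""
    if variances is None:
        return sum(estimates) / len(estimates)
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Made-up basic estimates of the threshold from separate equations.
simple = combine_estimates([1.0, 2.0, 3.0])
weighted = combine_estimates([1.0, 6.0], variances=[1.0, 4.0])
```

In the second call the noisier estimate receives a quarter of the weight of the more precise one, which pulls the combined value towards the latter.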