## 1. Introduction

Threshold regression models allow for shifts in economic relationships when the threshold variable crosses the threshold parameter. This paper combines two recent econometric advances in estimating threshold regression models with endogeneity using short panel data sets.

Seo and Shin (2016) extended GMM estimation techniques for linear dynamic panel data models to threshold panel data models where both the regressors and the threshold variable may be endogenous. Their setup includes certain nonlinear dynamic panel data models such as the self-exciting threshold autoregressive (SETAR) model. We refer to this estimator as the pure GMM estimator. It has the usual properties, including $\sqrt{N}$-consistency and asymptotic normality, where $N$ denotes the sample size.

Yu and Phillips (2018) considered the estimation of threshold regression models with endogenous regressors and threshold variable using i.i.d. data. They developed a (nonparametric) integrated difference kernel (IDK) estimator of the threshold parameter. They showed that the IDK estimator is $N$-consistent. Other parameters in the model can be estimated at the usual $\sqrt{N}$-rate by GMM, taking the estimated threshold parameter as given. The distribution of the IDK estimator is nonstandard.

In this paper, we explain how the ideas of Yu and Phillips (2018) can be adapted to the panel data context with fixed effects to obtain an $N$-consistent estimator of the threshold parameter. Following Yu and Phillips, we estimate the threshold parameter using the IDK techniques and then the remaining parameters using standard GMM techniques, taking the estimated threshold parameter as given. The improvement in asymptotic efficiency of the threshold estimator spills over to the GMM estimators of the remaining parameters, since there is effectively one fewer parameter to estimate. The panel data context is different from the single structural equation with a single threshold variable considered by Yu and Phillips (2018). First, to avoid making assumptions about the fixed effects, we begin by eliminating them. This results in $T-2$ first-differenced structural equations, where $T$ denotes the number of time periods, and each equation involves two threshold variables. Second, to combine all the information available, we construct two estimators for each equation and then compute their overall average. The final step is to compute GMM estimates for the remaining parameters. Asymptotic theory for the IDK+GMM combination was provided by Yu and Phillips (2018) and no additional theoretical results are needed here.

We report results from a simulation study to illustrate the advantages of the IDK+GMM combination over pure GMM estimation. The simulations confirm that the IDK+GMM estimator tends to have a much smaller root mean square error (RMSE) than the pure GMM estimator. For example, when $N=800$ the RMSE of the pure GMM estimator of the threshold parameter is 320% to 4630% higher. This reflects the fact that the IDK estimator is $N$-consistent while the pure GMM estimator is only $\sqrt{N}$-consistent.

We also investigated the importance of the choice of instruments. Even for estimating linear dynamic panel data models, the question of which moments to match remains largely unresolved (e.g., Ahn and Schmidt 1995; Arellano 2016). Seo and Shin (2016) and Yu and Phillips (2018) offered different ad hoc suggestions for threshold models. Our simulations show that large reductions in RMSE are available by adding nonlinear transformations of lagged outcomes to the standard set of instruments. For example, the RMSE in the baseline case is 100% to 730% higher than the RMSE for an estimator that adds a constant and two percentile indicators of lagged outcomes as instruments.

## 2. The SETAR Panel Data Model

For conciseness, we focus on the self-exciting threshold autoregressive (SETAR) model which is widely used in the time series literature (e.g., Tong and Lim 1980; Teräsvirta et al. 2011). In the panel data terminology, the right-hand side variables in the SETAR model are predetermined rather than endogenous. Our results are easily extended to the case of endogenous regressors and an endogenous threshold variable, as we briefly discuss in the concluding remarks. For $i=1,\dots ,N$ individuals and $t=1,\dots ,T$ times, let ${y}_{it}$ be a scalar observed random variable. The observations are assumed to be independent across individuals, but not across time. The basic SETAR panel data model is

where ${c}_{i}$ is a time-invariant individual-specific unobserved random variable, and ${v}_{it}$ is a time- and individual-specific unobserved random variable. The overall constant term is subsumed into ${c}_{i}$ as usual. The lowercase Greek letters denote unknown parameters, and superscripts * indicate “true” values. The threshold parameter is ${\gamma}^{*}$. For simplicity, define $\xi =(\gamma ,{\alpha}_{1},{\alpha}_{2},{\alpha}_{3})$. The parameter space consists of all $\xi \in {\mathbb{R}}^{4}$. Assume that all random variables have finite means and variances and that

## 3. GMM Estimator

We begin with the pure GMM estimator. Assumption (2) implies that for any function $f:\mathbb{R}\times {\mathbb{R}}^{4}\to \mathbb{R}$ we have

Assumption (2) therefore implies an abundance of moment restrictions that can be used to estimate the unknown parameters.

Suppose a finite set of such functions has been selected and stacked in an $M$-vector, say ${p}_{is}\left(\xi \right)$. For example, ${p}_{is}\left(\xi \right)={y}_{is}$, ${p}_{is}\left(\xi \right)={({y}_{is},{y}_{is}1({y}_{is}>\gamma ))}^{\prime}$, or ${p}_{is}\left(\xi \right)={({y}_{is},{y}_{is}^{2},{y}_{is}^{3})}^{\prime}$.

Holtz-Eakin et al. (1988) and Arellano and Bond (1991) proposed a set of linear moment restrictions on the second moments of the data for the linear dynamic panel data model (${\alpha}_{2}^{*}=0$, ${\alpha}_{3}^{*}=0$, and ${p}_{is}\left(\xi \right)={y}_{is}$). Generalising their set to the present context gives

In addition, Ahn and Schmidt (1995) analysed the quadratic restrictions on the second moments of the data

Note $\Delta {u}_{it}$ and ${u}_{iT}$ are defined using the true parameter values and expectations are taken using the true parameter values.

Define ${y}_{i}={({y}_{i1},\dots ,{y}_{iT})}^{\prime}$ and let $g({y}_{i},\xi )$ be a vector of random variables such that the stacked moment restrictions can be written as $\mathsf{E}\left[g({y}_{i},{\xi}^{*})\right]=0$. A necessary condition for the chosen moment restrictions to identify ${\xi}^{*}$ is that $\mathsf{E}\left[g({y}_{i},\xi )\right]=0$ if and only if $\xi ={\xi}^{*}$. A GMM estimator of ${\xi}^{*}$ is defined as the global minimiser, $\widehat{\xi}$, of the GMM objective function,

where $\widehat{W}$ is a given weight matrix. Because $\gamma $ enters only through indicator functions, the objective function is a step function of $\gamma $ and attains its minimum on an interval of $\gamma $ values. The ambiguity can be resolved by defining $\widehat{\gamma}$ as the midpoint of that interval (e.g., Yu 2015). Note that in general, the weight matrix $\widehat{W}$ may also be a function of the unknown parameters $\xi $ (e.g., Hansen et al. 1996).
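The midpoint rule can be illustrated with a minimal Python sketch. It assumes the objective is the usual quadratic form in the sample mean of the moment vector and that the minimisation over $\gamma $ is done on a grid. The helper names `gmm_objective` and `midpoint_minimiser`, the toy moment function, and the data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def gmm_objective(moments, weight):
    """Quadratic form gbar' W gbar, with gbar the column means of the N x M moment matrix."""
    gbar = moments.mean(axis=0)
    return float(gbar @ weight @ gbar)

def midpoint_minimiser(grid, values, tol=1e-12):
    """Midpoint of the grid points at which the objective attains its minimum."""
    grid = np.asarray(grid, dtype=float)
    values = np.asarray(values, dtype=float)
    at_min = grid[values <= values.min() + tol]
    return 0.5 * (at_min.min() + at_min.max())

# Toy example (hypothetical): a single moment 1(x > gamma) - 0.5, which makes
# the objective a step function of gamma, flat between the order statistics.
x = np.array([1.0, 2.0, 3.0, 4.0])
grid = np.arange(0.0, 5.01, 0.25)
W = np.eye(1)
values = [gmm_objective(((x > g).astype(float) - 0.5)[:, None], W) for g in grid]

# The minimum is attained at the grid points 2.0, 2.25, 2.5 and 2.75.
gamma_hat = midpoint_minimiser(grid, values)
```

On a grid the midpoint is only an approximation to the midpoint of the true minimising interval, so a finer grid, or the order statistics of the threshold variable themselves, would be a natural refinement.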

Despite nondifferentiability of the objective function with respect to $\gamma $, the asymptotic distribution of the GMM estimator is typically normal. Define the matrices $G={\mathsf{D}}_{\xi}\mathsf{E}\left[g({y}_{i},{\xi}^{*})\right]$ and $\mathsf{\Omega}=\mathsf{E}\left[g({y}_{i},{\xi}^{*})g{({y}_{i},{\xi}^{*})}^{\prime}\right]$, where ${\mathsf{D}}_{\xi}$ denotes the partial derivative with respect to $\xi $.

Seo and Shin (2016) proved that if $\widehat{W}{\to}^{p}{\mathsf{\Omega}}^{-1}$, ${G}^{\prime}{\mathsf{\Omega}}^{-1}G$ is nonsingular, and other technical regularity conditions are satisfied, then

In particular, the GMM estimator is $\sqrt{N}$-consistent.

## 4. IDK Estimator

In this section we explain how the ideas of Yu and Phillips (2018) can be adapted to the panel data context with fixed effects to obtain an $N$-consistent estimator of the threshold parameter. We begin by eliminating the fixed effects by first-differencing the structural equation. Then we construct two estimators of the threshold parameter for each of the resulting $T-2$ equations. Finally, we obtain an overall estimator by taking the simple average of the basic estimators.
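The first step can be illustrated with a short Python sketch. It uses a deliberately simple static toy panel (a fixed effect plus noise), not the SETAR model, to show only that differencing removes the time-invariant term, and it counts the periods $t=3,\dots ,T$ for which both lags are observed. The function name and the data are hypothetical.

```python
import numpy as np

def first_difference(y):
    """Return the N x (T-1) array of differences y[:, t] - y[:, t-1]."""
    y = np.asarray(y, dtype=float)
    return y[:, 1:] - y[:, :-1]

# Toy panel (hypothetical): individual effect c_i plus idiosyncratic noise.
rng = np.random.default_rng(0)
N, T = 5, 6
c = rng.normal(size=N)
e = rng.normal(size=(N, T))
y_with_effect = c[:, None] + e

# Differencing removes c_i: both arrays agree up to rounding error.
dy = first_difference(y_with_effect)
de = first_difference(e)

# With one lag on the right-hand side, the differenced equation for period t
# involves y_{t-1} and y_{t-2}, so only t = 3, ..., T are usable.
usable_periods = list(range(3, T + 1))
```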

After first-differencing the structural Equation (1) and taking the conditional expectation, we get

Because the indicator functions are discontinuous, the conditional expectation is discontinuous when ${y}_{it-1}$ or ${y}_{it-2}$ equals ${\gamma}^{*}$. If the conditional expectation is smooth everywhere else, then these discontinuities identify ${\gamma}^{*}$. The idea of the IDK estimator is to exploit the discontinuities for estimating ${\gamma}^{*}$. To rule out discontinuities occurring elsewhere, in addition to (2) assume that

To show that the discontinuities identify ${\gamma}^{*}$, let ${\gamma}^{-}$ and ${\gamma}^{+}$ indicate limits from the left and from the right, and define the functions ${A}_{t}$ and ${B}_{t}$ as the differences between the left and right limits of the conditional expectation function when ${y}_{it-1}$ and ${y}_{it-2}$, respectively, are near ${\gamma}^{*}$; that is,

and

Using assumption (10), we then have

It follows that ${\gamma}^{*}=\arg\max_{\gamma}{A}_{t}{(y,\gamma )}^{2}$ and ${\gamma}^{*}=\arg\max_{\gamma}{B}_{t}{(y,\gamma )}^{2}$ for all $y\in \mathbb{R}$. Furthermore, ${\gamma}^{*}{\alpha}_{2}^{*}+{\alpha}_{3}^{*}\ne 0$ is a necessary condition for (13) to uniquely identify ${\gamma}^{*}$.

While it is possible to base estimation of ${\gamma}^{*}$ on ${A}_{t}(y,\cdot )$ or ${B}_{t}(\cdot ,y)$ with a fixed value of $y$, such an estimator will not have good properties. To achieve $N$-consistency, our estimators of ${\gamma}^{*}$ are based on density-weighted averages of ${A}_{t}$ and ${B}_{t}$. Let ${r}_{t}$ denote the joint density of $({y}_{it-2},{y}_{it-1})$ and let ${p}_{t}$ denote the marginal density of ${y}_{it}$. Define the objective function ${R}_{t}^{A}$ by

and the objective function ${R}_{t}^{B}$ by

The discontinuity points of ${R}_{t}^{A}$ and ${R}_{t}^{B}$ are the same as those of ${A}_{t}(y,\cdot )$ and ${B}_{t}(\cdot ,y)$ provided certain technical regularity conditions hold, including that ${r}_{t}$ is continuous and bounded away from 0 in an open neighbourhood where ${y}_{it-2}={\gamma}^{*}$ or ${y}_{it-1}={\gamma}^{*}$. That is, we generally have that ${\gamma}^{*}=\arg\max_{\gamma}{R}_{t}^{A}\left(\gamma \right)$ and ${\gamma}^{*}=\arg\max_{\gamma}{R}_{t}^{B}\left(\gamma \right)$.

We define “basic” IDK estimators as the arg max of each of the sample analogues of ${R}_{t}^{A}$ and ${R}_{t}^{B}$ for $t=3,\dots ,T$. The estimators of ${R}_{t}^{A}$ and ${R}_{t}^{B}$ are implemented using generalised kernels. Let $k$ be a univariate kernel function with support $[-1,1]$, and let $h$ denote the bandwidth. To keep the notation simple, we use the same bandwidth everywhere. Then the estimators of ${R}_{t}^{A}$ and ${R}_{t}^{B}$ are

and

where

Define the estimators ${\widehat{\gamma}}_{t}^{A}=\arg\max_{\gamma}{\widehat{R}}_{t}^{A}\left(\gamma \right)$ and ${\widehat{\gamma}}_{t}^{B}=\arg\max_{\gamma}{\widehat{R}}_{t}^{B}\left(\gamma \right)$ for $t=3,\dots ,T$. Finally, we construct an overall estimator $\widehat{\gamma}$ by taking the simple average of all $2(T-2)$ basic estimators ${\widehat{\gamma}}_{t}^{A}$ and ${\widehat{\gamma}}_{t}^{B}$.
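The logic of locating a threshold from a jump and then averaging several basic estimates can be sketched in Python under strong simplifications. The sketch below is not the integrated, density-weighted two-dimensional estimator described above: it uses a single conditioning variable, one-sided Epanechnikov kernel means, and a grid search. The function names, the toy data-generating process with a jump at 0.5, the bandwidth, and the grid are all assumptions made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Kernel with support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def squared_jump(x, z, gamma, h):
    """Squared difference between right- and left-sided kernel means of z at x = gamma."""
    u = (x - gamma) / h
    w = epanechnikov(u)
    w_left = np.where(u <= 0.0, w, 0.0)
    w_right = np.where(u > 0.0, w, 0.0)
    if w_left.sum() == 0.0 or w_right.sum() == 0.0:
        return 0.0  # no data on one side: treat as no evidence of a jump
    jump = w_right @ z / w_right.sum() - w_left @ z / w_left.sum()
    return float(jump**2)

def basic_estimate(x, z, grid, h):
    """Grid point that maximises the squared-jump criterion."""
    crit = [squared_jump(x, z, g, h) for g in grid]
    return float(grid[int(np.argmax(crit))])

# Toy data (hypothetical): two independent samples, each with a jump at 0.5
# in the conditional mean of z given x, standing in for separate equations.
rng = np.random.default_rng(1)
grid = np.linspace(-1.0, 1.0, 201)
basic = []
for _ in range(2):
    x = rng.uniform(-2.0, 2.0, size=4000)
    z = 0.3 * x + 1.0 * (x > 0.5) + 0.1 * rng.normal(size=4000)
    basic.append(basic_estimate(x, z, grid, h=0.2))

# Overall estimate: simple average of the basic estimates.
gamma_hat = float(np.mean(basic))
```

In the panel setting there would be one such estimate for each of the $A$ and $B$ criteria and each period $t=3,\dots ,T$, and the average would run over all of them.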

Having estimated ${\gamma}^{*}$, the ${\alpha}^{*}$s can be estimated in a second step at the $\sqrt{N}$-rate by GMM as described in Section 3 after redefining $\xi =({\alpha}_{1},{\alpha}_{2},{\alpha}_{3})$. Since $\widehat{\gamma}$ converges at the $N$-rate, the asymptotic distribution of the second-step estimator is the same as if ${\gamma}^{*}$ were known.

The setup here differs somewhat from that of Yu and Phillips (2018), who considered a single structural equation with a single threshold variable. Here we have $T-2$ first-differenced structural equations, and each equation involves two threshold variables. The latter means that it is necessary to condition on both ${y}_{it-2}$ and ${y}_{it-1}$ in (9), and gives rise to the two distinct estimators based on ${A}_{t}$ and ${B}_{t}$, respectively.

Yu and Phillips (2018) proved that the basic IDK estimator is $N$-consistent under certain technical regularity conditions. The asymptotic distribution is nonstandard. Their results apply directly to each of our basic estimators, ${\widehat{\gamma}}_{t}^{A}$ and ${\widehat{\gamma}}_{t}^{B}$ for $t=3,\dots ,T$. Taking the overall average does not affect the $N$-consistency and reduces the variance.

Yu and Phillips (2018) did not provide standard errors in their empirical illustration. Arguably, in most empirical applications we are interested in making inferences about the regression function, not about individual parameters. The variance of the estimated regression function is dominated by the variance of the $\widehat{\alpha}$s, while the variance of $\widehat{\gamma}$ is negligible in comparison. Inference methods for the threshold parameter are developed by Liao et al. (2018).

## 5. Simulation Results

To illustrate the advantage of the IDK+GMM estimator over pure GMM and to investigate the importance of the choice of instruments, we conducted a small simulation study for one of the designs used by Seo and Shin (2016). The DGP is defined in the table note. For simplicity, all results for the GMM estimators presented here are one-step estimators using the optimal weight matrix.

Panel A of Table 1 shows our baseline results, which use only the untransformed lagged outcome variables as instruments, as suggested by Seo and Shin (2016). The RMSEs for the pure GMM estimator decrease monotonically at rates suggesting $\sqrt{N}$-consistency, as expected. The RMSEs for the IDK+GMM estimator are much lower, especially for $\gamma $, and the convergence rates are compatible with $N$-consistency for $\gamma $ and $\sqrt{N}$-consistency for the $\alpha $s.
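One simple way to read convergence rates off a table of simulated RMSEs is to compute the log-ratio slope between two sample sizes. The Python sketch below assumes the RMSE is roughly proportional to $N^{-r}$ over the range considered; the function name and the numbers in the example are hypothetical and are not taken from Table 1.

```python
import math

def empirical_rate(n_small, rmse_small, n_large, rmse_large):
    """Exponent r such that RMSE is roughly proportional to N**(-r) between two sample sizes."""
    return math.log(rmse_small / rmse_large) / math.log(n_large / n_small)

# Hypothetical RMSE values: halving when N quadruples points to a sqrt(N)-rate,
# while falling to a quarter points to an N-rate.
rate_root_n = empirical_rate(200, 0.10, 800, 0.05)
rate_n = empirical_rate(200, 0.08, 800, 0.02)
```

A value of `rate_root_n` near 0.5 is what $\sqrt{N}$-consistency would suggest, and a value of `rate_n` near 1 is what $N$-consistency would suggest, although in finite samples the slopes are only indicative.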

Given the disparate convergence rates we expect the RMSE ratio of pure GMM to the IDK+GMM combination for $\gamma $ to diverge, while the RMSE ratios for the $\alpha $s should converge to finite limit values corresponding to the ratio of the asymptotic variances of the respective GMM estimators. The numbers shown in the right-most four columns in Table 1 are compatible with these expectations. In panel A, when $N=800$, the efficiency gain for $\gamma $ is huge, more than a factor of 27. The gains for the $\alpha $s are also large, with RMSE for pure GMM more than twice the RMSE for the IDK+GMM estimator.
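The divergence of the RMSE ratio for $\gamma $ can be made explicit with a back-of-the-envelope approximation. Suppose, purely for illustration, that the RMSE of the pure GMM estimator of $\gamma $ behaves like $a/\sqrt{N}$ and that of the IDK estimator like $b/N$, where $a$ and $b$ are hypothetical positive constants that do not depend on $N$. Then

$$\frac{{\mathrm{RMSE}}_{\mathrm{GMM}}(\widehat{\gamma})}{{\mathrm{RMSE}}_{\mathrm{IDK}}(\widehat{\gamma})}\approx \frac{a/\sqrt{N}}{b/N}=\frac{a}{b}\sqrt{N},$$

so quadrupling the sample size should roughly double the ratio. For the $\alpha $s both estimators converge at the $\sqrt{N}$-rate, so the corresponding ratio settles at a constant.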

In the remainder of Table 1 we consider different sets of instruments. Panel B shows big reductions in RMSE for the pure GMM estimator when a constant term is also used as an instrument. Han and Kim (2014) and Gørgens et al. (2016) found similar improvements for the linear model. The improvements are relatively smaller for the IDK+GMM estimator.

Since the structural equation is nonlinear, one might expect that nonlinear transformations of lagged outcomes could be useful instruments. Based on the suggestion by Yu and Phillips (2018), we added ${y}_{is}1({y}_{is}>\widehat{\gamma})$ to the set of instruments. Panel C in Table 1 shows that this does not improve the RMSE for the pure GMM estimator. On the contrary, the estimation noise in the instruments adds significantly to the RMSE. The results are more promising for the IDK+GMM estimator, where substantial reductions in RMSE are observed.

In panel D, we have added quadratic and cubic transformations of the lagged dependent variable, and in panel E we have added threshold functions where the threshold depends on percentiles of the data rather than the structural parameter. As shown in panel F, when $N=800$ the RMSE for the pure GMM estimator drops by factors of 3.6–6.6, while the RMSE for the IDK+GMM estimator drops by factors of 2.7–3.4.
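The kinds of instrument transformations discussed above can be assembled as in the following Python sketch. It is an illustration only: the function name is hypothetical, the choice of the 33rd and 67th percentiles is an assumption (the cut points used in the simulations are not stated here), and combining all transformations in one matrix is a convenience rather than a description of any particular panel of Table 1.

```python
import numpy as np

def build_instruments(y_lag, percentiles=(33.0, 67.0)):
    """Stack a constant, y, y^2, y^3 and percentile indicators of a lagged outcome."""
    y_lag = np.asarray(y_lag, dtype=float)
    # Cut points depend only on the data, not on the structural parameter gamma.
    cuts = np.percentile(y_lag, percentiles)
    columns = [np.ones_like(y_lag), y_lag, y_lag**2, y_lag**3]
    columns += [(y_lag > c).astype(float) for c in cuts]
    return np.column_stack(columns)

# Toy lagged outcome (hypothetical): the values 0, 1, ..., 9.
Z = build_instruments(np.arange(10.0))
```

Because the cut points are fixed functions of the data, these indicators avoid the estimation noise that enters through $\widehat{\gamma}$ when ${y}_{is}1({y}_{is}>\widehat{\gamma})$ is used as an instrument.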

## 6. Concluding Remarks

This paper has shown how the ideas of Yu and Phillips (2018) can be adapted to the panel data context with fixed effects. Theoretically, the advantage of the IDK+GMM combination is that the estimator of the threshold parameter is $N$-consistent, while the pure GMM estimator converges only at the $\sqrt{N}$-rate. In simulation exercises, we confirmed that the IDK+GMM combination offers a huge practical advantage over pure GMM estimation, even when the former is implemented relatively simply. We also investigated the importance of the choice of instruments and showed that adding fixed nonlinear transformations of the lagged dependent variable can be highly effective when estimating nonlinear equations.

We have focused on the SETAR model in this paper. A more general threshold regression panel data model is

where ${x}_{it}$ is a vector of possibly endogenous variables, ${q}_{it}$ is a possibly endogenous scalar variable, and ${\alpha}_{1}^{*}$, ${\alpha}_{2}^{*}$ and ${\alpha}_{3}^{*}$ are conformable parameter vectors. It is straightforward to construct an IDK+GMM estimator analogous to the SETAR case, and similar efficiency gains are available.

The IDK estimator we have described utilises discontinuities in the conditional expectation function given in (9). It will fail if ${\gamma}^{*}{\alpha}_{2}^{*}+{\alpha}_{3}^{*}=0$, because then (9) is continuous. However, in this case the partial derivatives of (9) may be discontinuous at ${y}_{it-2}={\gamma}^{*}$ or ${y}_{it-1}={\gamma}^{*}$, so IDK estimation is still possible (e.g., Yu and Phillips 2018; Porter and Yu 2015).

If it is known that $\mathsf{E}({c}_{i}\mid {y}_{it-1}=y)$ is a smooth function of $y$, then we can construct an estimator of ${\gamma}^{*}$ directly based on Equation (1), without first-differencing and without assumption (10). Since an extra time period is available for estimation and since we only need to smooth in one dimension (${y}_{it-1}$) instead of two (${y}_{it-1},{y}_{it-2}$) when defining ${R}_{t}$, this estimator is expected to be more efficient.

For simplicity, we have constructed an overall estimator by taking a simple average of multiple estimators based on separate equations. How best to combine the information is a topic for future research. One could consider weighted averages or, instead of averaging separate estimators, one could base an estimator on a (weighted) average over the objective functions. Which is better may depend on, for example, the time pattern of $\mathrm{Var}\left({v}_{it}\right)$.

Finally, to illustrate the advantage of the IDK+GMM estimator over the pure GMM estimator, our simulations focused on the design considered by Seo and Shin (2016). To further investigate the properties of the IDK+GMM estimator in future research, it would be interesting to consider simulation designs where endogeneity is more severe (e.g., ${c}_{i}$ is correlated with ${y}_{i1}$) and where the number of time periods is smaller (i.e., $T$ is small). Also, in practice the optimal weight matrix is not known, and it would be useful to compare two-step estimation of the weight matrix and continuous updating (e.g., Hansen et al. 1996).