1. Introduction
Threshold regression models allow for shifts in economic relationships when the threshold variable crosses the threshold parameter. This paper combines two recent econometric advances in estimating threshold regression models with endogeneity using short panel data sets.
Seo and Shin (2016) extended GMM estimation techniques for linear dynamic panel data models to threshold panel data models where both the regressors and the threshold variable may be endogenous. Their setup includes certain nonlinear dynamic panel data models such as the self-exciting threshold autoregressive (SETAR) model. We refer to this estimator as the pure GMM estimator. It has the usual properties, including √N-consistency and asymptotic normality, where N denotes the sample size.
Yu and Phillips (2018) considered the estimation of threshold regression models with endogenous regressors and threshold variable using i.i.d. data. They developed a (nonparametric) integrated difference kernel (IDK) estimator of the threshold parameter. They showed that the IDK estimator is N-consistent. Other parameters in the model can be estimated at the usual √N-rate by GMM, taking the estimated threshold parameter as given. The distribution of the IDK estimator is nonstandard.
In this paper, we explain how the ideas of
Yu and Phillips (
2018) can be adapted to the panel data context with fixed effects to obtain an
N-consistent estimator of the threshold parameter. Following
Yu and Phillips, we estimate the threshold parameter using the IDK techniques and then the remaining parameters using standard GMM techniques, taking the estimated threshold parameter as given. The improvement in asymptotic efficiency of the threshold estimator spills over to the GMM estimators of the remaining parameters, since there is effectively one less parameter to estimate. The panel data context is different from the single structural equation with a single threshold variable considered by
Yu and Phillips (
2018). First, to avoid making assumptions about the fixed effects, we begin by eliminating them. This results in
first-differenced structural equations, and each equation involves two threshold variables, where
T denotes the number of time periods. Second, to combine all the information available, we construct two estimators for each equation and then compute their overall average. The final step is to compute GMM estimates for the remaining parameters. Asymptotic theory for the IDK+GMM combination was provided by
Yu and Phillips (
2018) and no additional theoretical results are needed here.
We report results from a simulation study to illustrate the advantages of the IDK+GMM combination over pure GMM estimation. The simulations confirm that the IDK+GMM estimator tends to have a much smaller root mean square error (RMSE) than the pure GMM estimator. For example, when N is equal to 800, the RMSE of the pure GMM estimator of the threshold parameter is 320% to 4630% higher. This reflects the fact that the IDK estimator is N-consistent while the pure GMM estimator is only √N-consistent.
We also investigated the importance of the choice of instruments. Even for estimating linear dynamic panel data models, the question of which moments to match remains largely unresolved (e.g.,
Ahn and Schmidt 1995;
Arellano 2016).
Seo and Shin (2016) and Yu and Phillips (2018) offered different ad hoc suggestions for threshold models. Our simulations show that large reductions in RMSE are available by adding nonlinear transformations of lagged outcomes to the standard set of instruments. For example, the RMSE in the baseline case is 100% to 730% higher than the RMSE for an estimator that adds a constant and two percentile indicators of lagged outcomes as instruments.
2. The SETAR Panel Data Model
For conciseness, we focus on the self-exciting threshold autoregressive (SETAR) model which is widely used in the time series literature (e.g.,
Tong and Lim 1980;
Teräsvirta et al. 2011). In the panel data terminology, the right-hand side variables in the SETAR model are predetermined rather than endogenous. Our results are easily extended to the case of endogenous regressors and an endogenous threshold variable, as we briefly discuss in the concluding remarks. For
individuals and
times, let
be a scalar observed random variable. The observations are assumed to be independent across individuals, but not across time. The basic SETAR panel data model is
where
is a time-invariant individual-specific unobserved random variable, and
is a time- and individual-specific unobserved random variable. The overall constant term is subsumed into
as usual. The lowercase Greek letters denote unknown parameters, and superscripts * indicate “true” values. The threshold parameter is
. For simplicity, define
. The parameter space consists of all
. Assume that all random variables have finite means and variances and that
3. GMM Estimator
We begin with the pure GMM estimator. Assumption (
2) implies that for any function
we have
Assumption (
2) therefore implies an abundance of moment restrictions that can be used to estimate the unknown parameters.
Suppose a finite set has been selected and stacked in a
M-vector, say
. For example,
,
, or
.
Holtz-Eakin et al. (
1988) and
Arellano and Bond (
1991) proposed a set of linear moment restrictions on the second moments of the data for the linear dynamic panel data model (
,
, and
). Generalising their set to the present context gives
In addition,
Ahn and Schmidt (
1995) analysed the quadratic restrictions on the second moments of the data
Note and are defined using the true parameter values and expectations are taken using the true parameter values.
Define
and let
be a vector of random variables such that the stacked moment restrictions can be written as
. A necessary condition for the chosen moment restrictions to identify
is that
if and only if
. A GMM estimator of
is defined as the global minimiser,
, of the GMM objective function,
where
is a given weight matrix. The objective function attains its minimum on an interval of
values. The ambiguity can be resolved by defining
as the midpoint (e.g.,
Yu 2015). Note that in general, the weight matrix
may also be a function of the unknown parameters
(e.g.,
Hansen et al. 1996).
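The midpoint convention can be illustrated with a short sketch. The code is not taken from the paper: the function names `gmm_objective` and `midpoint_minimiser`, the single toy moment, the candidate grid, and the tolerance are all assumptions made for illustration. Because sample moments built from indicator functions only change when the candidate threshold crosses a data point, the objective is a step function of the threshold, and its minimum is typically attained on a whole interval.

```python
import numpy as np

def gmm_objective(g_bar, weight):
    """Quadratic form g' W g for a vector of sample moments g_bar."""
    g_bar = np.asarray(g_bar, dtype=float)
    return float(g_bar @ np.asarray(weight, dtype=float) @ g_bar)

def midpoint_minimiser(candidates, values, tol=1e-12):
    """Midpoint of the range of candidates on which `values` is minimal.

    Hypothetical helper; assumes the minimisers form one contiguous interval.
    """
    candidates = np.asarray(candidates, dtype=float)
    values = np.asarray(values, dtype=float)
    at_min = values <= values.min() + tol
    return 0.5 * (candidates[at_min].min() + candidates[at_min].max())

# Toy moment (an assumption): the share of observations below the candidate
# threshold should equal 0.5. The objective is flat between data points.
x = np.array([0.125, 0.375, 0.625, 0.875])
grid = np.linspace(0.0, 1.0, 101)
q = [gmm_objective([np.mean(x <= g) - 0.5], np.eye(1)) for g in grid]
gamma_hat = midpoint_minimiser(grid, q)
```

In this toy case the objective is zero for every candidate between the second and third data points, and the helper returns the centre of that flat region on the grid.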
Despite nondifferentiability of the objective function with respect to
, the asymptotic distribution of the GMM estimator is typically normal. Define the matrices
and
, where
denotes the partial derivative.
Seo and Shin (
2016) proved that if
,
is nonsingular, and other technical regularity conditions are satisfied, then
In particular, the GMM estimator is √N-consistent.
4. IDK Estimator
In this section we explain how the ideas of
Yu and Phillips (
2018) can be adapted to the panel data context with fixed effects to obtain an
N-consistent estimator of the threshold parameter. We begin by eliminating the fixed effects by first-differencing the structural equation. Then we construct two estimators of the threshold parameter for each of the resulting
equations. Finally, we obtain an overall estimator by taking the simple average of the basic estimators.
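As a rough illustration of the first step, the sketch below simulates a small panel from one possible SETAR parameterisation and checks numerically that first-differencing removes the fixed effect. The parameterisation, the parameter values, the initial condition, and the names `regression_part` and `simulate_panel` are assumptions chosen for illustration; they are not the paper's Equation (1) or its simulation design.

```python
import numpy as np

def regression_part(y_lag, b1, d0, d1, gamma):
    """Assumed regime-dependent mean: b1*y_lag + (d0 + d1*y_lag) * 1(y_lag > gamma)."""
    return b1 * y_lag + (d0 + d1 * y_lag) * (y_lag > gamma)

def simulate_panel(n, t_max, params, rng):
    """Simulate y[i, 0..t_max] with fixed effects alpha[i] and shocks eps[i, t]."""
    alpha = rng.normal(size=n)
    eps = rng.normal(size=(n, t_max + 1))
    y = np.empty((n, t_max + 1))
    y[:, 0] = alpha + eps[:, 0]  # arbitrary initial condition (assumption)
    for t in range(1, t_max + 1):
        y[:, t] = regression_part(y[:, t - 1], *params) + alpha + eps[:, t]
    return y, alpha, eps

rng = np.random.default_rng(0)
params = (0.5, 0.3, -0.4, 0.0)  # (b1, d0, d1, gamma), hypothetical values
y, alpha, eps = simulate_panel(n=200, t_max=4, params=params, rng=rng)

# First differences for t = 2, ..., t_max in this sketch. Each differenced
# equation involves both y[t-1] and y[t-2] as threshold variables.
dy = y[:, 2:] - y[:, 1:-1]
d_reg = regression_part(y[:, 1:-1], *params) - regression_part(y[:, :-2], *params)
d_eps = eps[:, 2:] - eps[:, 1:-1]
residual = dy - d_reg  # alpha has dropped out; only the differenced shock remains
```

The array `residual` coincides with `d_eps`, which confirms that no assumption about the fixed effects is needed once the equation is differenced.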
After first-differencing the structural Equation (
1) and taking the conditional expectation, we get
Because the indicator functions are discontinuous, the conditional expectation is discontinuous when
or
equals
. If the conditional expectation is smooth everywhere else, then these discontinuities identify
. The idea of the IDK estimator is to exploit the discontinuities for estimating
. To rule out discontinuities occurring elsewhere, in addition to (
2) assume that
To show that the discontinuities identify
, let
and
indicate limits from the left and from the right, and define the functions
and
as the difference between the left and right limits of the conditional expectation function when
and
is near
; that is,
and
Using assumption (
10), we then have
It follows that
and
for all
. Furthermore,
is a necessary condition for (
13) to uniquely identify
.
While it is possible to base estimation of
on
or
with a fixed value of
y, such an estimator will not have good properties. To achieve
N-consistency, our estimators of
are based on density-weighted averages of
and
. Let
denote the joint density of
and let
denote the marginal density of
. Define the objective function
by
and the objective function
by
The discontinuity points of and are the same as those of and provided certain technical regularity conditions hold, including that is continuous and bounded away from 0 in an open neighbourhood where or . That is, we generally have that and .
We define “basic” IDK estimators as the arg max of each of the sample analogues of
and
for
. The estimators of
and
are implemented using generalised kernels. Let
k be a univariate kernel function with support
, and let
h denote the bandwidth. To keep the notation simple, we use the same bandwidth everywhere. Then the estimators of
and
are
and
where
Define the estimators and for . Finally, we construct an overall estimator by taking the average of all and .
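To convey the idea of locating the threshold from a discontinuity, here is a deliberately simplified one-dimensional sketch. It is not the IDK estimator defined above: it uses a single conditioning variable, a uniform one-sided kernel, no density weighting, and a grid search. The function name `jump_location`, the bandwidth, and the simulated data are all assumptions for illustration. The sketch compares local averages of the outcome just to the left and just to the right of each candidate value and picks the candidate with the largest squared difference.

```python
import numpy as np

def jump_location(x, y, grid, h):
    """Grid argmax of the squared gap between right and left local means of y."""
    best_gamma, best_gap = None, -np.inf
    for gamma in grid:
        left = y[(x > gamma - h) & (x <= gamma)]
        right = y[(x > gamma) & (x <= gamma + h)]
        if left.size == 0 or right.size == 0:
            continue  # skip candidates without observations on both sides
        gap = (right.mean() - left.mean()) ** 2
        if gap > best_gap:
            best_gamma, best_gap = gamma, gap
    return best_gamma

# Simulated data (assumption): a regression function with a jump of size 1 at 0.3.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=2000)
y = 1.0 * (x > 0.3) + 0.1 * rng.normal(size=2000)
gamma_hat = jump_location(x, y, grid=np.linspace(-0.8, 0.8, 161), h=0.1)
```

Away from the jump the two local means are close, so the squared gap is near zero; it peaks when the candidate sits at the discontinuity. The estimators in the text apply the same logic to the two-dimensional conditional expectation of the differenced outcome.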
Having estimated
, the
s can be estimated in a second step at the
√N-rate by GMM as described in
Section 3 after redefining
. Since
converges at the
N-rate, the asymptotic distribution of the second-step GMM estimator is the same as if the threshold parameter
were known.
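A minimal sketch of such a second step follows, under the assumption (not stated in the text) that once the threshold is fixed the estimating equation is linear in the remaining parameters, so that the textbook linear GMM formula applies. The function name `linear_gmm`, the default weight matrix, and the cross-sectional toy data are hypothetical; the toy data are not a panel and are noiseless so that the result can be checked exactly.

```python
import numpy as np

def linear_gmm(y, x, z, weight=None):
    """Linear GMM: minimise g(b)' W g(b) with g(b) = Z'(y - X b) / n.

    y: (n,) outcomes, x: (n, k) regressors, z: (n, m) instruments, m >= k.
    If weight is None, W = (Z'Z / n)^{-1} is used (the 2SLS choice).
    """
    n = y.shape[0]
    if weight is None:
        weight = np.linalg.inv(z.T @ z / n)
    zx = z.T @ x / n
    zy = z.T @ y / n
    return np.linalg.solve(zx.T @ weight @ zx, zx.T @ weight @ zy)

rng = np.random.default_rng(2)
y_lag = rng.normal(size=500)
gamma_hat = 0.0  # stands in for a first-step threshold estimate

# With the threshold treated as known, the regime-specific regressor is just data.
x = np.column_stack([y_lag, y_lag * (y_lag > gamma_hat)])
beta_true = np.array([0.5, -0.4])      # hypothetical coefficients
y = x @ beta_true                      # noiseless toy outcome
z = np.column_stack([x, y_lag ** 2])   # over-identified instrument set
beta_hat = linear_gmm(y, x, z)
```

Treating the threshold as given removes the only source of nondifferentiability from the objective, which is why standard GMM machinery can be used for the remaining parameters.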
The setup here differs somewhat from that of
Yu and Phillips (
2018), who considered a single structural equation with a single threshold variable. Here we have
first-differenced structural equations, and each equation involves two threshold variables. The latter means that it is necessary to condition on both
and
in (
9), and gives rise to the two distinct estimators based on
and
, respectively.
Yu and Phillips (
2018) proved that the basic IDK estimator is
N-consistent under certain technical regularity conditions. The asymptotic distribution is nonstandard. Their results apply directly to each of our basic estimators,
and
for
. Taking the overall average does not affect the
N-consistency and reduces the variance.
Yu and Phillips (
2018) did not provide standard errors in their empirical illustration. Arguably, we are interested in making inferences about the regression function in most empirical applications, not about individual parameters, and the former is dominated by the variance of
s, while the variance of
is negligible in comparison. Inference methods for the threshold parameter are developed by
Liao et al. (
2018).
5. Simulation Results
To illustrate the advantage of the IDK+GMM estimator over pure GMM and to investigate the importance of the choice of instruments, we conducted a small simulation study for one of the designs used by
Seo and Shin (
2016). The DGP is defined in the table note. For simplicity, all results for the GMM estimators presented here are one-step estimators using the optimal weight matrix.
Panel A of
Table 1 shows our baseline results which use only the untransformed lagged outcome variables as instruments, as suggested by
Seo and Shin (
2016). The RMSE for the pure GMM estimator are monotonically decreasing at rates suggesting
√N-consistency, as expected. The RMSE for the IDK+GMM estimator are much lower, especially for
, and the convergence rates are compatible with
N-consistency for
and
√N-consistency for the
s.
Given the disparate convergence rates we expect the RMSE ratio of pure GMM to the IDK+GMM combination for
to diverge, while the RMSE ratios for the
s should converge to finite limit values corresponding to the ratio of the asymptotic variances of the respective GMM estimators. The numbers shown in the right-most four columns in
Table 1 are compatible with these expectations. In panel A, when
, the efficiency gain for
is huge, more than a factor of 27. The gains for the
s are also large, with RMSE for pure GMM more than twice the RMSE for the IDK+GMM estimator.
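For readers who want to reproduce this kind of comparison, the sketch below shows how an RMSE and an RMSE ratio can be computed from Monte Carlo replications. The function names and the numbers in the example are made up for illustration and are not results from Table 1. A ratio r corresponds to the first RMSE being 100(r − 1)% higher than the second.

```python
import numpy as np

def rmse(estimates, true_value):
    """Root mean square error of replicated estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_value) ** 2)))

def percent_higher(rmse_a, rmse_b):
    """How much higher rmse_a is than rmse_b, in percent."""
    return 100.0 * (rmse_a / rmse_b - 1.0)

# Hypothetical replications of two estimators of a parameter whose true value is 2.
draws_a = [0.0, 4.0]
draws_b = [1.5, 2.5]
rmse_a = rmse(draws_a, 2.0)
rmse_b = rmse(draws_b, 2.0)
ratio = rmse_a / rmse_b
excess = percent_higher(rmse_a, rmse_b)
```

Reporting the ratio and the percentage excess side by side makes it easier to compare statements such as "more than a factor of" with statements such as "percent higher".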
In the remainder of
Table 1 we consider different sets of instruments. Panel B shows big reductions in RMSE for the pure GMM estimator when a constant term is also used as an instrument.
Han and Kim (
2014) and
Gørgens et al. (
2016) found similar improvements for the linear model. The improvements are relatively smaller for the IDK+GMM estimator.
Since the structural equation is nonlinear, one might expect that nonlinear transformations of lagged outcomes could be useful instruments. Based on the suggestion by
Yu and Phillips (
2018), we added
to the set of instruments. Panel C in
Table 1 shows that this does not improve the RMSE for the pure GMM estimator. On the contrary, the estimation noise in the instruments adds significantly to the RMSE. The results are more promising for the IDK+GMM estimator, where substantial reductions in RMSE are observed.
In panel D, we have added quadratic and cubic transformations of the lagged dependent variable, and in panel E we have added threshold functions where the threshold depends on percentiles of the data rather than the structural parameter. As shown in panel F, when the RMSE for the pure GMM estimator drops by factors of 3.6–6.6, while the RMSE for the IDK+GMM estimator drops by factors of 2.7–3.4.
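Instrument sets of the kind discussed in this section can be assembled mechanically. The sketch below is an illustration under stated assumptions: the function name `build_instruments`, the choice of powers, and the percentile levels are hypothetical and do not reproduce the exact instrument sets behind Table 1. It stacks a lagged outcome with an optional constant, polynomial terms, and indicators that the lagged outcome exceeds given sample percentiles.

```python
import numpy as np

def build_instruments(y_lag, constant=True, powers=(2, 3), percentiles=(25, 75)):
    """Stack the lagged outcome with fixed nonlinear transformations of it."""
    y_lag = np.asarray(y_lag, dtype=float)
    cols = [y_lag]
    if constant:
        cols.append(np.ones_like(y_lag))
    for p in powers:
        cols.append(y_lag ** p)  # polynomial transformations
    for q in percentiles:
        cut = np.percentile(y_lag, q)  # cut-off taken from the data
        cols.append((y_lag > cut).astype(float))
    return np.column_stack(cols)

y_lag = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
z = build_instruments(y_lag)
```

The percentile indicators depend only on the data, not on the unknown threshold parameter, so they are fixed transformations and do not add estimation noise from a first-step parameter estimate.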
6. Concluding Remarks
This paper has shown how the ideas of
Yu and Phillips (
2018) can be adapted to the panel data context with fixed effects. Theoretically, the advantage of the IDK+GMM combination is that the estimator of the threshold parameter is
N-consistent, while the pure GMM estimator converges only at the
√N-rate. In simulation exercises, we confirmed that the IDK+GMM combination offers a huge practical advantage over pure GMM estimation, even when the former is implemented relatively simply. We also investigated the importance of the choice of instruments and showed that adding fixed nonlinear transformations of the lagged dependent variable can be highly effective when estimating nonlinear equations.
We have focused on the SETAR model in this paper. A more general threshold regression panel data model is
where
is a vector of possibly endogenous variables,
is a possibly endogenous scalar variable, and
,
and
are conformable parameter vectors. It is straightforward to construct an IDK+GMM estimator analogous to the SETAR case, and similar efficiency gains are available.
The IDK estimator we have described utilises discontinuities in the conditional expectation function given in (
9). It will fail if
, because then (
9) is continuous. However, in this case the partial derivatives of (
9) may be discontinuous at
or
, so IDK estimation is still possible (e.g.,
Yu and Phillips 2018;
Porter and Yu 2015).
If it is known that
is a smooth function of
y, then we can construct an estimator of
directly based on Equation (
1), without first-differencing and without assumption (
10). Since an extra time period is available for estimation and since we only need to smooth in one dimension (
) instead of two (
) when defining
, this estimator is expected to be more efficient.
For simplicity, we have constructed an overall estimator by taking a simple average of multiple estimators based on separate equations. It is a topic for future research to investigate how best to combine the information. One could consider weighted averages or, instead of averaging separate estimators, one could base an estimator on a (weighted) average over the objective functions. Which is better may depend on e.g., the time pattern of .
Finally, to illustrate the advantage of the IDK+GMM estimator over the pure GMM estimator, our simulations focused on the design considered by
Seo and Shin (
2016). To further investigate the properties of the IDK+GMM estimator in future research, it would be interesting to consider simulation designs where endogeneity is more severe (e.g.,
is correlated with
) and where the number of time periods is smaller (i.e.,
T is small). Also, in practice the optimal weight matrix is not known, and it would be useful to compare two-step estimation of the weight matrix and continuous updating (e.g.,
Hansen et al. 1996).