Monte Carlo Comparison for Nonparametric Threshold Estimators

This paper compares the finite sample performance of three non-parametric threshold estimators via Monte Carlo method. Our results show that the finite sample performance of the three estimators is not robust to the relative position of the threshold level along the distribution of threshold variable, especially when a structural change occurs at the tail part of the distribution. JEL Classification: C14, C21


Introduction
Popularly used to describe structural changes in economic relationships, threshold models have seen many applications, especially in macro fields (e.g., Hansen 2011;Potter 1995).Typical examples include the nonlinearity in public debt to GDP ratio (e.g., Afonso and Jalles 2013;Caner et al. 2010;Cecchetti et al. 2011).A number of threshold estimators for threshold models have been proposed in the literature, and the asymptotic results of these estimators can be categorized into two groups based on different assumptions.The first group is based on the "fixed threshold effect" assumption.The second group imposes a "diminishing threshold effect" assumption introduced by Hansen (2000).For example, it is well known that, for the least-squares estimator, the threshold estimator is super-consistent with the convergence rate n under the "fixed threshold effect" assumption and n 1−2α under the "diminishing threshold effect" assumption, respectively, where α measures the diminishing rate of the threshold effect.
The asymptotic theory and statistical inference have been well developed for the least-squares estimator exogenous regressors and exogenous threshold variable (e.g., Chan 1993;Hansen 2000;Seo and Linton 2007).Recently, there has been a growing interest in studying threshold models with endogenous regressors and/or a threshold variable.Extending the framework of Hansen (2000), Caner and Hansen (2004) applied the two-step least-squares method to estimate threshold models with endogenous slope regressors.In the spirit of the sample selection technique of Heckman (1979), imposing the joint normality assumption, Kourtellos et al. (2016) explored the case that both the threshold variable and slope regressors are endogenous.The work in Seo and Shin (2016) proposed a two-step GMM estimator for a dynamic panel threshold model with fixed effects, which allows endogeneity in both the slope regressors and threshold variable.It is worth noticing that the GMM method allows both a fixed and diminishing threshold effect, and the convergence rate for the GMM threshold estimator is not super-consistent.By relaxing the joint normality assumption of Kourtellos et al. (2016Kourtellos et al. ( , 2017)), a two-step least square estimator based on a nonparametric control function approach to correct the threshold endogeneity was proposed.The semiparametric threshold model separates the threshold effect into two parts, namely the exogenous threshold effect and endogenous threshold bias-correction term.Therefore, with a "small threshold" effect, the convergence rate for the threshold variable depends on diminishing rates of the threshold effect and the bias-correction term.
However, few studies have worked on the estimation and statistical inference of threshold estimators based on nonparametric estimation methods, which do not rely on the least square method.The work in Delgado and Hidalgo (2000) suggested a difference kernel estimator (or DKE), which depends on a chosen point.The convergence rate of Delgado and Hidalgo (2000) DKE is nh d−1 , which depends on both the bandwidth, h, and the dimensionality of regressors in their threshold model, d ≥ 1. Built upon the method of Delgado and Hidalgo (2000), Yu et al. (2018) introduced an integrated difference kernel estimator (or IDKE).The work in Yu et al. (2018) argued that the IDKE can be applied to the case with the endogenous threshold variable.The convergence rate of the IDKE is not related to either the bandwidth or the dimensionality of regressors and is super-consistent with the rate n.Using recently-developed discrete smoothing methods, Henderson et al. (2017) introduced a semiparametric M-estimator of a nonparametric threshold regression model.The threshold estimator of Henderson et al. (2017) can be estimated at the rate √ n/h (h is the bandwidth), which is faster than the parametric convergence rate of √ n.One may notice that the aforementioned convergence rate is the same as that of the smoothed least squares estimator in Seo and Linton (2007).However, they are entirely different.The work in Henderson et al. (2017) focussed on the nonparametric threshold model, and their proposed estimator was based on a non-smooth objective function.On the contrary, Seo and Linton (2007) worked on a linear threshold model, and the proposed estimator was based on a smooth objective function with the indicator function replaced by a CDF-type smooth function.
With many applications and simulations available for comparing the parametric threshold estimators in the literature, little guidance is available for researchers to apply as to the choice of nonparametric threshold estimators.Moreover, to avoid the boundary effect of the threshold estimator, most simulations are designed deliberately with the true threshold level chosen at the middle point of the threshold variable distribution, which can be highly doubted in reality.Therefore, the purpose of this paper is to carefully compare the three nonparametric threshold estimators mentioned above using the Monte Carlo method.More importantly, we consider the case that the true threshold level is not only at the middle, but also at the two tails of the threshold variable distribution.
The rest of the paper is organized as follows.In Section 2, we briefly review the estimation procedure of three nonparametric threshold estimators such as DKE, IDKE and the M-estimator, where threshold models have exogenous regressors and a threshold variable.In Section 3, we illustrate the possible theoretical reason for the conjecture of the poor finite sample performance of the difference kernel-type estimators.Section 4 presents the design of the Monte Carlo simulations.Section 5 reports the finite sample performance.Section 6 concludes.

Three Nonparametric Threshold Estimators
In this paper, we aim to compare the finite sample performance of three nonparametric threshold estimators: Henderson et al. (2017) the semiparametric M-estimator, Delgado and Hidalgo (2000) the difference kernel estimator (DKE) and Yu et al. (2018) the integrated difference kernel estimator (IDKE).
Following Henderson et al. (2017), we consider a generalized threshold regression model: for i = 1, ..., n, where α 0 (•) is an unknown smooth function, X i is a vector of d regressors, q i is the threshold variable, γ 0 is the threshold level, I(•) is the indicator function and β 0 measures the jump size of the regression function at q > γ.Furthermore, X i and q i are both exogenous and may have a common variable.

Semiparametric M-Estimator
If γ 0 is known a priori, Model (1) is known as a partially linear model.The conventional method to estimate the unknown γ 0 is minimizing the sum of squared errors, which can be iterated by the grid search.Therefore, Henderson et al. (2017) suggested the semi-parametric M-estimator of the nonparametric threshold model, which can be obtained in three steps.
In Step 1, given (β, γ), Model (1) becomes a standard nonparametric model.Therefore, we can obtain the Nadaraya-Watson (NW) estimator of α 0 (x) at an interior point, x, i.e., α(x; β, γ) = arg min where is a second order kernel function, h is the bandwidth and d is the dimension of x.
In Step 2, given γ, Model (1) becomes a partially linear model.Then, β 0 can be estimated as: where fh works as the weighting function.The work in Henderson et al. (2017) shows that β(γ) has the following mathematical expression: where we denote I i = I(q i > γ).
In Step 3, we can estimate the threshold level γ 0 by solving the following optimization problem, γ = arg min where w(•) is a weighting function and is application dependent.
As mentioned in Section 1, the convergence rate of the threshold estimator of Henderson et al. ( 2017) is √ n/h, which explodes faster than the usual parametric √ n rate.However, the unknown function α 0 (•) and the jump size β 0 converge at standard nonparametric rates of √ nh d and √ nh, respectively.

DKE and IDKE
Instead of using the absolute value of the weighted average of the sum of errors as the objective function, Delgado and Hidalgo (2000) considered using the difference between Ê[y|x 0 , q = γ−] and Ê[y|x 0 , q = γ+] as the objective function.Ideally, the closer γ approaches the true value, the larger the absolute value of the above difference should be.As a result, we are able to estimate the threshold level by choosing γ, which gives the most considerable gap between the two one-sided expectations.Therefore, the difference kernel estimator (DKE) can be obtained by: γDKE = arg max where we have: if q i is not part of X i , and and k(•) is a second order kernel function.
Obviously, it is reasonable to expect that the DKE estimator is sensitive to the choice of x 0 .Furthermore, the DKE suffers the curse of dimensionality problem as the convergence rate of the DKE, nh d−1 , depends on the dimension of the regressor.To fix these potential weaknesses, Yu et al. (2018) proposed an integrated difference kernel estimator, which allows γ not to rely on the single choice in x 0 , but the expectation of all X.The γIDKE can be derived as follows: where: ) is defined the same as above.The IDKE is super-consistent with convergence rate n.The work in Yu et al. (2018) showed that IDKE is consistent even if the threshold variable is endogenous.They explain that the role of the instruments of the endogenous regressors and the endogenous threshold variable is improving only the efficiency of the IDKE.

Estimation Difficulties in the Difference Kernel-Type Estimator with Near Boundary γ 0
In this section, we use a simple version of Model (1) to explain the estimation difficulties of the difference kernel-type estimators when γ 0 lies at the tails of the threshold variable distribution.This estimation difficulty motivates us to investigate the position effect of the true threshold level on the finite sample performance.Specifically, we consider the true model as: where X i is randomly drawn from a uniform distribution over the interval of [−0.5, 0.5] for i = 1, ..., n.
The model above can be regarded as Model (1) with α 0 (x) ≡ 0, β 0 = 1, and ε i = 0 for all i = 1, ..., n.Therefore, the DKE is based on the objective function: Letting u x = (X i − γ) /h and applying the change of variables, we have the probability limit of Qn (γ) equal to: where h is the bandwidth.If γ < γ 0 , we obtain: and: where the positive sign follows for all γ 0 < 0.5 for any second-order kernel function with a bell shape.
It is worth noting that as γ 0 approaches 0.5 from the left side, the difference between k( γ 0 −γ h ) − k( 0.5−γ h ) becomes smaller.As a result, for all γ, the above derivative goes to zero, which makes the objective function flat and leads to the estimation difficulty.
Similarly, if γ > γ 0 , we have: and: where the negative sign follows for all γ 0 > −0.5 for any second-order kernel function with a bell shape.Therefore, we observe that as γ 0 approaches −0.5 from the right side, for all γ, the difference between )du x becomes smaller, which makes the derivative go to zero, and this results in a flat objective function.In summary, the DKE is asymptotically consistent with γ 0 ∈ (−0.5, 0.5).However, it is reasonable to suspect that DKE may have poor finite performance with the true threshold level lying at the tails of the threshold variable distribution due to the estimation difficulty of the flat objective function.
Next, we assume that there are additional covariates, Z i , which are randomly drawn from uniform distribution over the interval of [−0.5, 0.5], for all i = 1, ..., n, and {X i } and {Z i } are independent.Therefore, the probability limit of the objective function of the IDKE is (with the same bandwidth): where u z = Z i −z 0 h .Note that: Consequently, in this typical example, ∂Q n (γ) IDKE ∂γ can be interpreted as a rescaled ∂Q n (γ) DKE ∂γ , which implies the IDKE will suffer the same boundary problem as the DKE estimator.

Monte Carlo Designs
To assess the finite sample performance of the three nonparametric threshold estimators, we consider seven data-generating mechanisms, which are similar to those studied in Henderson et al. (2017); Yu et al. (2018).
To examine the position effect of the true threshold level on the finite sample performance, we set γ 0 at different segments of the threshold variable distribution.Specifically, we set the true threshold, γ 0 , as the pth quantile of the threshold variable with p = 25, 50 and 75 to place the true threshold level to the left tail, middle and the right tail of the threshold variable distribution, respectively.We set x 0 = x max for the DKE estimate of Delgado and Hidalgo (2000), where x max is the data with the greatest empirical density among all generated x i 's for each simulation of each DGP. 2 We use the rule of thumb bandwidth, h = C σx n −1/(d+4) , where C = 4 d+2 1 d+4 , d is the dimension of x i and σx is the sample standard deviation of {x i }.We use the Gaussian kernel function.As suggested by 1 With the uniform distribution, the intensity of the Poisson process would not change with the change in the true threshold location.Therefore, the limiting distribution of both the DKE and the IDKE is not affected given γ 0 is not on the boundary of Θ γ .

2
The theoretical density should be the same for all x due to the uniform distribution.The reason we use the data-driven choice of x 0 is because we do not know the true density in reality.Yu et al. (2018), we use the one-sided rescaled Epanechnikov kernel with k − (q, 0) = 3 4 (1 − q 2 )I(q < 0) and k + (q, 0) = k − (−q, 0) to estimate the DKE and the IDKE.
We repeat 2000 times for each simulation. 3We set the sample size n = 100, 300 and 500.For each simulation, we report the average bias, mean squared error (or MSE) and the standard deviation (or stdev) of the threshold estimates.Tables 1-7 contain the details of the simulation results.
Table 8 shows the realized convergence rate of the semi-parametric M-estimator of Henderson et al. (2017) and IDKE of Yu et al. (2018).(2017), the DKE of Delgado and Hidalgo (2000) and the IDKE of Yu et al. (2018) for the simple jump function defined as Equation ( 17).The first column gives the sample size that the simulation used.The third to fifth columns report the average bias.The sixth to eighth columns give the mean squared errors of the threshold estimates.
The last three columns present the standard deviations.This table reports the simulation results of three estimators, the semiparametric M-estimator of Henderson et al. (2017), the DKE of Delgado and Hidalgo (2000) and the IDKE of Yu et al. (2018) for the univariate threshold periodic model defined as Equation ( 19).The first column gives the sample size that the simulation used.The third to fifth report propose the average bias.The sixth to eighth columns give the mean squared errors of the threshold estimates.The last three columns present the standard deviations.This table reports the simulation results of three estimators, the semiparametric M-estimator of Henderson et al. (2017), the DKE of Delgado and Hidalgo (2000) and the IDKE of Yu et al. (2018) for the univariate threshold quadratic model defined as Equation ( 20).The first column gives the sample size that the simulation used.
The third to fifth report propose the average bias.The sixth to eighth columns give the mean squared errors of the threshold estimates.The last three columns present the standard deviations.This table reports the simulation results of three estimators, the semiparametric M-estimator of Henderson et al. (2017), the DKE of Delgado and Hidalgo (2000) and the IDKE of Yu et al. (2018) for the multivariate linear threshold model defined as Equation ( 21).The first column gives the sample size that the simulation used.The third to fifth report propose the average bias.The sixth to eighth columns give the mean squared errors of the threshold estimates.The last three columns present the standard deviations.This table reports the simulation results of three estimators, the semiparametric M-estimator of Henderson et al. (2017), the DKE of Delgado and Hidalgo (2000) and the IDKE of Yu et al. (2018) for the multivariate threshold quadratic model defined as Equation ( 22).The first column gives the sample size that the simulation used.
The third to fifth columns report the average bias.The sixth to eighth columns give the mean squared errors of the threshold estimates.The last three columns present the standard deviations.This table reports the simulation results of three estimators, the semiparametric M-estimator of Henderson et al. (2017), the DKE of Delgado and Hidalgo (2000) and the IDKE of Yu et al. (2018) for the multivariate threshold periodic model defined as Equation ( 23).The first column gives the sample size that the simulation used.The third to fifth columns report the average bias.The sixth to eighth columns give the mean squared errors of the threshold estimates.The last three columns present the standard deviations.This table reports the realized convergence rates of the semiparametric M-estimator of Henderson et al. (2017) and the IDKE of Yu et al. (2018).The realized convergence rates are shown as the coefficient estimate by regressing the logarithm of RMSE on the logarithm of the sample size for each DGP.Samples sizes used are n = 100, 200, 300, 400, 500, 600 and 700.

Monte Carlo Results
For the semi-parametric M-estimator introduced by Henderson et al. (2017), our results show that the performance was slightly affected by the position of the true threshold level.Meanwhile, as the sample size increased, this position effect gradually vanished. 4In addition, we observed that the bias was smaller for multivariate models than univariate models.Using the bandwidth as defined in Section 4, which behaved roughly as O(n −1/5 ) for univariate models and O(n −1/7 ) for multivariate models, the theoretical convergence rates were O(n −1.2 ) and O(n −1.14 ) accordingly.From Table 8, the super-consistency was confirmed with the estimated convergence rate of γ.Consistent with the theory, the realized convergence rate decreased as the dimension increased.It is quite interesting that, for almost all univariate models, the realized convergence rate of γ was faster when γ 0 was at the leftand right-tail position than when γ 0 was at the median position.However, for multivariate models, the realized rates seemed to be stable with the position of γ 0 .
For the DKE, as we conjectured, it was severely affected by the position of the true threshold value for all DGPs, which may result from the estimation difficulties, as we argued in Section 3. Furthermore, even with the middle-positioned γ 0 , the bias still showed a non-decreasing pattern with the increasing sample size under some multivariate specifications. 5Intuitively, this may result from the choice of x 0 , which distorts the result by providing useless information.According to the comment in the Supplementary Material of Yu et al. (2018), the choice of x 0 is crucial in identifying the DKE estimator.On the one hand, the optimal x 0 should make [E(y|x 0 , q = γ − 0 ) − E(y|x 0 , q = γ + 0 )] 2 as large as possible.On the other hand, one needs the conditional density f (x 0 |q = γ 0 ) to be large enough to provide sufficient information.Therefore, theoretically, with a uniform distribution and univariate linear threshold model as in DGP2, the ideal x 0 should be at the middle of its distribution with the value of zero.However, in the simulation, we set x 0 equal to the value with the largest empirical density, which may appear at the two tails.This may lead to [E(y|x 0 , q = γ − 0 ) − E(y|x 0 , q = γ + 0 )] 2 approaching zero.Moreover, with the multivariate and nonlinear specification, we can expect more distortion involved.As a result, the DKE performs the worst among all three competitors for all DGPs.
For the IDKE, our results reveal several features.Firstly, the IDKE was affected by the position of the actual threshold value.The influence was not as substantial as the DKE.Indeed, the integration allowed more local information to be used and alleviated the possible distortion due to the choice of x 0 .Surprisingly, unlike the DKE, this position effect seemed to be asymmetric for the IDKE.For most of the DGPs, we observed that the absolute value of the average bias and MSE was larger with the left-tailed γ 0 than the right-tailed γ 0 .The theoretical convergence rate of the IDKE estimator, n, is not related to either the bandwidth or the dimension, which is faster than the semi-parametric M-estimator of Henderson et al. (2017).This is consistent with our realized convergence rates, which are shown in Table 8.Moreover, for all DGPs, the realized convergence rates were faster with two-sided tailed γ 0 than the median γ 0 .
In summary, the simulation results give some evidence that the finite sample performances were affected by the position of the true threshold level for all three nonparametric threshold estimators.However, this effect was heterogeneous.The position effect least influenced the semi-M estimator of Henderson et al. (2017).Meanwhile, the difference kernel-type estimators were severely distorted by the tailed γ 0 , which confirms our conjecture made in Section 3. Furthermore, our results show that the position of the true threshold level also affects the realized convergence rate.We also found, for the semi-M estimator of Henderson et al. (2017) and the IDKE estimator, the tail distortion tended to be reduced in multivariate models.
As a robustness check of our findings, Figures 1-4 show the simulation results of DGP 2 and DGP 5 with γ 0 taking different positions along the threshold variable distribution.It is obvious that, for all figures, semi-M had lower average bias in absolute value than difference kernel-type estimators with tail γ 0 .Furthermore, we found the gap between the average bias of the semi-M estimator and the 4 With n = 100, the bias, MSE and standard deviation were larger with γ 0 placed at two tails and γ 0 placed at the median point.However, with n = 500, there was no apparent difference between tail position γ 0 estimation and the median position γ 0 estimation.average bias of the difference kernel-type estimators to drop greatly with γ 0 approaching the middle position of the threshold variable distribution.(5th, 10th, 20th, 40th, 50th, 60th, 80th, 90th, 95th).The simulation is based on DGP 2. The sample size is 300.(5th, 10th, 20th, 40th, 50th, 60th, 80th, 90th, 95th).The simulation is based on DGP 5.The sample size is 100.5th, 10th, 20th, 40th, 50th, 60th, 80th, 90th, 95th).The simulation is based on DGP 5.The sample size is 300.

Conclusions
In this paper, we evaluated the finite sample performance of three non-parametric threshold estimators and identified the relationship between the performances of different estimators and the position of the true threshold level with Monte Carlo methods.
The study shows, with all three estimators affected by the tail position of the true threshold value, that the semi-M estimator of Henderson et al. (2017) outperformed DKE and IDKE for roughly all DGPs considered in the paper.Interestingly, there appears to be some evidence that the distortion can be reduced if there are other covariates besides the threshold variable for the semi-M estimator and the IDKE.Consistent with the theory, we find that the realized convergence rates support the super-consistency in the threshold estimate for all three estimators.However, we find that the realized convergence rates are also affected by the position of the true threshold value.We therefore conclude that, in applied works, using the difference kernel-type estimation, researchers must be careful when the threshold estimate is at the left-tail or the right-tail of the threshold variable distribution.

Figure 1 .
Figure 1.Average bias with γ 0 as various quantiles of the threshold variable, DGP 2, n = 100.This figure shows absolute values of the average bias with the true threshold level being several quantiles of the threshold variable (5th, 10th, 20th, 40th, 50th, 60th, 80th, 90th, 95th).The simulation is based on DGP 2. The sample size is 100.

Figure 2 .
Figure 2. Average bias with γ 0 as various quantiles of the threshold variable, DGP 2, n = 300.This figure shows absolute values of the average bias with the true threshold level being several quantiles of the threshold variable(5th, 10th, 20th, 40th, 50th, 60th, 80th, 90th, 95th).The simulation is based on DGP 2. The sample size is 300.

Figure 3 .
Figure 3. Average bias with γ 0 as various quantiles of the threshold variable, DGP 5, n = 100.This figure shows absolute values of the average bias with the true threshold level being several quantiles of the threshold variable(5th, 10th, 20th, 40th, 50th, 60th, 80th, 90th, 95th).The simulation is based on DGP 5.The sample size is 100.

Figure 4 .
Figure 4. Average bias with γ 0 as various quantiles of the threshold variable, DGP 5, n = 300.This figure shows absolute values of the average bias with the true threshold level being several quantiles of the threshold variable (5th, 10th, 20th, 40th, 50th, 60th, 80th, 90th, 95th).The simulation is based on DGP 5.The sample size is 300.

0 Is the 75th Quantile of the Threshold Variable
This table reports the simulation results of three estimators, the semiparametric M-estimator ofHenderson et al.

γ 0 Is the 25th Quantile of the Threshold Variable
3All programming is finished in Matlab.

γ 0 Is the 50th Quantile of the Threshold Variable
Yu et al. (2018)lgo (2000)ulation results of three estimators, the semiparametric M-estimator ofHenderson et al. (2017), the DKE ofDelgado and Hidalgo (2000)and the IDKE ofYu et al. (2018)for the univariate linear threshold model defined as Equation (18).The first column gives the sample size that the simulation used.The third to fifth columns report the average bias.The sixth to eighth columns give the mean squared errors of the threshold estimates.The last three columns present the standard deviations.

Table 8 .
Estimated convergence rate of the nonparametric threshold estimators.