J. Risk Financial Manag. 2019, 12(2), 64; https://doi.org/10.3390/jrfm12020064

Article
Smoothed Maximum Score Estimation of Discrete Duration Models
1
Nanyang Business School, Nanyang Technological University, Singapore 639798, Singapore
2
Department of Economics, York University, Toronto, ON M3J 1P3, Canada
*
Author to whom correspondence should be addressed.
Received: 7 March 2019 / Accepted: 9 April 2019 / Published: 15 April 2019

Abstract

This paper extends Horowitz’s smoothed maximum score estimator to discrete-time duration models. The estimator’s consistency and asymptotic distribution are derived. Monte Carlo simulations using various data generating processes with varying error distributions and shapes of the hazard rate are conducted to examine the finite sample properties of the estimator. The bias-corrected estimator performs reasonably well for the models considered with moderately-sized samples.
Keywords:
maximum score estimator; discrete duration models; efficient semiparametric estimation

1. Introduction

Parametric discrete-time duration models are used extensively within econometrics and the other statistical sciences. Since misspecification of these models can lead to invalid inferences, a variety of semiparametric alternatives have been proposed. However, even these alternative semiparametric estimators exploit certain smoothness and moment conditions, which may be untenable in some circumstances. To address these shortcomings, we propose a new estimator, based on Horowitz (1992)’s smoothed maximum score estimator of single-period binary choice models, which relaxes these assumptions. To motivate and contextualize this estimator, we use this Introduction to review the relevant literature on discrete duration and binary choice models and indicate how our proposed estimator fills a gap in the literature.
In econometrics, discrete-time duration models are typically framed as a sequence of binary choices. The probability of remaining in a state at time $s$ (the continuation probability) is denoted $F_s(\beta_0)$, and the hazard rate is simply $h_s(\beta_0) = 1 - F_s(\beta_0)$. Many parametric forms have been employed for the hazard rate in these models including extreme value, logistic, normal and other parsimonious specifications. Examples using a logistic specification include: Huff-Stevens (1999), Finnie and Gray (2002), Bover et al. (2002) and D’Addio and Rosholm (2005); normal distribution: Meghir and Whitehouse (1997) and Chan and Huff-Stevens (2001); extreme value (also known as the complementary log-log model): Baker and Rea (1998), Cooper et al. (1999), Holmas (2002), Fennema et al. (2006) and Gullstrand and Tezic (2008). These and others were reviewed in Allison (1982) and Sueyoshi (1995). Hess (2009) has suggested using the generalized Pareto distribution, which nests the extreme value and logistic distributions. These specifications lead naturally to maximum likelihood estimation of $\beta_0$, although it is useful to note that there are alternative ways to estimate $\beta_0$ including nonlinear regression, treating $F_s(\beta_0)$ as a conditional mean. As with any parametric approach, misspecification of the hazard rate can lead to invalid inferences. In this regard, we consider various relevant semiparametric alternatives, which relax the parametric assumptions.
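As a minimal illustration of how continuation probabilities and hazards combine into a distribution over exit periods, the sketch below assumes a logistic form for $F_s$ and a hypothetical sequence of index values. The function names `logistic_cdf` and `duration_pmf` are ours and are not part of the paper.

```python
import math

def logistic_cdf(z):
    """Logistic continuation probability F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def duration_pmf(index_by_period):
    """Exit-period distribution implied by a sequence of index values.

    pmf[s-1] is the probability that the first exit occurs in period s,
    i.e. F_1 * ... * F_{s-1} * h_s with hazard h_s = 1 - F_s.
    survive is the probability of continuing through every listed period.
    """
    pmf = []
    survive = 1.0
    for z in index_by_period:
        F = logistic_cdf(z)
        pmf.append(survive * (1.0 - F))  # reach period s, then exit
        survive *= F                     # reach period s and continue
    return pmf, survive

# Hypothetical index values for four periods (illustration only).
pmf, survive = duration_pmf([1.0, 0.5, 0.0, -0.5])
print(pmf, survive)
```

The exit probabilities and the probability of surviving all periods sum to one, which is a quick check that the hazards have been chained correctly.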
We note first that semiparametric estimation of continuous-time models has been the focus of substantial research in the discipline. Numerous authors have developed distribution theory for semiparametric estimation of various continuous-time duration models including Horowitz (1999), Nielsen et al. (1998), Van der Vaart (1996) and Bearse et al. (2007). While these and other semiparametric estimators allow for the relaxation of some parametric assumptions associated with continuous-time duration models, they are not generally appropriate when the duration random variable has a discrete distribution.
We adopt the standard approach in econometrics of constructing the continuation probability from an underlying latent regression structure. In a standard single-period binary choice model, we would observe $Y = 1[Y^* \ge 0]$ with $Y^* = Z + U$, where $1[\cdot]$ is the usual indicator function, $Z$ is an index function of observable random variables and unknown parameters and $U$ has distribution function $F$. With discrete-time duration models, the observed duration is the sum of a sequence of indicators, so that $T = \sum_{s=1}^{S} Y_s$, where $Y_s = Y_{s-1}\,1[Z_s + U_s > 0]$ with $Y_0 = 1$, and the distribution function of $U_s$ is denoted by $F_s$.
There is a large literature on semiparametric estimation of single-period binary choice models. We briefly review this, highlighting how it has been adapted for certain multivariate discrete choice and/or discrete-duration models and finally how our proposed estimator fills a gap in this research. Since in some cases, the conditional mean of $Y$ in the single-period case can be written as $F(\beta_0)$, the parameter of interest, $\beta_0$, can be estimated from a semiparametric regression. This was suggested by Ichimura (1993) to obtain a $\sqrt{N}$-consistent estimator of $\beta_0$. With respect to duration models and exploiting the fact that $F_s$ can also be written as the conditional mean of the choice variable, Reza and Rilstone (2014) minimized a sum of squared semiparametric residuals to estimate the parameters of interest. In a similar vein, Klein and Spady (1993) developed a semiparametric maximum likelihood estimator of $\beta_0$ with the single-observation likelihood function written as $l(\beta) = F(\beta)^{Y}(1 - F(\beta))^{1-Y}$. Klein and Spady’s (1993) estimator essentially consists of replacing $F$ with a nonparametric conditional mean function. Reza and Rilstone (2016) adapted Klein and Spady’s (1993) estimator to the discrete duration case. They also derived the efficiency bounds and showed that their estimator obtained these bounds. We note that the approaches in Ichimura (1993) and Klein and Spady (1993) require continuity of $F$ in the underlying covariates and are limited with respect to the forms of allowable heteroskedasticity (for example, heteroskedasticity from time-varying parameters is precluded). Another problem is simply that identification may not be possible under the mean-independence restriction that $E[U \mid Z] = 0$.¹ By extension, the estimators of Reza and Rilstone (2014, 2016) suffer the same shortcomings as applied to duration models.
With respect to single-period binary choice models, Manski’s (1975, 1985) Maximum Score (MS) estimator circumvents these limitations using simply the median-independence restriction that $\mathrm{Median}[U \mid Z] = 0$. The MS estimator can be written as the maximizer of:
$$\Psi_N^*(\beta) = \frac{1}{N}\sum_{i=1}^{N}(2Y_i - 1)\,1[Z_i(\beta) > 0]$$
where $Z_i(\beta)$ is an index function of the observable covariates. As is usually the case, a normalization of $\beta$ is necessary. For the estimator to be consistent, a few restrictions need to be imposed, in particular with respect to the distribution of $U$. The shortcomings of the estimator are that it is only $N^{1/3}$-consistent and that its asymptotic distribution, a form of Brownian motion, is not amenable to use in applied work.
From one perspective, the shortcomings of the MS estimator derive from its use of the non-differentiable indicator function. Horowitz (1992) largely circumvented its limitations in this regard by replacing the indicator function with a smoothed indicator function, $\mathcal{K}(Z_i(\beta)/\gamma)$. The objective function for the Smoothed Maximum Score (SMS) estimator is:
$$\Psi_N(\beta) = \frac{1}{N}\sum_{i=1}^{N}(2Y_i - 1)\,\mathcal{K}(Z_i(\beta)/\gamma).$$
The SMS is typically better than $N^{1/3}$-consistent, but slower than $\sqrt{N}$, the speed of convergence depending on the smoothness of $\mathcal{K}$ and the distribution of the random components of the model. Note that the $\sqrt{N}$-convergence of estimators such as Klein and Spady’s (1993) is linked to the manner in which they use kernels. These estimators are a form of double averages. However, the objective functions for MS and SMS are nonparametric point estimators, which are single averages. With some caveats, the SMS estimator reflects the fact that the only exploitable information is at or close to the median of the $U$’s. The $\sqrt{N}$-consistent estimators effectively use all the data points.
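To make the difference between the two objective functions concrete, the following sketch evaluates both for a single scalar coefficient. It assumes the standard normal distribution function as the smoothed indicator and uses a small made-up data set; the function names and the data are illustrative only.

```python
import math

def norm_cdf(w):
    """Standard normal CDF, used here as the smoothed indicator (an assumption)."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def ms_objective(beta, y, x_star, x):
    """Maximum score objective: average of (2Y - 1) * 1[Z(beta) > 0]."""
    total = 0.0
    for yi, xs, xi in zip(y, x_star, x):
        z = xs + xi * beta  # index with the coefficient on x_star normalized to one
        total += (2 * yi - 1) * (1.0 if z > 0 else 0.0)
    return total / len(y)

def sms_objective(beta, y, x_star, x, gamma):
    """Smoothed maximum score objective: the indicator is replaced by norm_cdf(Z / gamma)."""
    total = 0.0
    for yi, xs, xi in zip(y, x_star, x):
        z = xs + xi * beta
        total += (2 * yi - 1) * norm_cdf(z / gamma)
    return total / len(y)

# Made-up observations (illustration only).
y = [1, 0, 1, 1]
x_star = [0.5, -1.0, 0.2, -0.3]
x = [1.0, 0.5, -1.0, 2.0]

ms_value = ms_objective(1.0, y, x_star, x)
sms_sharp = sms_objective(1.0, y, x_star, x, gamma=1e-3)   # nearly an indicator
sms_smooth = sms_objective(1.0, y, x_star, x, gamma=1.0)   # visibly smoothed
print(ms_value, sms_sharp, sms_smooth)
```

With a very small window width the smoothed objective coincides with the step-function objective on these data, while a larger window width trades that fidelity for differentiability in `beta`.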
The main objective of this paper is to show how to extend SMS to estimate discrete duration models. The MS and SMS estimators have been used in other situations such as Lee (1992) and Melenberg and Van Soest (1996), who extended the MS and SMS, respectively, to ordered-response models. De Jong and Woutersen (2011) have extended the SMS estimator to binary choices with dynamic time series data. Fox (2007) adapted the MS estimator to multinomial choices. Charlier et al. (1995) extended the SMS to panel data. Other researchers have modified the MS and SMS estimators to improve their sampling properties. Kotlyarova and Zinde-Walsh (2010) suggested using a weighted average of different SMS estimators to reduce mean squared error. Iglesias (2010) derived the second-order bias, which can be used to reduce the bias of the SMS estimator. Jun et al. (2015) proposed a Laplace estimator alternative to improve on the $N^{1/3}$-consistency of the MS estimator. To our knowledge, neither the MS nor SMS estimators have been extended to duration models.
Section 2 and Section 3 discuss the class of models considered and present the basic estimator along with its main asymptotic properties. Section 4 provides some simulation results concerning the sampling distribution of the estimator, and Section 5 concludes.

2. Modelling

As mentioned, a standard approach for modelling a discrete duration process is to construct it as a sequence of binary choice models, with observed and unobserved heterogeneity. The standard binary choice model is adapted such that in each time period, s, a choice is made by individual i to continue in a state if the latent variable:
$$Y_{is}^* = Z_{is}(\beta_0) + U_{is}, \quad s = 1, 2, \ldots, S$$
is greater than zero. Here, $Z_{is}(\beta) = X_{is}^* + X_{is}'\beta$ is an index,² where $X_{is}^*$ is a scalar random variable and $X_{is}$ is a $k \times 1$ vector, which may include a function of $s$, while $\beta$ is a $k \times 1$ vector of constants.
We assume the $U_{is}$’s and $(X_{is}^*, X_{is})$’s are jointly i.i.d. We observe $Y_{is} = 1[Y_{is}^* > 0]\,Y_{i,s-1}$ and $X_{is}^*$, $X_{is}$, $s = 1, \ldots, S$. A natural adaptation of Manski’s setup is the additional assumption that $\mathrm{Median}[U_s \mid X_s, Y_{s-1}] = 0$, $s = 1, \ldots, S$. We estimate the parameters by effectively estimating the density of $Z_{is}(\beta_0)$ at zero by nonparametric methods. For notational convenience, we often suppress the $i$ subscripts. Another way to view the modelling is that in any given period $s$ with $Y_{s-1} = 1$, this is a standard binary choice variable with the key difference being that the index $Z$ is a function of some covariates and the number of completed periods, $s$. The duration variable for period $s$ is simply $T_s = \sum_{j=0}^{s-1} Y_j$ with $Y_0 = 1$, $Y_{S+1} = 0$.³ The evolution of the $Y_s$’s, conditional on the covariates and duration, is given by:
$$Y_s = 1[Z_s(\beta_0) + U_s \ge 0]\,Y_{s-1}, \quad s = 1, \ldots, S.$$
Note that this representation is such that $Y_s$ is zero if the subject left the state prior to period $s$ and becomes a standard binary choice model in period $s$ if the subject elected to continue in the state in period $s - 1$.
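The absorbing nature of exit in this recursion can be seen in a few lines of code. The sketch below takes hypothetical index values and error draws for one individual and traces the sequence of indicators; the function name `spell_path` and the numbers are ours.

```python
def spell_path(z, u):
    """Trace Y_s = 1[z_s + u_s >= 0] * Y_{s-1} for s = 1, ..., S, starting from Y_0 = 1."""
    y_prev = 1
    path = []
    for zs, us in zip(z, u):
        y = (1 if zs + us >= 0 else 0) * y_prev  # once y_prev is 0, y stays 0
        path.append(y)
        y_prev = y
    return path

# Hypothetical index values and errors for S = 4 periods (illustration only).
path = spell_path([1.0, 0.5, -0.2, 2.0], [0.1, -0.2, 0.0, 0.0])
completed_periods = sum(path)
print(path, completed_periods)  # [1, 1, 0, 0] 2
```

Although the latent variable in the fourth period is positive, the individual has already exited in the third period, so the indicator remains zero.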
We put an upper limit, $S$, on the length of spells. This is common in empirical work.⁴ Allowing for unbounded $S$ introduces technical difficulties that are not readily resolved. Put $\mathcal{Z}_s = \{X_{ij}^*, X_{ij}, Y_{i,j-1}\}_{j=1}^{s}$. It is useful to note that by iterated expectations:
$$E[Y_s \mid \mathcal{Z}_s] = E[Y_s \mid Z_s(\beta_0), Y_{s-1}] = F_s Y_{s-1}$$
so that, tautologically, $F_s$, the continuation probability function, is:
$$F_s = E[Y_s \mid Z_s(\beta_0), Y_{s-1} = 1] = \Pr[Y_s = 1 \mid Z_s(\beta_0), Y_{s-1} = 1].$$

3. The Estimator

Adapting the SMS estimator to the discrete duration model as outlined in Section 2, the objective function is:
$$\Psi_N(\beta) = \frac{1}{N}\sum_{i=1}^{N}\sum_{s=1}^{S} Y_{i,s-1}(2Y_{is} - 1)\,\mathcal{K}(Z_{is}(\beta)/\gamma).$$
$\mathcal{K}(w)$, a smoothed indicator function, is the anti-derivative of $K(w) = d\mathcal{K}(w)/dw$ and has the properties: $|\mathcal{K}(w)| \le M < \infty$, $\lim_{w \to -\infty}\mathcal{K}(w) = 0$, $\lim_{w \to \infty}\mathcal{K}(w) = 1$. In most kernel density estimation, $K$ is a density function and $\mathcal{K}$ is its associated cumulative distribution function. The technical requirements here sometimes require use of a higher order kernel.
Note that the objective function is of the same form as the usual SMS estimator with the modifications that there is a double summand over individuals and time periods and each of the summands at period s is multiplied by Y s 1 , so that after exit, there is no further contribution to the objective function by that individual.
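A minimal sketch of this objective function is given below. It assumes the standard normal distribution function as the smoothed indicator and stores each individual's spell as a list of per-period tuples `(y, x_star, x)`; this data layout and the function names are our own choices for illustration.

```python
import math

def norm_cdf(w):
    """Standard normal CDF, used here as the smoothed indicator (an assumption)."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def duration_sms_objective(beta, spells, gamma):
    """(1/N) sum_i sum_s Y_{i,s-1} (2 Y_is - 1) K((X*_is + X_is' beta) / gamma).

    spells: list of N spells; each spell is a list of (y, x_star, x) tuples
    for s = 1, ..., S, where x is a list of the same length as beta.
    """
    total = 0.0
    for spell in spells:
        y_prev = 1  # Y_{i0} = 1
        for y, x_star, x in spell:
            if y_prev == 1:  # only periods at risk contribute
                z = x_star + sum(b * xj for b, xj in zip(beta, x))
                total += (2 * y - 1) * norm_cdf(z / gamma)
            y_prev = y
    return total / len(spells)

# Hypothetical data: the first individual exits in period 2, so the
# period-3 record is ignored by the objective function.
spells_full = [
    [(1, 0.5, [1.0]), (0, -0.4, [1.0]), (0, 3.0, [1.0])],
    [(1, 0.2, [-1.0]), (1, 0.1, [0.5])],
]
spells_trimmed = [spells_full[0][:2], spells_full[1]]

value_full = duration_sms_objective([0.8], spells_full, gamma=0.5)
value_trimmed = duration_sms_objective([0.8], spells_trimmed, gamma=0.5)
print(value_full, value_trimmed)
```

Dropping the post-exit record leaves the objective unchanged, which is the sense in which an individual makes no further contribution after exit.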
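Implicitly, we impose the identification condition that the coefficient on $X_{is}^*$ is unity⁵ (e.g., Li and Racine 2007). Horowitz (1992) discussed the identification issue. $X_{is}^*$ is assumed to have a continuous distribution, conditional on $X_{is}$ and $Y_{i,s-1}$. Let:
$$Y_i = \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{iS} \end{pmatrix}, \quad X_i = \begin{pmatrix} X_{i1} \\ \vdots \\ X_{iS} \end{pmatrix}, \quad X_i^* = \begin{pmatrix} X_{i1}^* \\ \vdots \\ X_{iS}^* \end{pmatrix}, \quad Z_i = \begin{pmatrix} Z_{i1} \\ \vdots \\ Z_{iS} \end{pmatrix}.$$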
The estimator solves the first-order conditions $\psi_N(\hat\beta) = 0$, which are given by:
$$\psi_N(\beta) = \frac{1}{N}\sum_{i=1}^{N} q_i(\beta), \quad q_i(\beta) = \sum_{s=1}^{S} q_{is}(\beta), \quad q_{is}(\beta) = Y_{i,s-1}(2Y_{is} - 1)\frac{1}{\gamma}K\!\left(\frac{Z_{is}(\beta)}{\gamma}\right)X_{is}.$$
Concerning notation, when a function’s argument $\beta$ is suppressed, it is evaluated at $\beta_0$, e.g., $q_i = q_i(\beta_0)$. Derivatives are written $q_i^{(1)}(\beta) = \partial q_i(\beta)/\partial\beta'$, a $k \times k$ matrix. Thus,
$$\psi_N^{(1)}(\beta) = \frac{1}{N}\sum_{i=1}^{N} q_i^{(1)}(\beta), \quad q_i^{(1)}(\beta) = \sum_{s=1}^{S} q_{is}^{(1)}(\beta), \quad q_{is}^{(1)}(\beta) = Y_{i,s-1}(2Y_{is} - 1)\frac{1}{\gamma^2}K^{(1)}\!\left(\frac{Z_{is}(\beta)}{\gamma}\right)X_{is}X_{is}'.$$
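The first and second derivatives of the objective function are straightforward to code once a kernel is chosen. The sketch below assumes a Gaussian kernel, so that the smoothed indicator is the standard normal distribution function, $K$ is the standard normal density $\phi$ and $K^{(1)}(w) = -w\phi(w)$. For brevity it uses a scalar coefficient and a list of pooled at-risk observations `(y, x_star, x)`; these simplifications, the data and the names are ours.

```python
import math

def norm_pdf(w):
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def norm_cdf(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def objective(beta, obs, gamma, n):
    """Smoothed objective over pooled at-risk observations; n is the number of individuals."""
    return sum((2 * y - 1) * norm_cdf((xs + x * beta) / gamma) for y, xs, x in obs) / n

def score(beta, obs, gamma, n):
    """psi_N(beta): terms (2Y - 1) * K(Z / gamma) * X / gamma with K the normal density."""
    return sum((2 * y - 1) * norm_pdf((xs + x * beta) / gamma) * x / gamma
               for y, xs, x in obs) / n

def hessian(beta, obs, gamma, n):
    """psi_N^(1)(beta): terms (2Y - 1) * K^(1)(Z / gamma) * X^2 / gamma^2."""
    total = 0.0
    for y, xs, x in obs:
        w = (xs + x * beta) / gamma
        total += (2 * y - 1) * (-w * norm_pdf(w)) * x * x / gamma ** 2
    return total / n

# Hypothetical pooled observations from n = 2 individuals (illustration only).
obs = [(1, 0.3, 1.0), (0, -0.2, 0.5), (1, -0.1, -1.5), (0, 0.4, 2.0)]
beta, gamma, n, eps = 0.7, 0.5, 2, 1e-5

# Central finite differences as a check on the analytic derivatives.
fd_score = (objective(beta + eps, obs, gamma, n) - objective(beta - eps, obs, gamma, n)) / (2 * eps)
fd_hessian = (score(beta + eps, obs, gamma, n) - score(beta - eps, obs, gamma, n)) / (2 * eps)
print(score(beta, obs, gamma, n), fd_score)
print(hessian(beta, obs, gamma, n), fd_hessian)
```

Comparing analytic derivatives with finite differences in this way is a cheap safeguard before handing the score to a gradient-based optimizer.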
$G(u_s \mid z_s, x_s, y_{s-1})$ and $g(u_s \mid z_s, x_s, y_{s-1})$ denote the cumulative distribution and density functions of $U_s$ conditional on $Z_s$, $X_s$, $Y_{s-1} = 1$, and $f(z_s \mid x_s, y_{s-1})$ denotes the density function of $Z_s$ conditional on $X_s$, $Y_{s-1}$. The superscript $[j]$ indicates the $j$th derivative of a function with respect to $z_s$, and in particular, we have $G^{[j]}(-z_s \mid z_s, x_s, y_{s-1}) = d^j G(-z_s \mid z_s, x_s, y_{s-1})/dz_s^j$. $0 \le M < \infty$ is a generic constant. Put:
$$B = -2\frac{\mu_m}{m!}\,E\left[\sum_{s=1}^{S}\sum_{j=1}^{m}\binom{m}{j}G^{[j]}(0 \mid 0, X_s, Y_{s-1})\,f^{[m-j]}(0 \mid X_s, Y_{s-1})\,X_s Y_{s-1}\right],$$
$$C = E\left[\sum_{s=1}^{S} f(0 \mid X_s, Y_{s-1})\,X_s X_s' Y_{s-1}\right]\int K(w)^2\,dw, \quad Q = 2E\left[\sum_{s=1}^{S} G^{[1]}(0 \mid 0, X_s, Y_{s-1})\,f(0 \mid X_s, Y_{s-1})\,X_s X_s' Y_{s-1}\right].$$
Let $\Pr[u_s, x_s, x_s^* \mid \mathcal{Z}_{s-1}]$ denote the probability distribution of $U_{is}$, $X_{is}$, $X_{is}^*$ given $\mathcal{Z}_{i,s-1}$. The distributional assumptions we make are as follows.
Assumption 1.
$\{Y_i, X_i, X_i^*\}_{i=1}^{N}$ is a random sample where $Y_{is} = 1[Z_{is}(\beta_0) + U_{is} \ge 0]\,Y_{i,s-1}$. $\Pr[u_s, x_s, x_s^* \mid \mathcal{Z}_{s-1}] = \Pr[u_s, x_s, x_s^* \mid Y_{s-1}]$. $Z_{is}(\beta) = X_{is}^* + X_{is}'\beta$. $Y_{i0} = 1$ for all $i$.
Assumption 2.
For $s = 1, \ldots, S$, (a) the support of the distribution of $(x_s^*, x_s)$ is not contained in any proper linear subspace of $R^{k+1}$, (b) $0 < \Pr(y_s = 1 \mid x_s^*, x_s, y_{s-1} = 1) < 1$ for almost every $(x_s^*, x_s)$ and (c) for almost every $(x_s, y_{s-1})$, the distribution of $x_s^*$ conditional on $x_s, y_{s-1}$ has everywhere positive density with respect to the Lebesgue measure.
Assumption 3.
$\mathrm{Median}(u_s \mid x_s^*, x_s, Y_{s-1}) = 0$ for almost every $(x_s, Y_{s-1})$, $s = 1, \ldots, S$.
Assumption 4.
$\beta_0 \in \mathcal{B}$, a compact subset of $R^k$.
Assumption 5.
The elements of $X_s$ have finite fourth moments, $s = 1, \ldots, S$.
Assumption 6.
$(\log N)/(N\gamma^4) \to 0$ as $N \to \infty$.
Assumption 7.
(a) The smoothed indicator $\mathcal{K}$ is twice differentiable everywhere; $K$ and $K^{[1]}$ are uniformly bounded; and each of the following integrals over $(-\infty, \infty)$ is finite: $\int K(w)^4\,dw$, $\int [K^{[1]}(w)]^2\,dw$, $\int |w^2 K^{[1]}(w)|\,dw$. (b) For some integer $m > 2$ and each integer $j$, $j = 2, \ldots, m-1$: $\int w^j K(w)\,dw = 0$, $\int w^m K(w)\,dw = \mu_m$, $|\mu_m| < \infty$. (c) For $j = 2, \ldots, m-1$, $\gamma \to 0$ and any $\eta > 0$: $\gamma^{j-m}\int_{|\gamma w| > \eta}|w^j K(w)|\,dw \to 0$ and $\gamma^{-1}\int_{|\gamma w| > \eta}|K^{[1]}(w)|\,dw \to 0$.
Assumption 8.
$f(z_s \mid x_s, y_{s-1})$ is $m$-times continuously differentiable with respect to $z_s$ in a neighbourhood of zero for almost every $(x_s, y_{s-1})$, and $|f^{[j]}(z_s \mid x_s, y_{s-1})| \le M$, $s = 1, \ldots, S$.
Assumption 9.
$G(-z_s \mid z_s, x_s, y_{s-1})$ is $m$-times continuously differentiable with respect to $z_s$ in a neighbourhood of zero for almost every $(x_s, y_{s-1})$, and $|G^{[j]}(-z_s \mid z_s, x_s, y_{s-1})| < M$, $j = 1, \ldots, m$, $s = 1, \ldots, S$.
Assumption 10.
$\beta_0$ is an interior point of $\mathcal{B}$.
Assumption 11.
Q is negative definite.
These assumptions adapt those in Horowitz (1992) to allow for the dependency structure. They also embed Manski’s (1985) assumptions with $S = 1$. Note that the random sampling assumption refers to $N$ random draws of individuals, each of which comprises up to $S$ observations.
Identification (see Proof of Proposition 1 in Appendix A) follows by adapting Manski’s (1985) proof for the MS estimator. Of interest here is that we wish to allow for time dependence. Note that for the MS/SMS case, nothing precludes the inclusion of a constant in the index so long as, say, $x_s$ is not co-linear⁶ (in fact, simulation and empirical results such as in Horowitz (1998) indicate good results for intercept estimates). For the $m$-multinomial choice model, Lee (1992) included $m$ non-stochastic threshold parameters (including a constant). In our case, the same applies for including certain non-stochastic functions of $s$ in $x_s$, such as including indicators for each $s$ or a polynomial in $s$. For parsimony in our numerical/empirical work, we have included quadratics to allow for increasing, decreasing and non-monotonic time dependency. This allows for straightforward testing. In this regard, we note that the semiparametric information matrix derived in Reza and Rilstone (2016) was singular for this class of models. There is no contradiction here, since the singularity indicates that those parameters are not estimable at the $\sqrt{N}$-rate; it does not imply that they cannot be identified or estimated at a less than $\sqrt{N}$-rate, which we do here.
We have the following lemma, which permits simple derivation of the asymptotic properties of the estimator.
Lemma 1.
Let Assumptions 1–11 hold. Then, (a) $E[q_i^{(1)}(\beta_0)] = Q + o(1)$, (b) $\gamma^{-m}E[q_i(\beta_0)] = B + o(1)$ and (c) $\gamma E[q_i(\beta_0)q_i(\beta_0)'] = C + o(1)$.
The asymptotic distribution of the estimator can be summarized easily using the following result.
Proposition 1.
Let Assumptions 1–11 hold. Then, (a) $\hat\beta$ is consistent and (b) $\sqrt{N\gamma}\,(\hat\beta - \beta_0 + \gamma^m Q^{-1}B) \to_d N(0, Q^{-1}CQ^{-1})$.
The proofs are in Appendix A. In the statement of the proposition, note the presence of the first-order bias, $-\gamma^m Q^{-1}B$, for which it may be advisable to adjust the raw estimator. One of the benefits of this estimator is that one can effectively ignore the dependence of the observations, pool all the observations across individuals and periods for which $Y_{i,s-1} = 1$ and use standard SMS optimization procedures. This is what we have done in the simulations. Reza and Rilstone’s (2016) setup (an extension of Klein and Spady 1993) allows for estimation of the hazard rate, $1 - F_s$, with a natural estimate of time dependence from the semiparametric estimates of $\Delta h_s = F_{s-1} - F_s$. Note that Reza and Rilstone’s (2016) estimator of $\Delta h_s$ only has a $\sqrt{N\gamma}$-rate of convergence.
As for the SMS estimator, we can consider the optimal choice of window width. As with Horowitz (1992), we consider choices that minimize an MSE criterion. Therefore, if we consider that the asymptotic results correspond to the distribution of a random variable, say $W$, with mean $-\gamma^m Q^{-1}B$ and variance $Q^{-1}CQ^{-1}/(N\gamma)$, we can consider minimizing, say, the inner product MSE of $\Omega^{1/2}W$, where $\Omega$ is a positive definite weighting matrix, i.e., minimize $E[W'\Omega W]$ with respect to $\gamma$. This results in:
$$\gamma^* = \arg\min_{\gamma} MSE(\gamma), \quad MSE(\gamma) = \gamma^{2m}B'Q^{-1}\Omega Q^{-1}B + \frac{1}{N\gamma}\mathrm{Trace}[\Omega Q^{-1}CQ^{-1}]$$
$$\gamma^* = N^{-1/(2m+1)}\left(\frac{\mathrm{Trace}[\Omega Q^{-1}CQ^{-1}]}{2m\,B'Q^{-1}\Omega Q^{-1}B}\right)^{1/(2m+1)}.$$
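For readers who want the intermediate step, the optimal window width follows from the first-order condition of the MSE criterion. The short derivation below is a sketch in the notation of this section, treating the bias and variance components as fixed constants.

```latex
\begin{align*}
\frac{\partial\,\mathrm{MSE}(\gamma)}{\partial\gamma}
  &= 2m\,\gamma^{2m-1}\,B'Q^{-1}\Omega Q^{-1}B
     - \frac{1}{N\gamma^{2}}\,\mathrm{Trace}[\Omega Q^{-1}CQ^{-1}] = 0 \\
\Longrightarrow\quad
\gamma^{2m+1}
  &= \frac{\mathrm{Trace}[\Omega Q^{-1}CQ^{-1}]}{2m\,N\,B'Q^{-1}\Omega Q^{-1}B}.
\end{align*}
```

Since the optimal window width is proportional to $N^{-1/(2m+1)}$, the normalizing factor $\sqrt{N\gamma^*}$ grows like $N^{m/(2m+1)}$: for example $N^{2/5}$ when $m = 2$ and $N^{4/9}$ when $m = 4$, which approaches but never reaches $N^{1/2}$.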
For inferences it is necessary to obtain consistent estimates of the components of the first-order bias and variance. These cannot be directly estimated as they depend on the distribution of the unobservable U’s. However, by extension of the arguments in Horowitz (1992), they may be obtained through various derivatives of the objective function. Specifically, put:
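$$\hat B(\hat\beta) = \frac{1}{\gamma^m}\psi_N(\hat\beta), \quad \hat Q(\hat\beta) = \psi_N^{(1)}(\hat\beta), \quad \hat C(\hat\beta) = \frac{1}{N\gamma}\sum_{i=1}^{N}\sum_{s=1}^{S} q_{is}(\hat\beta)\,X_{is}'\,K(Z_{is}(\hat\beta)/\gamma). \tag{14}$$
By the uniform law of large numbers, $\hat B(\hat\beta) \to_p B$, $\hat Q(\hat\beta) \to_p Q$ and $\hat C(\hat\beta) \to_p C$.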
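Given estimates of $Q$ and $C$, an approximate standard error follows from the sandwich form $Q^{-1}CQ^{-1}/(N\gamma)$. The sketch below covers the scalar-parameter case only; the function name and the numerical inputs are hypothetical and serve purely as an illustration.

```python
import math

def sms_standard_error(q_hat, c_hat, n, gamma):
    """Approximate standard error sqrt(Q^{-1} C Q^{-1} / (N * gamma)) for a scalar parameter."""
    return math.sqrt(c_hat / (q_hat * q_hat * n * gamma))

# Hypothetical inputs (illustration only): Q is negative, C is positive.
se_example = sms_standard_error(q_hat=-2.0, c_hat=4.0, n=100, gamma=0.25)
print(se_example)  # 0.2
```

Because the variance is scaled by $N\gamma$ rather than $N$, a smaller window width inflates the standard error for a given sample size.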
It is well known that the first-order asymptotic results may provide a poor approximation to the sampling distribution of the SMS estimator. Thus, it may be preferable to use some higher order method to approximate the distribution. Apart from Iglesias (2010), who applied the results in Rilstone et al. (1996) to derive the second-order bias of $\hat\beta$, little is known (explicitly) about the second-order properties of the SMS estimator. Estimates can be bootstrapped. In this regard, we note that one should resample individuals. That is, bootstrap estimates should be based on resamples $\{\mathcal{Z}_{iS}^*\}_{i=1}^{N}$, where the $*$’s indicate random draws from the original data. Horowitz (2002) documents some of the issues associated with bootstrapping the distribution of $\hat\beta$. In particular, the corresponding re-estimates, $\hat\beta_j^*$, say, and corresponding standard errors should be calculated using an under-smoothing window width such as $\gamma \in [0.5\gamma^*, \gamma^*]$.
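Resampling individuals means drawing whole spells with replacement, so that the within-spell dependence is preserved in each bootstrap sample. The sketch below illustrates only this resampling step; the `estimator` argument is a placeholder for any function of a list of spells, and the toy statistic used here (mean number of completed periods) stands in for the SMS estimator.

```python
import random

def bootstrap_individuals(spells, estimator, n_boot, rng):
    """Re-estimate on n_boot resamples of whole spells drawn with replacement."""
    n = len(spells)
    draws = []
    for _ in range(n_boot):
        resample = [spells[rng.randrange(n)] for _ in range(n)]  # individuals, not periods
        draws.append(estimator(resample))
    return draws

def mean_completed_periods(spells):
    """Toy statistic: average number of periods with Y_s = 1 (a stand-in estimator)."""
    return sum(sum(spell) for spell in spells) / len(spells)

# Hypothetical spells recorded as sequences of Y_s (illustration only).
spells = [[1, 1, 0], [1, 0], [0], [1, 1, 1, 0]]
draws = bootstrap_individuals(spells, mean_completed_periods, n_boot=200, rng=random.Random(0))
print(min(draws), max(draws))
```

Resampling individual periods instead would break the link between a period's outcome and the at-risk indicator inherited from the previous period.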

4. Simulation Exercise

To examine the estimator’s performance in finite samples, we conducted Monte Carlo simulations with several Data Generating Processes (DGPs). We adapted simulations in Horowitz (1992) by augmenting the models with duration dependence, and a variety of error distributions. The latent processes we considered included those with homoskedastic errors:
$$Y_{is}^* = 1.5 + 2(s/100) - (s/100)^2 + X_{1is} + X_{2is} - u_{is}, \quad u_{is} \sim N(0, 1)$$
and those with heteroskedastic errors:
$$Y_{is}^* = 1.5 + 2(s/100) - (s/100)^2 + X_{1is} + X_{2is} - v_{is}, \quad v_{is} = 0.25\left(1 + (X_{1is} + X_{2is})^2\right) \cdot u_{is}, \quad u_{is} \sim N(0, 1).$$
We conducted the simulations for two sample sizes, $N = 500$ and $N = 1000$. The $X$’s were drawn as i.i.d. $N(0, 1)$. For the DGP with homoskedastic normal errors, this resulted in duration times with averages of 5.7 ($N = 500, 1000$) and standard deviations also 5.7 ($N = 500, 1000$). With heteroskedastic errors, the average duration times were 8.7 ($N = 500, 1000$) with standard deviations of 9.6 ($N = 500$) and 9.5 ($N = 1000$). For identification purposes, the coefficient on $X_1$ was normalized to one, and our key parameter of interest was the coefficient on $X_2$, with a true value of one. We conducted 500 replications for each specification. We followed Horowitz (1992) to estimate the parameters in two steps: first using simulated annealing to find the approximate maximizer of $\Psi_N(\beta)$ followed by gradient methods for greater precision. We then used the bias correction described in the previous section to bias-adjust the parameter estimates. We used a Gaussian kernel with a window width $\gamma = N^{-1/6}$.⁷ Standard errors and the bias correction were based on the consistent estimators $\hat B(\hat\beta)$, $\hat Q(\hat\beta)$ and $\hat C(\hat\beta)$ from Equation (14).
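A stripped-down version of one replication can be sketched as follows. Several simplifications are assumptions of ours rather than the procedure used for the reported results: the maximum spell length is set to 50 periods, only the coefficient on $X_2$ is estimated (the intercept and duration-dependence terms are held at their true values), no bias correction is applied, and a coarse grid search followed by golden-section refinement stands in for simulated annealing followed by gradient methods.

```python
import math
import random

def norm_cdf(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def simulate_spells(n, max_periods, rng, hetero=False):
    """Draw n spells; each spell lists (y, x1, x2, s) for the periods at risk."""
    spells = []
    for _ in range(n):
        spell = []
        for s in range(1, max_periods + 1):
            x1, x2, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            if hetero:
                u *= 0.25 * (1.0 + (x1 + x2) ** 2)
            t = s / 100.0
            latent = 1.5 + 2.0 * t - t ** 2 + x1 + x2 - u
            y = 1 if latent > 0 else 0
            spell.append((y, x1, x2, s))
            if y == 0:  # exit is absorbing: later periods are never at risk
                break
        spells.append(spell)
    return spells

def objective(b2, spells, gamma):
    """Duration SMS objective in the X2 coefficient only (other terms fixed at truth)."""
    total = 0.0
    for spell in spells:
        for y, x1, x2, s in spell:
            t = s / 100.0
            z = 1.5 + 2.0 * t - t ** 2 + x1 + b2 * x2
            total += (2 * y - 1) * norm_cdf(z / gamma)
    return total / len(spells)

def estimate_b2(spells, gamma, lo=0.0, hi=3.0, grid=31):
    """Coarse grid search, then golden-section refinement around the best grid point."""
    step = (hi - lo) / (grid - 1)
    best = max((lo + j * step for j in range(grid)),
               key=lambda b: objective(b, spells, gamma))
    a, c = best - step, best + step
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(30):
        p, q = c - ratio * (c - a), a + ratio * (c - a)
        if objective(p, spells, gamma) < objective(q, spells, gamma):
            a = p  # maximizer lies to the right of p
        else:
            c = q  # maximizer lies to the left of q
    return 0.5 * (a + c)

n = 500
rng = random.Random(12345)
spells = simulate_spells(n, 50, rng)            # homoskedastic design
gamma = n ** (-1.0 / 6.0)                       # window width used in the text
b2_hat = estimate_b2(spells, gamma)
mean_completed = sum(sum(rec[0] for rec in sp) for sp in spells) / n
print(b2_hat, mean_completed)
```

Here `mean_completed` counts periods with $Y_s = 1$ for $s \ge 1$, so it excludes the initial $Y_0 = 1$ that enters the duration variable $T_s$. Passing `hetero=True` to `simulate_spells` switches to the heteroskedastic design.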
Table 1, Table 2 and Table 3 report the summary statistics of the simulations for the estimates of the coefficients on $X_2$, $(s/100)$ and $(s/100)^2$, respectively. We also conducted corresponding probit estimates as benchmarks. Note that, with normal errors, the probit estimates were fully efficient. The summary statistics indicated that the semiparametrically-estimated coefficients on $X_2$ were very close to the true parameter. The bias and standard deviation both decreased with sample size. This is particularly true compared to the (misspecified) probit estimator when the errors were heteroskedastic. As for the coefficient on the linear duration dependence term $(s/100)$, there appeared to be some bias, particularly in the presence of heteroskedasticity. However, the bias and RMSE of the SMS estimators diminished with sample size. This was not the case with the probit estimators. As indicated earlier, estimating the duration dependence terms at the $\sqrt{N}$-rate was not possible. The estimates of the coefficient on the quadratic term of the duration dependence were somewhat biased, although the bias decreased with the sample size, as did the RMSE. Larger sample sizes than used here may be required to estimate, with precision, more nuanced forms of duration dependence using the proposed SMS in these contexts.
We also examined the distribution of the estimates. Figure 1, Figure 2 and Figure 3 graph the QQ-plots of the standardized SMS estimates of the coefficients on X 2 , s / 100 and ( s / 100 ) 2 , respectively. Most of the standardized estimates appeared to be close to the standard normal quantiles, except for a few extreme values. The extreme values are potentially due to difficulties with numerical optimization. This would seem to indicate that the sampling distributions of the estimators in our simulation exercise were reasonably well approximated by a normal distribution.

5. Conclusions

This paper has shown that the SMS estimator can be readily adapted to consistently estimate the parameters of a popular class of discrete duration models, while relaxing the distributional assumptions of parametric models and certain semiparametric models. The asymptotic distribution of the estimators was derived and can be readily approximated using standard software. Simulations illustrated the viability of the approach. We are currently working on an empirical application of the estimator.

Author Contributions

Conceptualization, formal analysis, writing, P.R.; software, validation, S.R.

Funding

This research received no external funding.

Acknowledgments

The authors appreciate comments from Richard Blundell, Christian Bontemps, Juan Rodríguez-Poo and seminar participants at the 2016 African Meetings of the Econometric Society in South Africa and the 2017 International Conference on Panel Data in Thessaloníki. The authors are responsible for any errors.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 1 (a).
To derive the expected value of $q_{is}^{(1)}(\beta_0)$, suppress the $i$ subscripts, and write:
$$E[q_s^{(1)} \mid X_s, Y_{s-1}] = E[Y_{s-1}(2Y_s - 1)K^{(1)}(Z_s/\gamma)X_s X_s' \mid X_s, Y_{s-1}]/\gamma^2 = E[A_s \mid X_s, Y_{s-1}]\,X_s X_s' Y_{s-1}$$
where $A_s = (2 \cdot 1[Z_s + U_s \ge 0] - 1)\frac{1}{\gamma^2}K^{(1)}(Z_s/\gamma)$, suppressing the $X_s$ and $Y_{s-1}$ arguments in $g(u_s \mid z_s, x_s, y_{s-1})$ and $f(z_s \mid x_s, y_{s-1})$.
$$\begin{aligned} E[A_s \mid Z_s] &= \int (2 \cdot 1[Z_s + u_s \ge 0] - 1)\,K^{(1)}(Z_s/\gamma)\,g(u_s \mid Z_s)\,du_s/\gamma^2 \\ &= \left(\int_{-\infty}^{-Z_s} + \int_{-Z_s}^{\infty}\right)(2 \cdot 1[Z_s + u_s \ge 0] - 1)\,K^{(1)}(Z_s/\gamma)\,g(u_s \mid Z_s)\,du_s/\gamma^2 \\ &= K^{(1)}(Z_s/\gamma)\left(\int_{-Z_s}^{\infty} - \int_{-\infty}^{-Z_s}\right)g(u_s \mid Z_s)\,du_s/\gamma^2. \end{aligned}$$
$$\begin{aligned} E[A_s] &= \int K^{(1)}(z_s/\gamma)\left(\int_{-z_s}^{\infty} - \int_{-\infty}^{-z_s}\right)g(u_s \mid z_s)\,du_s/\gamma^2\, f(z_s)\,dz_s \\ &= \int K^{(1)}(w)\left(\int_{-w\gamma}^{\infty} - \int_{-\infty}^{-w\gamma}\right)g(u_s \mid w\gamma)\,f(w\gamma)\,du_s\,dw/\gamma \\ &= \int K^{(1)}(w)\left((1 - G(-w\gamma \mid w\gamma)) - G(-w\gamma \mid w\gamma)\right)f(w\gamma)\,dw/\gamma \\ &= \int K^{(1)}(w)\left(1 - 2G(-w\gamma \mid w\gamma)\right)f(w\gamma)\,dw/\gamma \\ &= -\int K(w)\left((1 - 2G(-w\gamma \mid w\gamma))\,f(w\gamma)\right)^{[1]}dw \\ &\to -\left((1 - 2G(-z_s \mid z_s))\,f(z_s)\right)^{[1]}_{z_s = 0} = 2G^{[1]}(0 \mid 0)\,f(0). \end{aligned}$$
To prove Part (b), make substitutions as in (a), with:
$$E[q_s \mid X_s, Y_{s-1}] = E[A_s \mid X_s, Y_{s-1}]\,X_s Y_{s-1}$$
where $A_s = (2 \cdot 1[Z_s + U_s \ge 0] - 1)\,K(Z_s/\gamma)/\gamma$.
$$\begin{aligned} E[A_s \mid Z_s] &= \int (2 \cdot 1[Z_s + u_s \ge 0] - 1)\,K(Z_s/\gamma)\,g(u_s \mid Z_s)\,du_s/\gamma \\ &= K(Z_s/\gamma)\left(\int_{-\infty}^{-Z_s} + \int_{-Z_s}^{\infty}\right)(2 \cdot 1[Z_s + u_s \ge 0] - 1)\,g(u_s \mid Z_s)\,du_s/\gamma \\ &= K(Z_s/\gamma)\left(\int_{-Z_s}^{\infty} - \int_{-\infty}^{-Z_s}\right)g(u_s \mid Z_s)\,du_s/\gamma \\ &= K(Z_s/\gamma)\left(1 - 2G(-Z_s \mid Z_s)\right)/\gamma \end{aligned}$$
so that:
$$E[A_s] = \int K(z_s/\gamma)\left(1 - 2G(-z_s \mid z_s)\right)f(z_s)\,dz_s/\gamma = \int K(w)\,\bar A(w\gamma)\,dw$$
where $\bar A(w\gamma) = (1 - 2G(-w\gamma \mid w\gamma))\,f(w\gamma)$ and:
$$\int K(w)\,\bar A(w\gamma)\,dw = \int K(w)\left(\bar A(0) + \sum_{j=1}^{m-1}\bar A^{[j]}(0)\frac{(w\gamma)^j}{j!} + \bar A^{[m]}(\bar\gamma)\frac{(w\gamma)^m}{m!}\right)dw.$$
Note that $\bar A(0) = 0$ and all the middle terms in Equation (A6) are zero from $\int w^j K(w)\,dw = 0$, $j = 1, \ldots, m-1$. As for the third term, first note that:
$$\int K(w)\left(\bar A^{[m]}(\bar\gamma) - \bar A^{[m]}(0)\right)w^m\,dw = o(1)$$
by dominated convergence, uniformly on $x_s$, $Y_{s-1}$. There are a few ways to write $\bar A^{[m]}(0)$. It is simplest to note first that:
$$\int K(w)\,\bar A^{[m]}(0)\,w^m\,dw = \mu_m \bar A^{[m]}(0)$$
and by the binomial (Leibniz) rule for derivatives of a product:
$$\bar A^{[m]}(0) = \sum_{j=0}^{m}\binom{m}{j}\left(1 - 2G(-z \mid z)\right)^{[m-j]}f^{[j]}(z)\Big|_{z=0} = -2\sum_{j=0}^{m-1}\binom{m}{j}G^{[m-j]}(0 \mid 0)\,f^{[j]}(0),$$
where the $j = m$ term vanishes because $1 - 2G(0 \mid 0) = 0$ under the median restriction.
To prove Part (c):
$$\gamma E[q_s q_\tau' \mid X_s, Y_{s-1}, X_\tau, Y_{\tau-1}] = \gamma E[A_{s\tau} \mid X_s, Y_{s-1}, X_\tau, Y_{\tau-1}]\,X_s Y_{s-1} X_\tau' Y_{\tau-1}$$
where $A_{s\tau} = (2 \cdot 1[Z_s + U_s \ge 0] - 1)\,K(Z_s/\gamma)\,(2 \cdot 1[Z_\tau + U_\tau \ge 0] - 1)\,K(Z_\tau/\gamma)/\gamma^2$. From Assumption 1, we have:
$$E[A_{s\tau} \mid X_s, Y_{s-1}, X_\tau, Y_{\tau-1}] = \begin{cases} E\left[(2 \cdot 1[Z_s + U_s \ge 0] - 1)^2 K(Z_s/\gamma)^2/\gamma^2 \mid X_s, Y_{s-1}\right], & s = \tau \\ \left(E\left[(2 \cdot 1[Z_s + U_s \ge 0] - 1)^2 K(Z_s/\gamma)/\gamma \mid X_s, Y_{s-1}\right]\right)^2 = O(1), & s \ne \tau. \end{cases}$$
It suffices to consider only the case $s = \tau$, as it converges at a slower rate than when $s \ne \tau$.
$$\begin{aligned} E[A_{s\tau} \mid Z_s] &= K(Z_s/\gamma)^2\int (2 \cdot 1[Z_s + u_s \ge 0] - 1)^2 g(u_s \mid Z_s)\,du_s/\gamma^2 \\ &= K(Z_s/\gamma)^2\left(\int_{-\infty}^{-Z_s} + \int_{-Z_s}^{\infty}\right)(2 \cdot 1[Z_s + u_s \ge 0] - 1)^2 g(u_s \mid Z_s)\,du_s/\gamma^2 \\ &= K(Z_s/\gamma)^2\left(\int_{-\infty}^{-Z_s} + \int_{-Z_s}^{\infty}\right)g(u_s \mid Z_s)\,du_s/\gamma^2 \end{aligned}$$
so that:
$$\begin{aligned} E[A_{s\tau}] &= \int K(z_s/\gamma)^2\left(\int_{-\infty}^{-z_s} + \int_{-z_s}^{\infty}\right)g(u_s \mid z_s)\,du_s\,f(z_s)\,dz_s/\gamma^2 \\ &= \int K(z_s/\gamma)^2\left((1 - G(-z_s \mid z_s)) + G(-z_s \mid z_s)\right)f(z_s)\,dz_s/\gamma^2 \\ &= \int K(w)^2 f(w\gamma)\,dw/\gamma \end{aligned}$$
and $\gamma E[A_{s\tau}] \to f(0)\int K(w)^2\,dw$.  □
Lemma A1.
Assume $\bar\beta \to_p \beta_0$. Then, under Assumptions 1–11, $\psi_N^{(1)}(\bar\beta) = Q + o_p(1)$.
Proof of Lemma A1.
For $\psi_N^{(1)}(\bar\beta)$, note that by the uniform law of large numbers and Slutsky’s theorem, $\psi_N^{(1)}(\bar\beta) \to_p \lim_{N \to \infty} E[q_i^{(1)}(\beta_0)] = Q$.  □
Proof of Proposition 1.
(a) Consistency is shown by combining and extending the results of Manski (1985) and Horowitz (1992). Following Manski, define a population objective function $\Psi^*(\beta) = \sum_{s=1}^{S}\left(2\Pr(Y_s = 1, Z_s(\beta) \ge 0 \mid Y_{s-1}) - \Pr(Z_s(\beta) \ge 0 \mid Y_{s-1})\right)\Pr(Y_{s-1} = 1)$.⁸ As per Manski, $\Psi^*(\beta)$ is maximized uniquely at $\beta = \beta_0$, is continuous and $\Psi_N^*(\beta)$ converges uniformly to $\Psi^*(\beta)$. Extending Horowitz, we have $|\Psi_N^*(\beta) - \Psi_N(\beta)| \to_p 0$ uniformly in $\beta$, and hence, $\hat\beta$ is consistent. (b) To derive the asymptotic distribution, use a Taylor series expansion of the first-order conditions, rearranging them so that:
$$\sqrt{N\gamma}\,(\hat\beta - \beta_0) = -\left(\psi_N^{(1)}(\bar\beta)\right)^{-1}\sqrt{N\gamma}\left(\psi_N(\beta_0) - E\psi_N(\beta_0) + E\psi_N(\beta_0)\right)$$
and from Lemmas 1 and A1:
$$\sqrt{N\gamma}\,(\hat\beta - \beta_0 + \gamma^m Q^{-1}B) = -\left(Q^{-1} + o_P(1)\right)\sqrt{N\gamma}\,\frac{1}{N}\sum_{i=1}^{N}\tilde q_i + o_P(\sqrt{N\gamma}\,\gamma^m),$$
where $\tilde q_i = q_i - E[q_i]$. Application of the central limit theorem completes the result.  □

References

  1. Allison, Paul D. 1982. Discrete-Time Methods for the Analysis of Event Histories. In Sociological Methodology 1982. Edited by S. Leinhardt. San Francisco: Jossey-Bass Publishers, pp. 61–98.
  2. Baker, Michael, and Samuel A. Rea. 1998. Employment Spells and Unemployment Insurance Eligibility Requirements. Review of Economics and Statistics 80: 80–94.
  3. Bearse, Peter, José Canals-Cerda, and Paul Rilstone. 2007. Efficient Semiparametric Estimation of Duration Models with Unobserved Heterogeneity. Econometric Theory 23: 281–308.
  4. Bover, Olympia, Manuel Arellano, and Samuel Bentolila. 2002. Unemployment Duration, Benefit Duration and the Business Cycle. Economic Journal 112: 223–65.
  5. Cameron, Stephen V., and James J. Heckman. 1998. Life cycle schooling and dynamic selection bias: Models and evidence for five cohorts of American males. Journal of Political Economy 106: 262–333.
  6. Chan, Sewin, and Ann Huff-Stevens. 2001. Job Loss and Employment Patterns of Older Workers. Journal of Labor Economics 19: 484–521.
  7. Charlier, Erwin, Bertrand Melenberg, and Arthur H. O. van Soest. 1995. A Smoothed Maximum Score estimator for the Binary Choice Data Model with an Application to Labour Force Participation. Statistica Neerlandica 49: 324–42.
  8. Cooper, Russell, John Haltiwanger, and Laura Power. 1999. Machine Replacement and the Business Cycle: Lumps and Bumps. American Economic Review 89: 921–46.
  9. D’Addio, Anna C., and Michael Rosholm. 2005. Exits from Temporary Jobs in Europe: A competing Risks Analysis. Labour Economics 12: 449–68.
  10. De Jong, Robert M., and Tiemen Woutersen. 2011. Dynamic Time Series Binary Choice. Econometric Theory 27: 673–702.
  11. Fennema, Julian, Wilko Letterie, and Gerard Pfann. 2006. The Timing of Investment Episodes in the Netherlands. De Economist 154: 373–88.
  12. Finnie, Ross, and David Gray. 2002. Earnings Dynamics in Canada: An Econometric Analysis. Labour Economics 9: 763–800.
  13. Fox, Jeremy T. 2007. Semiparametric estimation of multinomial discrete–choice models using a subset of choices. Rand Journal of Economics 38: 1002–19.
  14. Gullstrand, Joakim, and Kerem Tezic. 2008. Who Leaves After Entering the Primary Sector? Evidence from Swedish Micro-level Data. European Review of Agricultural Economics 35: 1–28.
  15. Hess, Wolfgang. 2009. A Flexible Hazard Rate Model for Grouped Duration Data. Mimeo. Lund: Lund University.
  16. Holmas, Tor H. 2002. Keeping Nurses at Work: A Duration Analysis. Health Economics 11: 493–503.
  17. Horowitz, Joel L. 1992. A Smoothed Maximum Score Estimator for the Binary Response Model. Econometrica 60: 505–31.
  18. Horowitz, Joel L. 1998. Semiparametric Methods in Econometrics. New York: Springer.
  19. Horowitz, Joel L. 1999. Semiparametric Estimation of a Proportional Hazard Model with Unobserved Heterogeneity. Econometrica 67: 1001–28.
  20. Horowitz, Joel L. 2002. Bootstrap Critical Values for Tests Based on the Smoothed Maximum Score Estimator. Journal of Econometrics 111: 141–67.
  21. Huff-Stevens, Ann. 1999. Climbing out of Poverty, Falling Back in - Measuring the Persistence of Poverty over Multiple Spells. The Journal of Human Resources 34: 534–56.
  22. Ichimura, Hidehiko. 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58: 71–120.
  23. Iglesias, Emma M. 2010. First and Second Order Asymptotic Bias Correction of Nonlinear Estimators in a Non-Parametric Setting and an Application to the Smoothed Maximum Score Estimator. Studies in Nonlinear Dynamics and Econometrics 14: 1–30.
  24. Jun, Sung Jae, Joris Pinkse, and Yuanyuan Wang. 2015. Classical Laplace estimation for $N^{1/3}$-consistent estimators: Improved convergence rates and rate-adaptive inference. Journal of Econometrics 187: 201–16.
  25. Klein, Roger W., and Richard H. Spady. 1993. An Efficient Semiparametric Estimator for Binary Response Models. Econometrica 61: 387–421.
  26. Kotlyarova, Yulia, and Victoria Zinde-Walsh. 2010. Robust estimation in binary choice models. Communications in Statistics–Theory and Methods 39: 266–79.
  27. Lee, Myoung-Jae. 1992. Median regression for ordered discrete response. Journal of Econometrics 51: 59–77.
  28. Li, Qi, and Jeffrey S. Racine. 2007. Nonparametric Econometrics. Princeton: Princeton University Press.
  29. Manski, Charles F. 1975. Maximum Score Estimation of the Stochastic Utility Model of Choice. Journal of Econometrics 3: 205–28.
  30. Manski, Charles F. 1985. Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator. Journal of Econometrics 32: 65–108.
  31. Meghir, Costas, and Edward Whitehouse. 1997. Labour Market Transitions and Retirement of men in the UK. Journal of Econometrics 79: 327–54.
  32. Melenberg, Bertrand, and Arthur H. O. Van Soest. 1996. Parametric and semi-parametric modelling of vacation expenditures. Journal of Applied Econometrics 11: 59–76.
  33. Nielsen, Jens P., Oliver Linton, and Peter J. Bickel. 1998. On a semiparametric survival model with flexible covariate effect. The Annals of Statistics 26: 215–41.
  34. Reza, Sadat, and Paul Rilstone. 2014. A simple root-N-consistent semiparametric estimator for discrete duration models. Statistics and Probability Letters 95: 150–54.
  35. Reza, Sadat, and Paul Rilstone. 2016. Semiparametric efficiency bounds and efficient estimation of discrete duration models with unspecified hazard rate. Econometric Reviews 35: 693–726.
  36. Rilstone, Paul, Virendra K. Srivastava, and Aman Ullah. 1996. The Second-Order Bias, and Mean Squared Error of Nonlinear Estimators. Journal of Econometrics 75: 369–95.
  37. Sueyoshi, Glenn T. 1995. A Class of Binary Response Models for Grouped Duration Data. Journal of Applied Econometrics 10: 411–31.
  38. Van der Vaart, Aad. 1996. Efficient Maximum Likelihood Estimation in Semiparametric Mixture Models. The Annals of Statistics 24: 862–78.
1
Horowitz (1998) gave a discussion of these issues.
2
Some normalization of the parameter space is necessary. We find it most convenient to impose a unit coefficient on $X_{is}^{*}$ immediately.
3
The model is easily reformulated to incorporate functions of the Y j ’s, j s as conditioning variables.
4
For example, Cameron and Heckman (1998) defined S as the upper limit to years of education. In practice, for programming purposes, it suffices to set S equal to the longest duration in the dataset being used. In the simulations reported in Section 4, the maximum duration was 37.
5
This has two implications: the estimates of the other $\beta$’s are all to scale, and the sign of the first coefficient is taken as known.
6
In this case, the random sampling assumption should be interpreted as referring to the stochastic elements of $x_s$.
7
Estimates using a fourth-order kernel as in Horowitz (1992) yielded very similar results. The non-stochastic window-width was used, rather than, say, a plug-in window-width, to keep the simulations manageable.
8
This corresponds to Manski for $S = 1$.
Figure 1. QQ plot of estimated coefficient on $X_2$.
Figure 2. QQ plot of estimated coefficient on $s/100$.
Figure 3. QQ plot of estimated coefficient on $(s/100)^2$.
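Table 1. Simulation summary statistics. Parameter: coefficient on $X_2$.

| | Spec (1): Normal Error | Spec (1): Normal Error | Spec (2): Normal, Heteroscedastic Error | Spec (2): Normal, Heteroscedastic Error |
|---|---|---|---|---|
| No. of observations | 500 | 1000 | 500 | 1000 |
| Using second order kernel | | | | |
| True value | 1.000 | 1.000 | 1.000 | 1.000 |
| Estimates: Mean | 1.013 | 0.982 | 1.034 | 1.001 |
| Estimates: Standard dev. | 0.114 | 0.081 | 0.094 | 0.063 |
| Estimates: RMSE | 0.115 | 0.083 | 0.100 | 0.063 |
| Estimates: Skewness | 0.452 | 0.481 | 0.491 | 0.308 |
| Estimates: Kurtosis | 3.167 | 3.305 | 4.226 | 3.652 |
| Using normal cdf as continuation probability | | | | |
| True value | 1.000 | 1.000 | 1.000 | 1.000 |
| Estimates: Mean | 1.017 | 1.003 | 0.937 | 0.939 |
| Estimates: Standard dev. | 0.093 | 0.032 | 0.063 | 0.045 |
| Estimates: RMSE | 0.094 | 0.032 | 0.090 | 0.076 |
| Estimates: Skewness | 0.260 | 0.114 | 0.163 | −0.082 |
| Estimates: Kurtosis | 2.712 | 2.924 | 2.900 | 2.970 |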
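The three summary rows Mean, Standard dev. and RMSE are linked by the usual decomposition RMSE² = bias² + variance. The sketch below applies that identity to the Table 1 entries for the coefficient on $X_2$ and compares the implied RMSE with the reported one. It is a consistency check only: the inputs are the rounded three-decimal figures from the table, the paper does not state which variance divisor was used, and the names `table1` and `rmse_from_moments` are hypothetical.

```python
import math

TRUE_VALUE = 1.0  # true coefficient on X2 in both specifications

# (mean, standard deviation, reported RMSE), copied from Table 1
table1 = {
    "kernel, Spec (1), N=500": (1.013, 0.114, 0.115),
    "kernel, Spec (1), N=1000": (0.982, 0.081, 0.083),
    "kernel, Spec (2), N=500": (1.034, 0.094, 0.100),
    "kernel, Spec (2), N=1000": (1.001, 0.063, 0.063),
    "normal cdf, Spec (1), N=500": (1.017, 0.093, 0.094),
    "normal cdf, Spec (1), N=1000": (1.003, 0.032, 0.032),
    "normal cdf, Spec (2), N=500": (0.937, 0.063, 0.090),
    "normal cdf, Spec (2), N=1000": (0.939, 0.045, 0.076),
}

def rmse_from_moments(mean, sd, true_value):
    """RMSE implied by the bias and the standard deviation of the estimates."""
    return math.sqrt((mean - true_value) ** 2 + sd ** 2)

gaps = {}
for label, (mean, sd, reported) in table1.items():
    implied = rmse_from_moments(mean, sd, TRUE_VALUE)
    gaps[label] = abs(implied - reported)
    print(f"{label:<30} implied = {implied:.4f}  reported = {reported:.3f}")
```

Small discrepancies in the third decimal are to be expected, since the means and standard deviations entering the calculation are themselves rounded.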
Table 2. Simulation summary statistics. Parameter: coefficient on $(s/100)$.

| | Spec (1): Normal Error | Spec (1): Normal Error | Spec (2): Normal, Heteroscedastic Error | Spec (2): Normal, Heteroscedastic Error |
|---|---|---|---|---|
| No. of observations | 500 | 1000 | 500 | 1000 |
| Using second order kernel | | | | |
| True value | 2.000 | 2.000 | 2.000 | 2.000 |
| Estimates: Mean | 2.359 | 2.112 | 1.737 | 1.790 |
| Estimates: Standard dev. | 3.426 | 2.356 | 1.854 | 1.340 |
| Estimates: RMSE | 3.441 | 2.356 | 1.871 | 1.355 |
| Estimates: Skewness | 0.126 | 0.181 | −0.280 | −0.065 |
| Estimates: Kurtosis | 4.233 | 3.986 | 8.149 | 4.617 |
| Using normal cdf as continuation probability | | | | |
| True value | 2.000 | 2.000 | 2.000 | 2.000 |
| Estimates: Mean | 2.577 | 2.343 | 1.813 | 1.633 |
| Estimates: Standard dev. | 1.544 | 1.010 | 0.830 | 0.623 |
| Estimates: RMSE | 1.647 | 1.066 | 0.850 | 0.722 |
| Estimates: Skewness | 0.126 | −0.042 | 0.304 | 0.433 |
| Estimates: Kurtosis | 3.150 | 3.092 | 2.894 | 3.344 |
Table 3. Simulation summary statistics. Parameter: coefficient on $(s/100)^2$.

| | Spec (1): Normal Error | Spec (1): Normal Error | Spec (2): Normal, Heteroscedastic Error | Spec (2): Normal, Heteroscedastic Error |
|---|---|---|---|---|
| No. of observations | 500 | 1000 | 500 | 1000 |
| Using second order kernel | | | | |
| True value | −1.000 | −1.000 | −1.000 | −1.000 |
| Estimates: Mean | −2.147 | −1.554 | 0.685 | 0.042 |
| Estimates: Standard dev. | 14.302 | 9.805 | 6.804 | 3.911 |
| Estimates: RMSE | 14.334 | 9.810 | 7.003 | 4.043 |
| Estimates: Skewness | 0.182 | −3.846 | 2.400 | 1.283 |
| Estimates: Kurtosis | 7.844 | 43.237 | 21.338 | 8.266 |
| Using normal cdf as continuation probability | | | | |
| True value | −1.000 | −1.000 | −1.000 | −1.000 |
| Estimates: Mean | −4.032 | −2.678 | −1.435 | −0.922 |
| Estimates: Standard dev. | 6.113 | 3.609 | 1.798 | 1.273 |
| Estimates: RMSE | 6.819 | 3.977 | 1.848 | 1.274 |
| Estimates: Skewness | −0.929 | −0.655 | −1.135 | −1.576 |
| Estimates: Kurtosis | 4.467 | 4.051 | 5.005 | 9.012 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
J. Risk Financial Manag. EISSN 1911-8074. Published by MDPI AG, Basel, Switzerland.