# A GMM-Based Test for Normal Disturbances of the Heckman Sample Selection Model


*Keywords:* sample selection model; GMM; normality; pseudo-score LM test


Department of Economics, University of Innsbruck, Universitaetsstrasse 15, Innsbruck 6020, Austria

Austrian Institute of Economic Research, P.O.-Box 91, Vienna A-1103, Austria

Received: 31 July 2014 / Revised: 10 October 2014 / Accepted: 14 October 2014 / Published: 23 October 2014

The Heckman sample selection model relies on the assumption of normal and homoskedastic disturbances. However, before considering more general, alternative semiparametric models that do not need the normality assumption, it seems useful to test this assumption. Following Meijer and Wansbeek (2007), the present contribution derives a GMM-based pseudo-score LM test of whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. The test is easy to calculate, and in Monte Carlo simulations it shows good performance for sample sizes of 1000 or larger.

The assumption of bivariate normal and homoskedastic disturbances is a prerequisite for the consistency of the maximum likelihood estimator of the Heckman sample selection model. Moreover, some studies focus on the prediction of counterfactuals based on the Heckman sample selection model, taking into account changes in both participation and outcome, which is often only feasible under the assumption of bivariate normality.1 Lastly, under the assumption of bivariate normality one can estimate the Heckman sample selection model by maximum likelihood methods that are less sensitive to weak exclusion restrictions.

Before employing alternative semiparametric estimators that do not need the normality assumption (see e.g., Newey, 2009 [3]), it seems useful to test the underlying normality assumption of sample selection models. So far, the literature offers several approaches to test this hypothesis.2 Bera et al. (1984) [6] develop an LM test for normality of the disturbances in the general Pearson framework, which implies testing the moments up to order four. Lee (1984) [7] proposes Lagrange multiplier tests within the bivariate Edgeworth series of distributions. Van der Klaauw and Koning (1993) [8] derive LR tests in a similar setting, while Montes-Rojas (2011) [9] proposes LM and $C\left(\alpha \right)$ tests that are likewise based on bivariate Edgeworth series expansions, but robust to local misspecification in nuisance distributional parameters. In general, these approaches tend to lead to complicated test statistics that are sometimes difficult to implement in standard econometric software. More importantly, some of these tests for bivariate normality seem to exhibit unsatisfactory performance in Monte Carlo simulations and reject too often in small to medium sample sizes, especially if the parameter of the Mills’ ratio is high in absolute value (see e.g., Montes-Rojas, 2011 [9], Table 1). This motivates Montes-Rojas (2011) [9] to focus on the assumptions of the two-step estimator, which requires less restrictive assumptions, namely a normal marginal distribution of the disturbances of the selection equation and a linear conditional expectation of the disturbances of the outcome equation. He proposes to test for marginal normality and linearity of the conditional expectation of the outcome model separately and shows that the corresponding locally size-robust test statistics based on the two-step estimator perform well in terms of size and power.

In a possibly neglected, but very valuable paper, Meijer and Wansbeek (2007) [10] embed the two-step estimator of the Heckman sample selection model in a GMM framework. In addition, they argue that within this framework it is easily possible to add moment conditions for designing Wald tests in order to check the assumption of bivariate normality and homoskedasticity of the disturbances. Their approach does not attempt to develop a most powerful test; rather, they intend to design a relatively simple test for normality that can be used as an alternative to the existing tests. The test can be interpreted as a conditional moment test and checks whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. For ${H}_{0}$ to hold, the test in addition requires normally distributed disturbances of the selection equation and the absence of heteroskedasticity in both the outcome and the selection equation.

Meijer and Wansbeek (2007) [10] do not explicitly derive the corresponding test statistic, nor do they provide Monte Carlo simulations on its performance in finite samples. The present contribution takes up their approach, arguing that a GMM-based pseudo-score LM test is well suited to test the hypothesis of bivariate normality and is easy to calculate. The derived LM test is similar to the widely used Jarque and Bera (1980) [11] LM test and, in the absence of sample selection, reverts to their LM test statistic. Monte Carlo simulations show good performance of the proposed test for sample sizes of 1000 or larger, especially if a powerful exclusion restriction is available.

In a cross-section of n units the Heckman (1979) [12] sample selection model is given as

$$\begin{array}{ccc}\hfill {y}_{1i}^{*}& =& {z}_{i}^{\prime}\gamma +{u}_{1i}\hfill \\ \hfill {y}_{2i}^{*}& =& {x}_{i}^{\prime}\beta +{u}_{2i}\hfill \\ \hfill {d}_{i}& =& \left\{\begin{array}{cc}1& \text{if}\phantom{\rule{4.pt}{0ex}}{y}_{1i}^{*}>0\\ 0& \text{otherwise}\end{array}\right.\hfill \\ \hfill {y}_{2i}& =& \left\{\begin{array}{cc}{y}_{2i}^{*}\hfill & \text{if}\phantom{\rule{4.pt}{0ex}}{d}_{i}=1\\ \text{unobserved}\hfill & \text{if}\phantom{\rule{4.pt}{0ex}}{d}_{i}=0\end{array}\right.\phantom{\rule{4pt}{0ex}}\hfill \end{array}$$

where ${y}_{1i}^{*}$ and ${y}_{2i}^{*}$ denote latent random variables. The outcome variable, ${y}_{2i}^{*},$ is observed if the latent variable ${y}_{1i}^{*}>0$ or, equivalently, if ${d}_{i}=1$. ${z}_{i}$ is a ${k}_{1}\times 1$ vector containing the exogenous variables of the selection equation and ${x}_{i}$ is the ${k}_{2}\times 1$ vector of the exogenous variables of the outcome equation. ${z}_{i}$ may include the variables in ${x}_{i},$ but also additional ones so that an exclusion restriction holds. γ and β denote the corresponding parameter vectors. Under ${H}_{0}$ the disturbances are assumed to be distributed as bivariate normal, i.e.,

$$\left(\left[\begin{array}{c}{u}_{1i}\\ {u}_{2i}\end{array}\right]|{x}_{i},{z}_{i}\right)\sim N\left(\left[\begin{array}{c}0\\ 0\end{array}\right],\left[\begin{array}{cc}1& \tau \\ \tau & {\sigma}^{2}\end{array}\right]\right)$$
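The selection mechanism above is straightforward to simulate. The sketch below (with illustrative regressors and parameter values that are assumptions of this example, not the paper's Monte Carlo design) draws from the model and records which outcomes are observed:

```python
import math
import random

def simulate_heckman(n, gamma, beta, tau, sigma, seed=0):
    """Draw from the sample selection model: y2 is observed only when y1* > 0.
    Disturbances are bivariate normal with Var(u1)=1, Cov(u1,u2)=tau, Var(u2)=sigma^2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = (1.0, rng.gauss(0, 1))   # selection regressors: constant and one excluded variable
        x = (1.0, rng.gauss(0, 1))   # outcome regressors: constant and x1
        u1 = rng.gauss(0, 1)
        # u2 = tau*u1 + eps with eps ~ N(0, sigma^2 - tau^2), independent of u1
        eps = rng.gauss(0, math.sqrt(sigma ** 2 - tau ** 2))
        u2 = tau * u1 + eps
        y1_star = sum(g * zj for g, zj in zip(gamma, z)) + u1
        d = 1 if y1_star > 0 else 0
        y2 = sum(b * xj for b, xj in zip(beta, x)) + u2 if d == 1 else None
        data.append((z, x, d, y2))
    return data

data = simulate_heckman(20000, gamma=(0.5, 1.0), beta=(1.0, 0.5), tau=0.4, sigma=1.0)
share_observed = sum(d for _, _, d, _ in data) / len(data)
```

With these values the selection index is $N(0.5, 2)$, so roughly $\Phi(0.5/\sqrt{2}) \approx 0.64$ of the outcomes are observed.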

It is easy to show that under these assumptions

$$\begin{array}{ccc}\hfill {p}_{i}& \equiv & E\left[{d}_{i}\right]=\Phi \left({z}_{i}^{\prime}\gamma \right)\hfill \\ \hfill {\lambda}_{i}& \equiv & E\left[{u}_{1i}|{u}_{1i}\ge -{z}_{i}^{\prime}\gamma \right]={\textstyle \frac{\varphi (-{z}_{i}^{\prime}\gamma )}{1-\Phi (-{z}_{i}^{\prime}\gamma )}}={\textstyle \frac{\varphi \left({z}_{i}^{\prime}\gamma \right)}{\Phi \left({z}_{i}^{\prime}\gamma \right)}}\hfill \end{array}$$

where ${\lambda}_{i}$ denotes the inverse Mills’ ratio. Under the normality assumption one can specify ${u}_{2i}=\tau {u}_{1i}+{\epsilon}_{i}$ so that ${\epsilon}_{i}\sim iidN(0,{\sigma}^{2}-{\tau}^{2})$. ${\epsilon}_{i}$ is independent of ${u}_{1i}$ as $E\left[{\epsilon}_{i}{u}_{1i}\right]=E\left[\left({u}_{2i}-\tau {u}_{1i}\right){u}_{1i}\right]=\tau -\tau =0$. Since $E\left[{u}_{1i}|{d}_{i}=1\right]={\lambda}_{i}$, it holds that $E[\tau ({u}_{1i}-{\lambda}_{i})+{\epsilon}_{i}|{d}_{i}=1]=0$. Therefore, the two-step Heckman sample selection model includes the estimated inverse Mills’ ratio in the outcome equation as an additional regressor. For the observed outcome at ${d}_{i}=1$ the model can be written as

$$\begin{array}{ccc}\hfill {y}_{2i}^{*}& =& {x}_{i}^{\prime}\beta +\tau {\lambda}_{i}+\tau \left({u}_{1i}-{\lambda}_{i}\right)+{\epsilon}_{i}\hfill \\ & \equiv & {w}_{i}^{\prime}\alpha +\tau {v}_{i}+{\epsilon}_{i}\hfill \end{array}$$

where ${w}_{i}={({x}_{i}^{\prime},{\lambda}_{i})}^{\prime}$, $\alpha ={({\beta}^{\prime},\tau )}^{\prime}$, ${v}_{i}={u}_{1i}-{\lambda}_{i}$ and $E\left[{y}_{2i}^{*}|{d}_{i}=1\right]={x}_{i}^{\prime}\beta +\tau {\lambda}_{i}.$
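A quick Monte Carlo check of the inverse Mills' ratio formula, ${\lambda}_{i}=\varphi ({z}_{i}^{\prime}\gamma )/\Phi ({z}_{i}^{\prime}\gamma )$, as the conditional mean $E[{u}_{1i}|{u}_{1i}>-{z}_{i}^{\prime}\gamma ]$, for an arbitrary fixed value of the selection index:

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

zg = 0.7                            # a fixed value of z_i' gamma (illustrative)
lam = norm_pdf(zg) / norm_cdf(zg)   # inverse Mills' ratio

# Monte Carlo estimate of E[u1 | u1 > -z'gamma] for standard normal u1
rng = random.Random(42)
draws = [u for u in (rng.gauss(0, 1) for _ in range(400000)) if u > -zg]
mc_mean = sum(draws) / len(draws)
```

For $z'\gamma = 0.7$ the formula gives $\lambda \approx 0.412$, and the truncated sample mean agrees to Monte Carlo accuracy.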

Meijer and Wansbeek (2007) [10] embed the two-step Heckman sample selection estimator in a GMM framework and demonstrate that the estimation can be based on

$$\begin{array}{ccc}\hfill {h}_{1,1i({k}_{1}\times 1)}\left({\theta}_{1}\right)& \equiv & {\textstyle \frac{({d}_{i}-{p}_{i}){\varphi}_{i}}{{p}_{i}(1-{p}_{i})}}{z}_{i}\hfill \\ \hfill {h}_{1,2i({k}_{2}\times 1)}\left({\theta}_{1}\right)& \equiv & {d}_{i}{w}_{i}({y}_{i}-{w}_{i}^{\prime}\alpha )={d}_{i}{w}_{i}(\tau {v}_{i}+{\epsilon}_{i})\hfill \\ \hfill {h}_{1,3i(1\times 1)}\left({\theta}_{1}\right)& \equiv & {d}_{i}\left[{({y}_{i}-{w}_{i}^{\prime}\alpha )}^{2}-{\phi}_{2,i}\right]={d}_{i}\left[{({\epsilon}_{i}+\tau {v}_{i})}^{2}-{\phi}_{2,i}\right]\hfill \end{array}$$

where ${\theta}_{1}={({\gamma}^{\prime},{\beta}^{\prime},\tau ,\sigma )}^{\prime}$ and ${\phi}_{k,i}=E\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{k}|{d}_{i}=1\right],k=2,3,4.$ Note, there are as many parameters as moment conditions and the model is just-identified.

The first set of moment equations is based on ${\overline{h}}_{1,1}\left({\theta}_{1}\right)=\frac{1}{n}{\sum}_{i=1}^{n}{h}_{1,1i}\left({\theta}_{1}\right)$ and refers to the score of the Probit model. Since these moment conditions do not include the parameters entering ${h}_{1,2i}$ and ${h}_{1,3i}$ (i.e., ${\beta}^{\prime},\tau ,\sigma $) and are exactly identified, estimation can proceed in steps: In the first step, one can solve $\frac{1}{n}{\sum}_{i=1}^{n}{h}_{1,1i}\left(\widehat{\gamma}\right)=0$, and in the second step one solves the sample moment condition ${\overline{h}}_{1,2}\left({\theta}_{1}\right)=\frac{1}{n}{\sum}_{i=1}^{n}\left[{h}_{1,2i}({\widehat{\gamma}}^{\prime},{\widehat{\beta}}^{\prime},\widehat{\tau})\right]=0$ using the estimate $\widehat{\gamma}$ derived in the first stage. This leads to the two-step Heckman estimator, which first estimates a Probit model, inserts the estimated Mills’ ratio ${\widehat{\lambda}}_{i}$ as an additional regressor in the outcome equation and applies OLS. Lastly, from ${\overline{h}}_{1,3}\left({\theta}_{1}\right)=\frac{1}{n}\sum {h}_{1,3i}({\widehat{\gamma}}^{\prime},{\widehat{\beta}}^{\prime},\widehat{\tau},\widehat{\sigma})=0$ one can obtain an estimator of ${\sigma}^{2}.$
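The second step can be sketched as follows. To keep the example minimal, the selection index $z_i'\gamma$ is treated as known rather than estimated by a first-stage Probit, and the regressors and parameter values are illustrative assumptions of this sketch:

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

rng = random.Random(1)
beta0, beta1, tau, sigma = 1.0, 0.5, 0.6, 1.0
W, y = [], []
for _ in range(50000):
    z1, x1 = rng.gauss(0, 1), rng.gauss(0, 1)
    u1 = rng.gauss(0, 1)
    u2 = tau * u1 + rng.gauss(0, math.sqrt(sigma ** 2 - tau ** 2))
    if 0.5 + z1 + u1 > 0:                     # d_i = 1: outcome observed
        zg = 0.5 + z1                         # selection index, taken as known here
        lam = norm_pdf(zg) / norm_cdf(zg)     # inverse Mills' ratio
        W.append([1.0, x1, lam])              # w_i = (1, x1, lambda_i)
        y.append(beta0 + beta1 * x1 + u2)

# OLS of y2 on (x, lambda) among observed units recovers (beta0, beta1, tau)
k = 3
WtW = [[sum(w[i] * w[j] for w in W) for j in range(k)] for i in range(k)]
Wty = [sum(w[i] * yi for w, yi in zip(W, y)) for i in range(k)]
alpha_hat = solve(WtW, Wty)
```

The coefficient on the Mills' ratio estimates τ; in a full implementation the first-stage Probit estimate $\widehat{\gamma}$ would replace the true index.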

As Meijer and Wansbeek (2007) [10] remark, a rough and simple test for normality can be based on two additional moment conditions that allow comparing the third and fourth moments of the estimated residuals of the outcome equation, ${y}_{i}-{w}_{i}^{\prime}\widehat{\alpha}$, with their theoretical counterparts based on the truncated normal distribution. These moment conditions use

$$\begin{array}{ccc}\hfill {h}_{2,1i(1\times 1)}({\theta}_{1},{\theta}_{2})& \equiv & {d}_{i}\left[{({y}_{i}-{w}_{i}^{\prime}\alpha )}^{3}-{\phi}_{3,i}-\xi \right]={d}_{i}\left[{(\tau {v}_{i}+{\epsilon}_{i})}^{3}-{\phi}_{3,i}-\xi \right]\hfill \\ \hfill {h}_{2,2i(1\times 1)}({\theta}_{1},{\theta}_{2})& \equiv & {d}_{i}\left[{({y}_{i}-{w}_{i}^{\prime}\alpha )}^{4}-{\phi}_{4,i}-\kappa \right]={d}_{i}\left[{(\tau {v}_{i}+{\epsilon}_{i})}^{4}-{\phi}_{4,i}-\kappa \right]\hfill \end{array}$$

Thereby, ${\theta}_{2}=(\xi ,\kappa )$ denotes additional parameters that are zero under normality. More importantly, under ${H}_{0}$ the expectations ${\phi}_{k,i}$ can be derived recursively from the moments of the truncated normal distribution as shown in the Appendix (see also Meijer and Wansbeek, 2007, pp. 45–46) [10]. In general, these moments depend on the parameters ${\theta}_{1}$ and, especially, on the inverse Mills’ ratio ${\lambda}_{i}$ and the parameter τ.
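Under ${H}_{0}$ the building blocks of the ${\phi}_{k,i}$ are truncated normal moments, which obey a simple recursion. The sketch below uses the textbook recursion for a standard normal truncated from below (the paper's Appendix derivation is not reproduced here) and cross-checks it by simulation:

```python
import math
import random
from math import comb

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_moments(c, kmax):
    """mu_k = E[u^k | u > -c] for u ~ N(0,1), via the recursion
    mu_k = (k-1)*mu_{k-2} + (-c)**(k-1) * lam, with lam = phi(c)/Phi(c)."""
    lam = norm_pdf(c) / norm_cdf(c)
    mu = [1.0, lam]
    for k in range(2, kmax + 1):
        mu.append((k - 1) * mu[k - 2] + (-c) ** (k - 1) * lam)
    return mu

c = 0.8                       # illustrative value of z_i' gamma
mu = truncated_moments(c, 4)

# psi_k = E[(u - lam)^k | u > -c] follows by the binomial theorem
psi = [sum(comb(k, j) * mu[j] * (-mu[1]) ** (k - j) for j in range(k + 1))
       for k in range(5)]

# Monte Carlo cross-check of mu_3
rng = random.Random(7)
acc = [u for u in (rng.gauss(0, 1) for _ in range(500000)) if u > -c]
mu3_mc = sum(u ** 3 for u in acc) / len(acc)
```

As a sanity check, $\psi_1 = \mu_1 - \lambda = 0$ exactly, and $\mu_2 = 1 - c\lambda$ follows from the recursion at $k=2$.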

To detect violations of the normality assumption, one can test ${H}_{0}:\xi =0$ and $\kappa =0$ vs. ${H}_{1}:\xi \ne 0$ and/or $\kappa \ne 0.$ Although this hypothesis checks the third and fourth moments of the disturbances of the two-step outcome Equation (1), it can only be true if ${\phi}_{3,i}$ and ${\phi}_{4,i}$ are the correct expected values. Therefore, the test additionally requires the moment conditions $E\left[{h}_{1,1i}\right]=0$ and $E\left[{h}_{1,2i}\right]=0$ to hold so that the parameters of both the selection equation and the outcome equation are consistently estimated. The present hypothesis is somewhat more restrictive than that tested, e.g., in Montes-Rojas (2011) [9], who emphasizes that the Heckman two-step estimator is robust to distributional misspecification if (i) the marginal distribution of ${u}_{1i}$ is normal and (ii) $E\left[{u}_{2i}|{u}_{1i}\right]=\tau {u}_{1i}$, i.e., the conditional expectation is linear.3

In addition, ${H}_{0}$ also requires the absence of heteroskedasticity (see Meijer and Wansbeek, 2007, p. 46) [10]. To give an example, assume that ${u}_{2i}=\tau {u}_{1i}+{\epsilon}_{i}$ with normally distributed ${u}_{1i}$ and ${\epsilon}_{i}$, but that the variances of ${\epsilon}_{i}$ differ across i and are given as ${\sigma}_{i}^{2}-{\tau}^{2}$ (see also DGP6 in the Monte Carlo set-up below and the excess kurtosis of DGP6 in Table 1 below). Then, it follows that ${\phi}_{k,i}={\sum}_{j=0}^{k}\left(\genfrac{}{}{0pt}{}{k}{j}\right)E\left[{\epsilon}_{i}^{k-j}\right]{\tau}^{j}{\psi}_{j,i},$ where ${\psi}_{k,i}\equiv E\left[{v}_{i}^{k}|{d}_{i}=1\right]={\sum}_{j=0}^{k}{(-1)}^{k-j}\left(\genfrac{}{}{0pt}{}{k}{j}\right){\mu}_{j,i}{\lambda}_{i}^{k-j}$ and ${\mu}_{j,i}=E\left[{u}_{1i}^{j}|{u}_{1i}>-{z}_{i}^{\prime}\gamma \right]$ (see the Appendix). In this case, we have $E\left[{\epsilon}_{i}^{2}\right]={\sigma}_{i}^{2}-{\tau}^{2}$ and $E\left[{\epsilon}_{i}^{4}\right]=3{\left({\sigma}_{i}^{2}-{\tau}^{2}\right)}^{2}$, while the corresponding odd moments are zero. Hence, under heteroskedasticity ${\phi}_{4,i}$ differs from that obtained under ${H}_{0}$, which assumes $E\left[{\epsilon}_{i}^{2}\right]={\sigma}^{2}-{\tau}^{2}$ and $E\left[{\epsilon}_{i}^{4}\right]=3{({\sigma}^{2}-{\tau}^{2})}^{2}$, and the population moment condition $E\left[{h}_{2,2i}({\theta}_{1},{\theta}_{2})\right]=0$ is violated. A test based on these moments should therefore also be able to detect heteroskedasticity, although not in the most efficient way.
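The mechanism the fourth-moment condition exploits can be seen in a two-point example: pooling mean-zero normal draws with different variances yields a kurtosis of $3E[\sigma_i^4]/(E[\sigma_i^2])^2>3$. A sketch with illustrative variances (these numbers are assumptions of the example, not the paper's DGP6):

```python
import random

# Pooled draws: half the units have sd 0.5, half sd 1.5 (illustrative values).
rng = random.Random(3)
xs = [rng.gauss(0, 0.5) if i % 2 == 0 else rng.gauss(0, 1.5)
      for i in range(200000)]

# Moment-based kurtosis; the mean is zero by construction, so it is omitted.
m2 = sum(x * x for x in xs) / len(xs)
m4 = sum(x ** 4 for x in xs) / len(xs)
kurt = m4 / m2 ** 2

# Analytical value: 3*E[s^4]/E[s^2]^2 with s^2 in {0.25, 2.25} equally likely
s2 = [0.25, 2.25]
kurt_theory = 3 * (sum(v * v for v in s2) / 2) / (sum(s2) / 2) ** 2
```

Here the pooled kurtosis is 4.92 rather than 3, so the fourth-moment condition is violated even though every individual disturbance is normal.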

Applying a pseudo-score LM test (Newey and West, 1987 [13]; Hall, 2005 [14]) in this GMM framework leads to a ${\chi}^{2}\left(2\right)$-test statistic that can be calculated easily. In order to derive the LM test statistic, define $\overline{h}\left(\theta \right)=\frac{1}{n}{\sum}_{i=1}^{n}{h}_{i}\left(\theta \right)$ and $\overline{\Psi}\left(\theta \right)={\textstyle \frac{1}{n}}{\sum}_{i=1}^{n}{h}_{i}\left(\theta \right){h}_{i}{\left(\theta \right)}^{\prime}$, where ${h}_{i}\left(\theta \right)\equiv {({h}_{1,1i}^{\prime},{h}_{1,2i}^{\prime},{h}_{1,3i},{h}_{2,1i},{h}_{2,2i})}^{\prime}$. It is assumed that ${\Psi}_{0}=pli{m}_{n\to \infty}\overline{\Psi}\left({\theta}_{0}\right)$ exists, is positive definite and invertible. Under standard assumptions, it holds that

$$\begin{array}{c}{n}^{1/2}\overline{h}\left({\theta}_{0}\right)\stackrel{d}{\to}N(0,{\Psi}_{0})\hfill \\ {n}^{1/2}(\widehat{\theta}-{\theta}_{0})\stackrel{d}{\to}N(0,{A}_{0})\hfill \end{array}$$

where the subscript 0 indicates that ${H}_{0}$ is assumed. Thereby, ${A}_{0}={G}_{0}^{-1}{\Psi}_{0}{\left({G}_{0}^{-1}\right)}^{\prime}$ and ${G}_{0}$ is the probability limit of $\overline{G}\left({\theta}_{0}\right)=\frac{1}{n}{\sum}_{i=1}^{n}{\left.\frac{\partial {h}_{i}\left(\theta \right)}{\partial {\theta}^{\prime}}\right|}_{\theta ={\theta}_{0}}$. Note, $\overline{G}\left({\theta}_{0}\right)$ is invertible as the model is just-identified.

Under ${H}_{0}$ the moment conditions $E\left[{h}_{2,i}({\theta}_{1},{\theta}_{2})\right]$, with ${h}_{2,i}({\theta}_{1},{\theta}_{2})\equiv {({h}_{2,1i},{h}_{2,2i})}^{\prime}$, referring to the third and fourth moments of the outcome equation are zero at $\xi =0$ and $\kappa =0$, and the separability result in Ahn and Schmidt (1995, Section 4) [15] can be applied. Denoting the restricted estimates under ${H}_{0}$ by a tilde, using the invertibility of $\overline{G}\left(\tilde{\theta}\right)$ and the partitioned inverse of ${\Psi}_{n}\left(\tilde{\theta}\right)=E\left[\overline{\Psi}\left(\tilde{\theta}\right)\right]$, the pseudo-score LM test statistic can be derived as (see the Appendix for details):

$$LM\left(\tilde{\theta}\right)=n{\overline{h}}_{2}^{\prime}\left(\tilde{\theta}\right){\left({\Psi}_{n,22}\left(\tilde{\theta}\right)-{\Psi}_{n,21}\left(\tilde{\theta}\right){\Psi}_{n,11}{\left(\tilde{\theta}\right)}^{-1}{\Psi}_{n,12}\left(\tilde{\theta}\right)\right)}^{-1}{\overline{h}}_{2}\left(\tilde{\theta}\right)$$

Thereby, ${\overline{h}}_{2}\left(\theta \right)=\frac{1}{n}{\sum}_{i=1}^{n}{h}_{2,i}\left(\theta \right)$ and we use ${\overline{h}}_{1}\left(\tilde{\theta}\right)=\frac{1}{n}{\sum}_{i=1}^{n}{h}_{1,i}\left(\tilde{\theta}\right)=0,$ where ${h}_{1,i}\left(\theta \right)={({h}_{1,1i}^{\prime},{h}_{1,2i}^{\prime},{h}_{1,3i})}^{\prime}$, as well as the partitioned inverse (see the Appendix)

$$\begin{array}{ccc}\hfill {\Psi}_{n,11}\left(\theta \right)& =& \frac{1}{n}\left[\begin{array}{ccc}{Z}^{\prime}VZ& 0& 0\\ *& {W}_{1}^{\prime}{\Sigma}_{1}{W}_{1}& \sum _{{d}_{i}=1}{w}_{i}{\phi}_{3,i}\\ *& *& \sum _{{d}_{i}=1}\left({\phi}_{4,i}-{\phi}_{2,i}^{2}\right)\end{array}\right]\hfill \\ \hfill {\Psi}_{n,22}\left(\theta \right)& =& \frac{1}{n}\sum _{i=1}^{n}\left[\begin{array}{cc}{p}_{i}\left({\phi}_{6,i}-{\phi}_{3,i}^{2}\right)& {p}_{i}\left({\phi}_{7,i}-{\phi}_{3,i}{\phi}_{4,i}\right)\\ {p}_{i}\left({\phi}_{7,i}-{\phi}_{3,i}{\phi}_{4,i}\right)& {p}_{i}\left({\phi}_{8,i}-{\phi}_{4,i}^{2}\right)\end{array}\right]\hfill \\ \hfill {\Psi}_{n,12}\left(\theta \right)& =& \frac{1}{n}\sum _{i=1}^{n}\left[\begin{array}{cc}0& 0\\ {p}_{i}{w}_{i}{\phi}_{4,i}& {p}_{i}{w}_{i}{\phi}_{5,i}\\ {p}_{i}\left({\phi}_{5,i}-{\phi}_{2,i}{\phi}_{3,i}\right)& {p}_{i}\left({\phi}_{6,i}-{\phi}_{4,i}{\phi}_{2,i}\right)\end{array}\right]\hfill \end{array}$$

where $V=diag\left(\frac{{\varphi}_{1}^{2}}{{p}_{1}(1-{p}_{1})},...,\frac{{\varphi}_{n}^{2}}{{p}_{n}(1-{p}_{n})}\right)$, ${p}_{i}=P({d}_{i}=1)$, ${Z}_{n\times {k}_{1}}={({z}_{1},...,{z}_{n})}^{\prime}$, ${W}_{n\times {k}_{2}}={({w}_{1},...,{w}_{n})}^{\prime},$ and $\Sigma =diag({\phi}_{2,1},...,{\phi}_{2,n})$. ${\Sigma}_{1}$ is obtained from Σ by deleting all rows and columns referring to ${d}_{i}=0$, and similarly ${W}_{1}.$ ${\Psi}_{n}\left(\theta \right)$ can be consistently estimated by plugging in $\tilde{\theta}$. In addition, Meijer and Wansbeek (2007) [10] show that one can substitute ${d}_{i}$ for ${p}_{i}$ so that only information on the observed units is necessary. Note, however, that the summation runs over all observations (zeros and ones in ${d}_{i}$).
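Given estimates of ${\overline{h}}_{2}$ and the blocks of ${\Psi}_{n}$, the quadratic form of the LM statistic involves only small matrix operations. A sketch with purely illustrative numbers for the blocks (here ${\Psi}_{n,11}$ is shown as $2\times 2$ for brevity; in the application it is $({k}_{1}+{k}_{2}+1)$-dimensional):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve_system(A, B):
    """Solve A X = B for the matrix X by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [rowA[:] + rowB[:] for rowA, rowB in zip(A, B)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def lm_statistic(n, h2bar, P11, P12, P21, P22):
    """LM = n * h2' (P22 - P21 P11^{-1} P12)^{-1} h2."""
    S = matmul(P21, solve_system(P11, P12))               # P21 P11^{-1} P12
    C = [[P22[i][j] - S[i][j] for j in range(2)] for i in range(2)]
    v = solve_system(C, [[h] for h in h2bar])             # C^{-1} h2
    return n * sum(h2bar[i] * v[i][0] for i in range(2))

# Toy numbers, purely to illustrate the mechanics of the quadratic form
P11 = [[2.0, 0.3], [0.3, 1.5]]
P12 = [[0.1, 0.0], [0.0, 0.2]]
P21 = [[0.1, 0.0], [0.0, 0.2]]
P22 = [[4.0, 0.5], [0.5, 6.0]]
lm = lm_statistic(1000, [0.05, -0.02], P11, P12, P21, P22)
```

The statistic would then be compared with the $\chi^2(2)$ critical value.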

Under standard assumptions it follows that under ${H}_{0}$ we have $LM\left(\tilde{\theta}\right)\stackrel{d}{\to}{\chi}^{2}\left(2\right)$ (see Newey and West, 1987, pp. 781–782 [13] and Theorems 5.6 and 5.7 in Hall, 2005 [14]). In the absence of sample selection ($\tau =0$) it holds that ${\phi}_{3,i}={\phi}_{5,i}=0$, while ${\phi}_{2,i}={\sigma}^{2}$ and ${\phi}_{4,i}=3{\sigma}^{4}$, and the LM test statistic reverts to that of Jarque and Bera (1980) [11].
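For reference, the Jarque–Bera statistic to which the test reverts can be computed from the first four sample moments; the data below are illustrative draws:

```python
import random

def jarque_bera(xs):
    """JB = n/6 * (S^2 + (K-3)^2/4), with moment-based skewness S and kurtosis K."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    S = m3 / m2 ** 1.5
    K = m4 / m2 ** 2
    return n / 6.0 * (S * S + (K - 3.0) ** 2 / 4.0)

rng = random.Random(5)
jb_normal = jarque_bera([rng.gauss(0, 1) for _ in range(5000)])       # no rejection
jb_skewed = jarque_bera([rng.expovariate(1.0) for _ in range(5000)])  # strong rejection
```

Under normality JB is asymptotically $\chi^2(2)$, so values far above the critical value (5.99 at the 5% level) indicate non-normality.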

Monte Carlo simulations may shed light on the performance of the proposed LM test in finite samples. The design has been used previously by van der Klaauw and Koning (1993) [8] and Montes-Rojas (2011) [9], but includes a few modifications. The simulated model is specified as

$$\begin{array}{ccc}\hfill {y}_{1i}^{*}& =& -1{z}_{1i}+1{x}_{2i}-1+{u}_{1i}\hfill \\ \hfill {y}_{2i}^{*}& =& 0.5{x}_{1i}-0.5{x}_{2i}+1+{u}_{2i}\hfill \end{array}$$

where $\rho \in \{-0.8,-0.4,0.4,0.8\}$ and ${\sigma}^{2}\in \{0.25,1\}.$ The explanatory variables ${x}_{1i},{x}_{2i}$, and ${z}_{1i}$ are generated as $iidN(0,3)$, $N(0,3)$ and $U(-3,3)$, respectively. With respect to the disturbances ${u}_{1i}$ and ${u}_{2i}$, the following data generating processes are considered. Note, DGP1-DGP3 imply $Var\left[{u}_{1i}\right]=1$ and $Var\left[{u}_{2i}\right]=0.25.$ In contrast, van der Klaauw and Koning (1993) [8] and Montes-Rojas (2011) [9] consider the case with $Var\left[{u}_{2i}\right]=5$ and thus obtain less precise estimates of the slope parameters of the outcome equation.

- DGP1: $({u}_{1i},{u}_{2i})\sim iidN\left(0,\left[\begin{array}{cc}1& 0.5\rho \\ 0.5\rho & 0.25\end{array}\right]\right)$
- DGP2: ${\epsilon}_{1i}\sim t\left(10\right)$, ${\epsilon}_{2i}\sim t\left(10\right)$, ${\epsilon}_{1i}$ and ${\epsilon}_{2i}$ being independent; ${u}_{1i}={\epsilon}_{1i}{\left(\frac{10}{8}\right)}^{-1/2}$, ${u}_{2i}=\sigma {(1-{\rho}^{2})}^{1/2}{\left(\frac{10}{8}\right)}^{-1/2}{\epsilon}_{2i}+\rho \sigma {u}_{1i}$. The degrees of freedom are set to 10 to guarantee that the moments up to order 4 exist.
- DGP3: ${\epsilon}_{1i}\sim {\chi}^{2}\left(20\right)$, ${\epsilon}_{2i}\sim {\chi}^{2}\left(30\right)$, ${\epsilon}_{1i}$ and ${\epsilon}_{2i}$ being independent; ${u}_{1i}=\left({\epsilon}_{1i}-20\right)/\sqrt{40}$, ${u}_{2i}=\sigma {(1-{\rho}^{2})}^{1/2}\left({\epsilon}_{2i}-30\right)/\sqrt{60}+\rho \sigma {u}_{1i}$
- DGP4: ${\epsilon}_{1i}\sim N(0,1)$, ${\epsilon}_{2i}\sim {\chi}^{2}\left(30\right)$, ${\epsilon}_{1i}$ and ${\epsilon}_{2i}$ being independent; ${u}_{1i}={\epsilon}_{1i}$, ${u}_{2i}=\sigma {(1-{\rho}^{2})}^{1/2}\left({\epsilon}_{2i}-30\right)/\sqrt{60}+\rho \sigma {u}_{1i}$
- DGP5: ${\epsilon}_{1i}\sim {\chi}^{2}\left(20\right)$, ${\epsilon}_{2i}\sim N(0,1)$, ${\epsilon}_{1i}$ and ${\epsilon}_{2i}$ being independent; ${u}_{1i}=\left({\epsilon}_{1i}-20\right)/\sqrt{40}$, ${u}_{2i}=\sigma {(1-{\rho}^{2})}^{1/2}{\epsilon}_{2i}+\rho \sigma {u}_{1i}$
- DGP6: ${\epsilon}_{1i}\sim N(0,1)$, ${\epsilon}_{2i}\sim N(0,0.25)$, ${\epsilon}_{1i}$ and ${\epsilon}_{2i}$ being independent; ${c}_{i}=1+{e}^{\frac{{x}_{1i}}{\sqrt{3}}}\left({e}^{-\frac{1}{2}}-1\right){\left({e}^{1}-1\right)}^{-\frac{1}{2}}$, ${u}_{1i}={\epsilon}_{1i}$, ${u}_{2i}=\sigma {(1-{\rho}^{2})}^{1/2}\sqrt{{c}_{i}}{\epsilon}_{2i}+\rho \sigma {u}_{1i}$
- DGP7: ${\epsilon}_{1i}\sim N(0,1)$, ${\epsilon}_{2i}\sim N(0,1)$, ${\epsilon}_{1i}$ and ${\epsilon}_{2i}$ being independent; ${c}_{i}=1+{e}^{\frac{{x}_{2i}}{\sqrt{3}}}\left({e}^{-\frac{1}{2}}-1\right){\left({e}^{1}-1\right)}^{-\frac{1}{2}}$, ${u}_{1i}={\epsilon}_{1i}\sqrt{{c}_{i}}$, ${u}_{2i}=\sigma {(1-{\rho}^{2})}^{1/2}{\epsilon}_{2i}+\sigma \rho {u}_{1i}$
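As an illustration, DGP1 can be generated from independent standard normals via a Cholesky factor of the covariance matrix; a quick simulation check confirms $Var({u}_{2i})=0.25$ and $Cov({u}_{1i},{u}_{2i})=0.5\rho$:

```python
import math
import random

def draw_dgp1(n, rho, seed=0):
    """Bivariate normal (u1, u2) with Var(u1)=1, Var(u2)=0.25, Cov=0.5*rho,
    built from independent standard normals via the Cholesky factor."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u1 = e1
        u2 = 0.5 * rho * e1 + 0.5 * math.sqrt(1 - rho * rho) * e2
        out.append((u1, u2))
    return out

us = draw_dgp1(100000, rho=0.8)
var_u1 = sum(u1 * u1 for u1, _ in us) / len(us)
var_u2 = sum(u2 * u2 for _, u2 in us) / len(us)
cov = sum(u1 * u2 for u1, u2 in us) / len(us)
```

Since $Var(u_2) = 0.25\rho^2 + 0.25(1-\rho^2) = 0.25$ for every ρ, the implied correlation of the disturbances is exactly ρ.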

DGP1 serves as a reference to assess the size of the pseudo-score LM test. The second DGP deviates from the bivariate normal in terms of a higher kurtosis, while DGP3 exhibits both higher skewness and kurtosis than the normal. DGP4 allows for deviations from normality in the outcome equation, while keeping the normality assumption in the selection equation. DGP5 reverses this pattern: the disturbances of the outcome equation are normal and those of the selection equation are not. DGP6 and DGP7 introduce heteroskedasticity in either the outcome or the selection equation, respectively. In case of the latter two, the variances of ${u}_{1i}$ and ${u}_{2i}$ are normalized to an average of 1 and $0.25$, respectively. Note, the explanatory variables are held fixed in repeated samples.

Overall, for these DGPs four experiments are considered. In the baseline Experiment 1 (first row of the figures of graphs) $37\%$ of the data remain unobserved, and in the absence of sample selection the implied ${R}^{2}$ amounts to $1-\frac{0.25}{1.75}=0.86$, using $Var\left({u}_{2i}\right)=0.25$ and $Var\left({y}_{2i}^{*}\right)=1.75$. Experiment 2 (second row of the figures of graphs) analyzes the performance of the Heckman two-step estimator under a weaker exclusion restriction, assuming ${z}_{1i}\sim iidU(-1,1)$ so that $Var\left({z}_{1i}\right)=1/3$. Experiment 3 (third row of the figures of graphs) sets the constant of the outcome equation to zero so that $49\%$ instead of $37\%$ of the units are unobserved. Lastly, Experiment 4 (fourth row of the figures of graphs) considers a weaker fit in the outcome equation, setting $Var\left({u}_{2i}\right)=1$ so that in the absence of sample selection we have ${R}^{2}=0.43.$

Table 1 summarizes the average variance, skewness and kurtosis of the generated disturbances ${u}_{1i}$ and ${u}_{2i}$ under Experiment 1. In DGP2-DGP7, depending on ρ, the average kurtosis of ${u}_{2i}$ varies between $3.00$ and $5.68$, while the kurtosis of ${u}_{1i}$ lies between $3.07$ and $3.58$ in DGP5. In the other DGPs the kurtosis of ${u}_{1i}$ is held constant, taking values $2.99$ (DGPs 1, 4 and 6), $3.97$ (DGP2), $3.58$ (DGP3) and $5.71$ (DGP7), respectively. The skewness coefficient of the generated disturbances is zero for all DGPs except for DGP3, with corresponding values of 0.63 (${u}_{1i}$) and $-0.21$ to 0.43 (${u}_{2i}$), and DGP5, where the skewness of ${u}_{1i}$ varies between $0.14$ and $0.63$.

**Table 1.** Average variance, skewness and kurtosis of the generated disturbances under Experiment 1.

| DGP | ρ | ${u}_{1}$ Variance | ${u}_{1}$ Skewness | ${u}_{1}$ Kurtosis | ${u}_{2}$ Variance | ${u}_{2}$ Skewness | ${u}_{2}$ Kurtosis |
|---|---|---|---|---|---|---|---|
| 1 | all | 1.00 | 0.00 | 2.99 | 0.25 | 0.00 | 2.99 |
| 2 | −0.8 | 1.00 | 0.00 | 3.97 | 0.25 | 0.00 | 3.52 |
| 2 | −0.4 | 1.00 | 0.00 | 3.97 | 0.25 | 0.00 | 3.70 |
| 2 | 0.0 | 1.00 | 0.00 | 3.97 | 0.25 | 0.00 | 3.96 |
| 2 | 0.4 | 1.00 | 0.00 | 3.97 | 0.25 | 0.00 | 3.70 |
| 2 | 0.8 | 1.00 | 0.00 | 3.97 | 0.25 | 0.00 | 3.52 |
| 3 | −0.8 | 1.00 | 0.63 | 3.58 | 0.25 | −0.21 | 3.29 |
| 3 | −0.4 | 1.00 | 0.63 | 3.58 | 0.25 | 0.35 | 3.28 |
| 3 | 0.0 | 1.00 | 0.63 | 3.58 | 0.25 | 0.51 | 3.38 |
| 3 | 0.4 | 1.00 | 0.63 | 3.58 | 0.25 | 0.43 | 3.28 |
| 3 | 0.8 | 1.00 | 0.63 | 3.58 | 0.25 | 0.43 | 3.29 |
| 4 | −0.8 | 1.00 | 0.00 | 2.99 | 0.25 | 0.11 | 3.05 |
| 4 | −0.4 | 1.00 | 0.00 | 2.99 | 0.25 | 0.39 | 3.27 |
| 4 | 0.0 | 1.00 | 0.00 | 2.99 | 0.25 | 0.51 | 3.38 |
| 4 | 0.4 | 1.00 | 0.00 | 2.99 | 0.25 | 0.39 | 3.27 |
| 4 | 0.8 | 1.00 | 0.00 | 2.99 | 0.25 | 0.11 | 3.04 |
| 5 | −0.8 | 1.00 | 0.63 | 3.58 | 0.25 | 0.00 | 2.99 |
| 5 | −0.4 | 1.00 | 0.63 | 3.58 | 0.25 | 0.00 | 2.99 |
| 5 | 0.0 | 1.00 | 0.63 | 3.58 | 0.25 | 0.00 | 2.99 |
| 5 | 0.4 | 1.00 | 0.63 | 3.58 | 0.25 | 0.00 | 2.99 |
| 5 | 0.8 | 1.00 | 0.63 | 3.58 | 0.25 | 0.00 | 2.99 |
| 6 | −0.8 | 1.00 | 0.00 | 2.99 | 0.25 | 0.00 | 3.34 |
| 6 | −0.4 | 1.00 | 0.00 | 2.99 | 0.25 | 0.00 | 4.89 |
| 6 | 0.0 | 1.00 | 0.00 | 2.99 | 0.25 | 0.00 | 5.68 |
| 6 | 0.4 | 1.00 | 0.00 | 2.99 | 0.25 | 0.00 | 4.89 |
| 6 | 0.8 | 1.00 | 0.00 | 2.99 | 0.25 | 0.00 | 3.35 |
| 7 | −0.8 | 0.99 | 0.00 | 5.71 | 0.25 | 0.00 | 4.11 |
| 7 | −0.4 | 0.99 | 0.00 | 5.71 | 0.25 | 0.00 | 3.06 |
| 7 | 0.0 | 0.99 | 0.00 | 5.71 | 0.25 | 0.00 | 2.99 |
| 7 | 0.4 | 0.99 | 0.00 | 5.71 | 0.25 | 0.00 | 3.06 |
| 7 | 0.8 | 0.99 | 0.00 | 5.71 | 0.25 | 0.00 | 4.11 |

Following Davidson and MacKinnon (1998) [16], size and power are analyzed in terms of size-discrepancy and size-power curves. The former is based on the empirical cumulative distribution function of the p-values, ${p}_{r}$, defined as $F\left(q\right)=\frac{1}{R}{\sum}_{r=1}^{R}I({p}_{r}\le q)$, where R is the number of Monte Carlo replications. The size-discrepancy curves are defined as plots of $F\left(q\right)-q$ against q under the assumption that ${H}_{0}$ holds and DGP1 is the correct one. In addition, one can use a Kolmogorov–Smirnov test to see whether $F\left(q\right)-q$ differs significantly from 0 (see Davidson and MacKinnon 1998, p. 11) [16]. The size-power curves plot power against size, i.e., ${F}_{{H}_{1}}\left(q\right)$ against ${F}_{{H}_{0}}\left(q\right).$ In both plots $q\in \left[0,0.15\right]$ and the step size is $0.001.$ An important feature of this procedure is that it avoids size adjustments of the power curves if the tests reject too often under ${H}_{0}$.
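The size-discrepancy curve $F(q)-q$ is simple to compute from simulated p-values. The sketch below substitutes uniform draws for actual Monte Carlo p-values, since under a correctly sized test the p-values are uniform on $[0,1]$:

```python
import random

def size_discrepancy(pvals, grid):
    """F(q) - q, where F is the empirical CDF of the Monte Carlo p-values."""
    R = len(pvals)
    return [sum(p <= q for p in pvals) / R - q for q in grid]

# Uniform p-values stand in for those of a correctly sized test.
rng = random.Random(11)
pvals = [rng.random() for _ in range(10000)]
grid = [i / 1000 for i in range(151)]       # q in [0, 0.15] with step 0.001
disc = size_discrepancy(pvals, grid)
max_abs = max(abs(d) for d in disc)
```

For a correctly sized test the curve fluctuates around zero within the Kolmogorov–Smirnov band; systematic positive discrepancies indicate over-rejection.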

Figure 1 exhibits the size-discrepancy plots for Experiments 1–4 and sample sizes $n=500$, $1000,2000$. The plots show that the pseudo-score LM test is properly sized for $\rho =-0.4$ and $\rho =0.4$ in all experiments, while it slightly over-rejects at $\rho =-0.8$ and $\rho =0.8$, especially at a small sample size ($n=500$). For example, at a nominal test size of $0.05$ and a sample size of 1000 the size of the LM test is too high by $0.012$ at $\left|\rho \right|=0.8$. For $\rho =-0.4$ and $\rho =0.4$ the size discrepancy lies within the Kolmogorov–Smirnov $5\%$ confidence bound of ±$0.0096$ for p-values smaller than $0.1$. A similar result has also been mentioned in Montes-Rojas (2011) [9] in case of robust LM and $C\left(\alpha \right)$ tests. A weaker exclusion restriction, setting $Var\left({z}_{1i}\right)=1/3$ in Experiment 2, increases the size discrepancy at high absolute values of ρ (Experiment 2, row 2 of Figure 1), but hardly affects the size of the test at $\left|\rho \right|=0.4$; the size discrepancy remains within the confidence bounds at medium values of ρ. Increasing the share of unobserved values to 0.49 (Experiment 3, row 3 of Figure 1) hardly affects the size discrepancy. Lastly, Experiment 4 (last row of Figure 1) shows that a weaker fit ($Var\left({u}_{2i}\right)=1$) does not result in a larger size distortion as compared to the baseline in the first row of Figure 1. As one would expect, a larger number of observations generally enhances the performance of the LM test (see the last column in Figure 1). However, the large-sample approximation improves relatively slowly with sample size under a weak exclusion restriction at high absolute values of ρ (confer the second row of graphs in Figure 1).

Figure 2, Figure 3 and Figure 4 present the size-power plots of the pseudo-score LM test for DGPs 2–3, 4–5 and 6–7, respectively. In general, and in line with the literature, for all DGPs referring to the alternative hypothesis we observe lower power of the pseudo-score LM test at high absolute values of ρ, especially at $\rho =-0.8$. If the distribution of the disturbances of the outcome equation exhibits both skewness and excess kurtosis (DGP3), the simulated power of the pseudo-score LM test is higher than that for a symmetric distribution with fatter tails than the normal (DGP2), except at $\rho =-0.8$. Furthermore, for DGP3 the power is generally lower at $\rho =-0.8$ as compared to large positive values ($\rho =0.8$), which reflects differences in the skewness of the distribution of ${u}_{2i}$ with respect to ρ (confer Table 1).

Figure 3 illustrates the power of the pseudo-score LM test under non-normality in either the outcome (DGP4) or the selection equation (DGP5), but not in both. Under DGP4 the pseudo-score LM test exhibits high power at intermediate absolute values of ρ, while at high absolute values of ρ the power tends to be lower, as the weight of ${u}_{1i}$ (which is assumed to be normal) is higher in the disturbances of the outcome equation. In case of DGP5 we see the reverse pattern: deviations from normality are only detected at high absolute values of ρ. In fact, under DGP5 the test has no power at all at $\rho =0$, since in this case the truncation of ${u}_{1i}$ has no effect and the disturbances of the outcome equation are normal. This result can be found in all four considered experiments.

Figure 4 presents the size-power plots of DGP6 and DGP7 and refers to heteroskedasticity. DGP6 allows for heteroskedasticity in the outcome equation and DGP7 in the selection equation. The size-power curves indicate that the pseudo-score LM test is also able to detect this type of deviation from the model assumptions, as heteroskedasticity translates into pronounced excess kurtosis of the disturbances of the outcome equation. For DGP6 this is the case at medium to low values of $\left|\rho \right|$. DGP7 introduces heteroskedasticity in the Probit selection model. In this case, the LM test exhibits power at high absolute values of ρ, but has virtually no power at $\rho =-0.4$ and $0.4$. The reason is that the nominal kurtosis of ${u}_{2i}$ is hardly affected (amounting to 3.06, see Table 1), and the biases of the Mills’ ratio and of the estimated coefficients of the outcome equation, especially that of the Mills’ ratio, turn out to be small in comparison.

Comparing the first and second rows of graphs in Figure 2, Figure 3 and Figure 4 indicates that not much power is lost with the weaker exclusion restriction. A higher share of unobserved units tends to slightly reduce the power of the LM test, as one would expect (see the graphs in row 3 vs. row 1 in Figure 2, Figure 3 and Figure 4). Comparing the first and the last rows in Figure 2, Figure 3 and Figure 4 indicates that a weaker fit (i.e., $Var\left({u}_{2i}\right)$ is increased from 0.25 to 1) does not result in a significant loss of power. Lastly, as expected, a larger sample size improves the power of the pseudo-score LM test across the board.4

Using Meijer and Wansbeek’s (2007) [10] GMM approach for two-step estimators of the Heckman sample selection model, this paper introduces a pseudo-score LM test to check the assumption of normality and homoskedasticity of the disturbances, a prerequisite for the consistency of this estimator. The GMM-based pseudo-score LM test is easy to calculate and similar to the widely used Jarque and Bera (1980) [11] LM test. Indeed, in the absence of sample selection it reverts to their LM test statistic. In particular, the test checks whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. Under ${H}_{0}$, normal disturbances of the selection equation and the absence of heteroskedasticity in both the outcome and the selection equation are additionally required.

Monte Carlo simulations show good performance of the pseudo-score LM test for samples of size 1000 or larger and a strong exclusion restriction. However, in line with other tests of the normality assumption of the Heckman sample selection model proposed in the literature, the pseudo-score LM test tends to be oversized, although only slightly, if the correlation of the disturbances of the selection and the outcome equation is high in absolute value or if the exclusion restrictions are weak. Hence, this test can be recommended for sample sizes of 1000 or larger.

I am very grateful to Tom Wansbeek and two anonymous referees for detailed and constructive comments on an earlier draft. A Stata ado-file for this test is available at: http://homepage.uibk.ac.at/c43236/publications.html.

Let $Z\sim N(0,1)$ and consider ${\mu}_{k}\left({a}_{i}\right)=E\left[{Z}^{k}\right|Z>{a}_{i}],\phantom{\rule{4.pt}{0ex}}k=1,2,\dots$ The derivation of the moments ${\mu}_{k}\left({a}_{i}\right)$ uses the following recursive formula (Meijer and Wansbeek, 2007, p. 45) [10]:

$$\begin{array}{ccc}\hfill {\mu}_{0}\left({a}_{i}\right)& =& 1\hfill \\ \hfill {\mu}_{1}\left({a}_{i}\right)& =& {\lambda}_{i}\hfill \\ \hfill {\mu}_{k}\left({a}_{i}\right)& =& (k-1){\mu}_{k-2}\left({a}_{i}\right)+{a}_{i}^{k-1}{\lambda}_{i},\phantom{\rule{4.pt}{0ex}}k\ge 2\hfill \end{array}$$
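As an illustrative check, the recursion can be implemented in a few lines. This is a minimal sketch, not part of the paper's Stata implementation; the function names are illustrative, and the inverse Mills ratio ${\lambda}_{i}=\varphi({a}_{i})/(1-\Phi({a}_{i}))$ is computed directly from the standard normal density and CDF.

```python
import math

def inverse_mills(a):
    """lambda(a) = phi(a) / (1 - Phi(a)) for a standard normal Z."""
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return phi / (1.0 - Phi)

def truncated_moments(a, K):
    """mu_k(a) = E[Z^k | Z > a] for k = 0..K, via the recursion
    mu_0 = 1, mu_1 = lambda(a), mu_k = (k-1) mu_{k-2} + a^{k-1} lambda(a)."""
    lam = inverse_mills(a)
    mu = [1.0, lam]
    for k in range(2, K + 1):
        mu.append((k - 1) * mu[k - 2] + a ** (k - 1) * lam)
    return mu[: K + 1]
```

At $a=0$ the recursion reproduces the known values $E\left[Z|Z>0\right]=\sqrt{2/\pi}$, $E\left[{Z}^{2}|Z>0\right]=1$ and $E\left[{Z}^{4}|Z>0\right]=3$.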

Setting ${a}_{i}=-{z}_{i}^{\prime}\gamma $ and abbreviating ${\mu}_{k}\left({a}_{i}\right)={\mu}_{k,i},$ one obtains

$${\psi}_{k,i}\equiv E\left[{v}_{i}^{k}\right|{d}_{i}=1]=\sum _{j=0}^{k}{(-1)}^{j}\left(\genfrac{}{}{0pt}{}{k}{j}\right){\mu}_{j,i}{\lambda}_{i}^{k-j}$$

and based on these results one can calculate the moments of ${\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{k}$ as

$$\begin{array}{ccc}\hfill {\phi}_{k,i}& \equiv & E\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{k}|d=1\right]=E\left[\sum _{j=0}^{k}\left(\genfrac{}{}{0pt}{}{k}{j}\right){\epsilon}_{i}^{k-j}{\left(\tau {v}_{i}\right)}^{j}|d=1\right]\hfill \\ & =& \sum _{j=0}^{k}\left(\genfrac{}{}{0pt}{}{k}{j}\right){\mu}_{\epsilon ,k-j}{\tau}^{j}{\psi}_{j,i}\hfill \end{array}$$

where ${\mu}_{\epsilon ,k-j}\equiv E\left[{\epsilon}_{i}^{k-j}\right].$
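The two binomial sums for ${\psi}_{k,i}$ and ${\phi}_{k,i}$ are straightforward to evaluate numerically. A minimal sketch, with illustrative names: `mu` is the list of truncated-normal moments ${\mu}_{j,i}$, `lam` the inverse Mills ratio ${\lambda}_{i}$ (consistency requires `mu[1] == lam`), and `mu_eps` the list of raw moments $E\left[{\epsilon}_{i}^{j}\right]$ (so ${\mu}_{\epsilon ,0}=1$ and ${\mu}_{\epsilon ,1}=0$).

```python
from math import comb

def psi_moments(mu, lam, K):
    # psi_k = E[v_i^k | d_i = 1] = sum_j (-1)^j C(k,j) mu_j lam^{k-j}
    return [sum((-1) ** j * comb(k, j) * mu[j] * lam ** (k - j)
                for j in range(k + 1)) for k in range(K + 1)]

def phi_moments(psi, mu_eps, tau, K):
    # phi_k = E[(tau v_i + eps_i)^k | d_i = 1]
    #       = sum_j C(k,j) mu_eps[k-j] tau^j psi_j
    return [sum(comb(k, j) * mu_eps[k - j] * tau ** j * psi[j]
                for j in range(k + 1)) for k in range(K + 1)]
```

Two sanity checks follow directly from the formulas: ${\psi}_{1,i}={\mu}_{0,i}{\lambda}_{i}-{\mu}_{1,i}=0$ whenever `mu[1] == lam`, and at $\tau =0$ the ${\phi}_{k,i}$ reduce to the raw moments of ${\epsilon}_{i}$.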

Denoting the GMM-estimates under ${H}_{0}$ by $\tilde{\theta},$ the pseudo-score LM test can be written as (see Hayashi, 2000, pp. 491–493 [17], Newey and West, 1987, p. 780 [13] and Hall, 2005, p. 162 [14]):5

$$LM=n{\overline{h}}^{\prime}\left(\tilde{\theta}\right){\Psi}_{n}{\left(\tilde{\theta}\right)}^{-1}\overline{G}\left(\tilde{\theta}\right){\left({\overline{G}}^{\prime}\left(\tilde{\theta}\right){\Psi}_{n}{\left(\tilde{\theta}\right)}^{-1}\overline{G}\left(\tilde{\theta}\right)\right)}^{-1}{\overline{G}}^{\prime}\left(\tilde{\theta}\right){\Psi}_{n}{\left(\tilde{\theta}\right)}^{-1}\overline{h}\left(\tilde{\theta}\right)$$

where ${\Psi}_{n}\left(\tilde{\theta}\right)=E\left[\overline{\Psi}\left(\tilde{\theta}\right)\right]$ is a consistent estimator of ${\Psi}_{0}$ under ${H}_{0}$. Using the fact that $\overline{G}\left(\tilde{\theta}\right)$ is invertible yields the LM test statistic as

$$LM=n{\overline{h}}^{\prime}\left(\tilde{\theta}\right){\Psi}_{n}{\left(\tilde{\theta}\right)}^{-1}\overline{h}\left(\tilde{\theta}\right)$$

which can be further simplified using the partitioned inverse, since ${\overline{h}}_{1}\left(\tilde{\theta}\right)=0:$

$$LM=n{\overline{h}}_{2}{\left(\tilde{\theta}\right)}^{\prime}{\left({\Psi}_{n,22}\left(\tilde{\theta}\right)-{\Psi}_{n,21}\left(\tilde{\theta}\right){\Psi}_{n,11}^{-1}\left(\tilde{\theta}\right){\Psi}_{n,12}\left(\tilde{\theta}\right)\right)}^{-1}{\overline{h}}_{2}\left(\tilde{\theta}\right)$$
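The equivalence of the full quadratic form $n{\overline{h}}^{\prime}{\Psi}_{n}^{-1}\overline{h}$ and the partitioned-inverse expression (given ${\overline{h}}_{1}\left(\tilde{\theta}\right)=0$) is easy to verify numerically. The following numpy sketch uses arbitrary toy inputs rather than the model's actual moment conditions; all names are illustrative.

```python
import numpy as np

def lm_full(h, Psi, n):
    # LM = n * h' Psi^{-1} h, with the stacked moment vector h = (h1', h2')'
    return n * h @ np.linalg.solve(Psi, h)

def lm_partitioned(h2, Psi, k1, n):
    # equivalent form when h1 = 0: only the Schur complement of Psi_11 enters
    S = Psi[k1:, k1:] - Psi[k1:, :k1] @ np.linalg.solve(Psi[:k1, :k1],
                                                        Psi[k1:, :k1].T)
    return n * h2 @ np.linalg.solve(S, h2)

# toy inputs: Psi symmetric positive definite, h1 = 0 as under H0
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
Psi = A @ A.T + 5.0 * np.eye(5)
h2 = rng.normal(size=2)
h = np.concatenate([np.zeros(3), h2])
```

Both routes give the same statistic, which is why the test only requires inverting the 2 × 2 Schur complement.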

Under fairly general conditions (see Amemiya, 1985, Section 3.4) [18], $\lim_{n\to \infty}E\left[\overline{\Psi}\left(\theta \right)\right]=\mathrm{plim}_{n\to \infty}\overline{\Psi}\left(\theta \right)$ and in the formulas for the asymptotic covariance matrix, one can replace $\overline{\Psi}\left(\theta \right)$ by its expectation. Note that ${\Psi}_{n}\left({\theta}_{0}\right)=E\left[\overline{\Psi}\left({\theta}_{0}\right)\right]$ can be estimated consistently in the usual way by ${\Psi}_{n}\left(\tilde{\theta}\right)$. To obtain the estimate ${\Psi}_{n}\left(\tilde{\theta}\right)$, we partition ${\Psi}_{n}\left(\theta \right)$ in accordance with $\overline{h}\left(\theta \right)={({\overline{h}}_{1}{\left(\theta \right)}^{\prime},{\overline{h}}_{2}^{\prime}\left(\theta \right))}^{\prime}$ as

$${\Psi}_{n}\left(\theta \right)=\left[\begin{array}{cc}{\Psi}_{n,11}\left(\theta \right)& {\Psi}_{n,12}\left(\theta \right)\\ {\Psi}_{n,12}{\left(\theta \right)}^{\prime}& {\Psi}_{n,22}\left(\theta \right)\end{array}\right]$$

Using

$$\begin{array}{c}{h}_{1,i}\left(\theta \right){h}_{1,i}{\left(\theta \right)}^{\prime}=\hfill \\ \left[\begin{array}{c}\left({d}_{i}-{p}_{i}\right)\frac{{\varphi}_{i}{z}_{i}}{{p}_{i}(1-{p}_{i})}\\ {d}_{i}{w}_{i}(\tau {v}_{i}+{\epsilon}_{i})\\ {d}_{i}\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{2}-{\phi}_{2,i}\right]\end{array}\right]\left[\begin{array}{ccc}\frac{\left({d}_{i}-{p}_{i}\right){\varphi}_{i}{z}_{i}}{{p}_{i}(1-{p}_{i})}& {d}_{i}{w}_{i}^{\prime}(\tau {v}_{i}+{\epsilon}_{i})& {d}_{i}\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{2}-{\phi}_{2,i}\right]\end{array}\right]=\\ \hfill \left[\begin{array}{ccc}{\left({d}_{i}-{p}_{i}\right)}^{2}{\left(\frac{{\varphi}_{i}}{{p}_{i}(1-{p}_{i})}\right)}^{2}{z}_{i}{z}_{i}^{\prime}& \frac{{d}_{i}\left({d}_{i}-{p}_{i}\right){\varphi}_{i}}{{p}_{i}(1-{p}_{i})}{z}_{i}{w}_{i}^{\prime}(\tau {v}_{i}+{\epsilon}_{i})& \frac{\left({d}_{i}-{p}_{i}\right){\varphi}_{i}}{{p}_{i}(1-{p}_{i})}{d}_{i}\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{2}-{\phi}_{2,i}\right]{z}_{i}\\ *& {d}_{i}{w}_{i}{w}_{i}^{\prime}{(\tau {v}_{i}+{\epsilon}_{i})}^{2}& {d}_{i}\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{3}-{\phi}_{2,i}(\tau {v}_{i}+{\epsilon}_{i})\right]{w}_{i}\\ *& *& {d}_{i}{\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{2}-{\phi}_{2,i}\right]}^{2}\end{array}\right]\end{array}$$

one obtains for the off-diagonal elements:

$$\begin{array}{ccc}\hfill \frac{1}{n}\sum _{i=1}^{n}E\left[\left({d}_{i}-{p}_{i}\right){\textstyle \frac{{\varphi}_{i}}{{p}_{i}(1-{p}_{i})}}{d}_{i}(\tau {v}_{i}+{\epsilon}_{i}){z}_{i}{w}_{i}^{\prime}\right]& =& 0\hfill \\ \hfill \frac{1}{n}\sum _{i=1}^{n}E\left[{\textstyle \frac{\left({d}_{i}-{p}_{i}\right){\varphi}_{i}}{{p}_{i}(1-{p}_{i})}}{d}_{i}\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{2}-{\phi}_{2,i}\right]{z}_{i}\right]& =& 0\hfill \\ \hfill \frac{1}{n}\sum _{i=1}^{n}E\left[{d}_{i}\left[{\left(\tau {v}_{i}+{\epsilon}_{i}\right)}^{3}-{\phi}_{2,i}(\tau {v}_{i}+{\epsilon}_{i})\right]{w}_{i}\right]& =& \frac{1}{n}\sum _{i=1}^{n}{p}_{i}{\phi}_{3,i}{w}_{i}\hfill \end{array}$$

Some of the explanatory variables summarized in ${w}_{i}$ may not be observed at ${d}_{i}=0.$ However, one can use the reasoning in Meijer and Wansbeek (2007) [10] and establish

$$\begin{array}{ccc}\hfill pli{m}_{n\to \infty}{\textstyle \frac{1}{n}}\left({W}_{1}^{\prime}{W}_{1}\right)-\underset{n\to \infty}{lim}{\textstyle \frac{1}{n}}{W}^{\prime}\Pi W& =& pli{m}_{n\to \infty}{\textstyle \frac{1}{n}}\sum _{i=1}^{n}{d}_{i}{w}_{i}{w}_{i}^{\prime}-\underset{n\to \infty}{lim}{\textstyle \frac{1}{n}}\sum _{i=1}^{n}{p}_{i}{w}_{i}{w}_{i}^{\prime}=0\hfill \\ \hfill pli{m}_{n\to \infty}{\textstyle \frac{1}{n}}\sum _{i=1}^{n}\left({d}_{i}-{p}_{i}\right){\phi}_{k,i}{w}_{i}& =& 0,\phantom{\rule{2.em}{0ex}}k=1,...,8\hfill \\ \hfill pli{m}_{n\to \infty}{\textstyle \frac{1}{n}}\sum _{i=1}^{n}\left({d}_{i}-{p}_{i}\right){\phi}_{k,i}{\phi}_{l,i}& =& 0,\phantom{\rule{2.em}{0ex}}k,l=1,...,4.\phantom{\rule{4.pt}{0ex}}\hfill \end{array}$$

Here, $\Pi =diag({p}_{1},..,{p}_{n})$ and ${W}_{1}$ is derived from $W$ by skipping all rows with ${d}_{i}=0.$ Hence, one can use

$${\Psi}_{n,11}\left(\theta \right)=\frac{1}{n}\left[\begin{array}{ccc}{Z}^{\prime}VZ& 0& 0\\ *& {W}_{1}^{\prime}{\Sigma}_{1}{W}_{1}& \sum _{{d}_{i}=1}{w}_{i}{\phi}_{3,i}\\ *& *& \sum _{{d}_{i}=1}\left({\phi}_{4,i}-{\phi}_{2,i}^{2}\right)\end{array}\right]$$

where $V=diag\left(\frac{{\varphi}_{1}^{2}}{{p}_{1}(1-{p}_{1})},..,\frac{{\varphi}_{n}^{2}}{{p}_{n}(1-{p}_{n})}\right),{W}_{n\times {k}_{2}}={({w}_{1},...,{w}_{n})}^{\prime},$ and $\Sigma =diag({\phi}_{2,1},...,{\phi}_{2,n}).$ ${\Sigma}_{1}$ is obtained from Σ by deleting all rows and columns referring to ${d}_{i}=0$. Similar arguments yield at $\xi =\kappa =0$

$${\Psi}_{n,22}\left(\theta \right)=\frac{1}{n}\sum _{i=1}^{n}\left[\begin{array}{cc}{p}_{i}\left({\phi}_{6,i}-{\phi}_{3,i}^{2}\right)& {p}_{i}\left({\phi}_{7,i}-{\phi}_{3,i}{\phi}_{4,i}\right)\\ {p}_{i}\left({\phi}_{7,i}-{\phi}_{3,i}{\phi}_{4,i}\right)& {p}_{i}\left({\phi}_{8,i}-{\phi}_{4,i}^{2}\right)\end{array}\right]$$

and

$${\Psi}_{n,12}\left(\theta \right)=\frac{1}{n}\sum _{i=1}^{n}\left[\begin{array}{cc}0& 0\\ {p}_{i}{w}_{i}{\phi}_{4,i}& {p}_{i}{w}_{i}{\phi}_{5,i}\\ {p}_{i}\left({\phi}_{5,i}-{\phi}_{2,i}{\phi}_{3,i}\right)& {p}_{i}\left({\phi}_{6,i}-{\phi}_{4,i}{\phi}_{2,i}\right)\end{array}\right]$$
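To illustrate how the per-observation moments enter these blocks, ${\Psi}_{n,22}$ can be assembled directly from ${p}_{i}$ and ${\phi}_{k,i}$. A minimal numpy sketch with illustrative names, assuming the moments are stacked in an $n\times 9$ array with column $k$ holding ${\phi}_{k,i}$:

```python
import numpy as np

def psi_n22(p, phi):
    # phi[i, k] stores phi_{k,i} for k = 0..8; p[i] stores p_i
    # (in practice d_i may be inserted for p_i, as noted in the text)
    a = p * (phi[:, 6] - phi[:, 3] ** 2)
    b = p * (phi[:, 7] - phi[:, 3] * phi[:, 4])
    c = p * (phi[:, 8] - phi[:, 4] ** 2)
    return np.array([[a.mean(), b.mean()],
                     [b.mean(), c.mean()]])
```

The other blocks follow the same pattern; the 2 × 2 output is symmetric by construction.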

Again, we can insert ${d}_{i}$ for ${p}_{i}$. Applying the formula for the partitioned inverse yields the simplification of the pseudo-score LM test statistic:

$$\begin{array}{ccc}\hfill LM& =& n{\overline{h}}^{\prime}\left(\tilde{\theta}\right){\Psi}_{n}{\left(\tilde{\theta}\right)}^{-1}\overline{h}\left(\tilde{\theta}\right)\hfill \\ & =& n\left[0,{\overline{h}}_{2}{\left(\tilde{\theta}\right)}^{\prime}\right]{\left[\begin{array}{cc}{\Psi}_{n,11}\left(\tilde{\theta}\right)& {\Psi}_{n,12}\left(\tilde{\theta}\right)\\ {\Psi}_{n,21}\left(\tilde{\theta}\right)& {\Psi}_{n,22}\left(\tilde{\theta}\right)\end{array}\right]}^{-1}\left[\begin{array}{c}0\\ {\overline{h}}_{2}\left(\tilde{\theta}\right)\end{array}\right]\hfill \\ & =& n{\overline{h}}_{2}{\left(\tilde{\theta}\right)}^{\prime}{\left({\Psi}_{n,22}\left(\tilde{\theta}\right)-{\Psi}_{n,21}\left(\tilde{\theta}\right){\Psi}_{n,11}{\left(\tilde{\theta}\right)}^{-1}{\Psi}_{n,12}\left(\tilde{\theta}\right)\right)}^{-1}{\overline{h}}_{2}\left(\tilde{\theta}\right)\hfill \end{array}$$

which is asymptotically distributed as ${\chi}^{2}\left(2\right)$ under ${H}_{0}.$

The author declares no conflict of interest.

- S.T. Yen, and J. Rosinski. “On the marginal effects of variables in the log-transformed sample selection models.” Econ. Lett. 100 (2008): 4–8. [Google Scholar] [CrossRef]
- K.E. Staub. “A causal interpretation of extensive and intensive margin effects in generalized Tobit models.” Rev. Econ. Stat. 96 (2014): 371–375. [Google Scholar] [CrossRef]
- W.K. Newey. “Two-step series estimation of sample selection models.” Econom. J. 12 (2009): 217–229. [Google Scholar] [CrossRef]
- C.L. Skeels, and F. Vella. “A Monte Carlo investigation of the sampling behavior of conditional moment tests in Tobit and Probit models.” J. Econom. 92 (1999): 275–294. [Google Scholar] [CrossRef]
- D.M. Drukker. “Bootstrapping a conditional moments test for normality after Tobit estimation.” Stata J. 2 (2002): 125–139. [Google Scholar]
- A.K. Bera, C.M. Jarque, and L.-F. Lee. “Testing the normality assumption in limited dependent variable models.” Int. Econ. Rev. 25 (1984): 563–578. [Google Scholar] [CrossRef]
- L.-F. Lee. “Tests for the bivariate normal distribution in econometric models with selectivity.” Econometrica 52 (1984): 843–863. [Google Scholar] [CrossRef]
- B. Van der Klaauw, and R.H. Koning. “Testing the normality assumption in the sample selection model with an application to travel demand.” J. Bus. Econ. Stat. 21 (1993): 31–42. [Google Scholar] [CrossRef]
- G.V. Montes-Rojas. “Robust misspecification tests for the Heckman’s two-step estimator.” Econom. Rev. 30 (2011): 154–172. [Google Scholar] [CrossRef]
- E. Meijer, and T. Wansbeek. “The sample selection model from a method of moments perspective.” Econom. Rev. 26 (2007): 25–51. [Google Scholar] [CrossRef]
- C. Jarque, and A. Bera. “Efficient tests for normality, homoskedasticity and serial independence of regression residuals.” Econ. Lett. 6 (1980): 255–259. [Google Scholar] [CrossRef]
- J.J. Heckman. “Sample selection bias as a specification error.” Econometrica 47 (1979): 153–161. [Google Scholar] [CrossRef]
- W.K. Newey, and K.D. West. “Hypothesis testing with efficient method of moments estimation.” Int. Econ. Rev. 28 (1987): 777–787. [Google Scholar] [CrossRef]
- A.R. Hall. Generalized Methods of Moments. Oxford, UK: Oxford University Press, 2005. [Google Scholar]
- S.C. Ahn, and P. Schmidt. “A separability result for GMM estimation, with applications to GLS prediction and conditional Moment Tests.” Econom. Rev. 14 (1995): 19–34. [Google Scholar] [CrossRef]
- R. Davidson, and J.G. MacKinnon. “Graphical methods for investigating the size and power of hypothesis tests.” Manch. Sch. 66 (1998): 1–26. [Google Scholar] [CrossRef]
- F. Hayashi. Econometrics. Princeton, NJ, USA; Oxford, UK: Princeton University Press, 2000. [Google Scholar]
- T. Amemiya. Advanced Econometrics. Cambridge, MA, USA: Harvard University Press, 1985. [Google Scholar]

^{1.} An example is the estimation of gravity models of bilateral trade flows with missing and/or zero trade. Here, the assumption of bivariate normality turns out to be important for deriving comparative static results with respect to changes in the external and internal margin of trade, following Yen and Rosinski (2008) [1] and Staub (2014) [2].
^{3.} Specifically, Montes-Rojas (2011) [9] mentions the case where ${u}_{1i}\sim N(0,1)$, ${u}_{2i}=\tau {u}_{1i}+{\epsilon}_{i}$, with ${u}_{1i}$ and ${\epsilon}_{i}$ independent, but ${\epsilon}_{i}$ not following a normal distribution. In ${\phi}_{3,i}$ and ${\phi}_{4,i}$ the moments $E\left[{\epsilon}_{i}^{k}\right]$ are left unrestricted and estimated from the residuals of the second-stage outcome equation.
^{4.} The corresponding figures for a larger sample size of n = 2000 are available upon request from the author.
^{5.} Newey and West (1987) [13] propose to use the unrestricted estimator $\overline{\Psi}\left(\hat{\theta}\right)$, a route that is not followed here.

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).