A GMM-Based Test for Normal Disturbances of the Heckman Sample Selection Model

Pfaffermayr, Michael

doi:10.3390/econometrics2040151


by Michael Pfaffermayr ¹,²

¹ Department of Economics, University of Innsbruck, Universitaetsstrasse 15, Innsbruck 6020, Austria

² Austrian Institute of Economic Research, P.O.-Box 91, Vienna A-1103, Austria

Econometrics 2014, 2(4), 151-168; https://doi.org/10.3390/econometrics2040151

Submission received: 31 July 2014 / Revised: 10 October 2014 / Accepted: 14 October 2014 / Published: 23 October 2014


Abstract:

The Heckman sample selection model relies on the assumption of normal and homoskedastic disturbances. However, before considering more general, alternative semiparametric models that do not need the normality assumption, it seems useful to test this assumption. Following Meijer and Wansbeek (2007), the present contribution derives a GMM-based pseudo-score LM test on whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. The test is easy to calculate and in Monte Carlo simulations it shows good performance for sample sizes of 1000 or larger.

Keywords:

sample selection model; GMM; normality; pseudo-score LM test

JEL classifications:

C23; C21

1. Introduction

The assumption of bivariate normal and homoskedastic disturbances is a prerequisite for the consistency of the maximum likelihood estimator of the Heckman sample selection model. Moreover, some studies focus on the prediction of counterfactuals based on the Heckman sample selection model, taking into account changes in both participation and outcome, which is often only feasible under the assumption of bivariate normality.¹ Lastly, under the assumption of bivariate normality one can estimate the Heckman sample selection model by maximum likelihood methods that are less sensitive to weak exclusion restrictions.

Before employing alternative semiparametric estimators that do not need the normality assumption (see e.g., Newey, 2009 [3]), it seems useful to test the underlying normality assumption of sample selection models. So far, the literature offers several approaches to test this hypothesis.² Bera et al. (1984) [6] develop an LM test for normality of the disturbances in the general Pearson framework, which implies testing the moments up to order four. Lee (1984) [7] proposes Lagrange multiplier tests within the bivariate Edgeworth series of distributions. Van der Klaauw and Koning (1993) [8] derive LR tests in a similar setting, while Montes-Rojas (2011) [9] proposes LM and $C(α)$ tests that are likewise based on bivariate Edgeworth series expansions, but robust to local misspecification in nuisance distributional parameters. In general, these approaches tend to lead to complicated test statistics that are sometimes difficult to implement in standard econometric software. More importantly, some of these tests for bivariate normality seem to exhibit unsatisfactory performance in Monte Carlo simulations and reject too often in small to medium sample sizes, especially if the parameter of the Mills’ ratio is high in absolute value (see e.g., Montes-Rojas, 2011 [9], Table 1). This motivates Montes-Rojas (2011) [9] to focus on the two-step estimator, which requires less restrictive assumptions, namely a normal marginal distribution of the disturbances of the selection equation and a linear conditional expectation of the disturbances of the outcome equation. He proposes to test for marginal normality and linearity of the conditional expectation of the outcome model separately and shows that the corresponding locally size-robust test statistics based on the two-step estimator perform well in terms of size and power.

In a possibly neglected, but very valuable paper, Meijer and Wansbeek (2007) [10] embed the two-step estimator of the Heckman sample selection model in a GMM framework. In addition, they argue that within this framework it is easily possible to add moment conditions for designing Wald tests in order to check the assumption of bivariate normality and homoskedasticity of the disturbances. Their approach does not attempt to develop a most powerful test; rather, they intend to design a relatively simple test for normality that can be used as an alternative to the existing tests. The test can be interpreted as a conditional moment test and checks whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. For $H_{0}$ to hold, the test in addition requires normally distributed disturbances of the selection equation and the absence of heteroskedasticity in both the outcome and the selection equation.

Meijer and Wansbeek (2007) [10] neither derive the corresponding test statistic explicitly nor provide Monte Carlo simulations on its performance in finite samples. The present contribution takes up their approach, arguing that a GMM-based pseudo-score LM test is well suited to test the hypothesis of bivariate normality and is easy to calculate. The derived LM test is similar to the widely used Jarque and Bera (1980) [11] LM test, and in the absence of sample selection reverts to their LM test statistic. Monte Carlo simulations show good performance of the proposed test for sample sizes of 1000 or larger, especially if a powerful exclusion restriction is available.

2. The GMM Based Pseudo-Score LM Test for Normality

In a cross-section of n units the Heckman (1979) [12] sample selection model is given as

$$\begin{aligned} y_{1i}^{*} &= z_{i}' γ + u_{1i} \\ y_{2i}^{*} &= x_{i}' β + u_{2i} \\ d_{i} &= \begin{cases} 1 & \text{if } y_{1i}^{*} > 0 \\ 0 & \text{otherwise} \end{cases} \\ y_{2i} &= \begin{cases} y_{2i}^{*} & \text{if } d_{i} = 1 \\ \text{unobserved} & \text{if } d_{i} = 0 \end{cases} \end{aligned}$$

where $y_{1i}^{*}$ and $y_{2i}^{*}$ denote latent random variables. The outcome variable, $y_{2i}^{*}$, is observed if the latent variable $y_{1i}^{*} > 0$ or, equivalently, if $d_{i} = 1$. $z_{i}$ is a $k_{1} \times 1$ vector containing the exogenous variables of the selection equation and $x_{i}$ is the $k_{2} \times 1$ vector of the exogenous variables of the outcome equation. $z_{i}$ may include the variables in $x_{i}$, but also additional ones so that an exclusion restriction holds. γ and β denote the corresponding parameter vectors. Under $H_{0}$ the disturbances are assumed to be distributed as bivariate normal, i.e.,

$$\left( \begin{bmatrix} u_{1i} \\ u_{2i} \end{bmatrix} \Big| \, x_{i}, z_{i} \right) \sim N \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & τ \\ τ & σ^{2} \end{bmatrix} \right)$$

It is easy to show that under these assumptions

$$\begin{aligned} p_{i} &\equiv E[d_{i}] = Φ(z_{i}' γ) \\ λ_{i} &\equiv E[u_{1i} \mid u_{1i} \geq - z_{i}' γ] = \frac{ϕ(- z_{i}' γ)}{1 - Φ(- z_{i}' γ)} = \frac{ϕ(z_{i}' γ)}{Φ(z_{i}' γ)} \end{aligned}$$

where $λ_{i}$ denotes the inverse Mills’ ratio. Under the normality assumption one can specify $u_{2i} = τ u_{1i} + ε_{i}$ so that $ε_{i} \sim iid \, N(0, σ^{2} - τ^{2})$. $ε_{i}$ is independent of $u_{1i}$, since both are jointly normal and $E[ε_{i} u_{1i}] = E[(u_{2i} - τ u_{1i}) u_{1i}] = τ - τ = 0$. Since $E[u_{1i} \mid d_{i} = 1] = λ_{i}$, it holds that $E[τ(u_{1i} - λ_{i}) + ε_{i} \mid d_{i} = 1] = 0$. Therefore, the two-step Heckman sample selection estimator includes the estimated inverse Mills’ ratio in the outcome equation as an additional regressor. For the observed outcome at $d_{i} = 1$ the model can be written as

$$\begin{aligned} y_{2i}^{*} &= x_{i}' β + τ λ_{i} + τ(u_{1i} - λ_{i}) + ε_{i} \\ &\equiv w_{i}' α + τ v_{i} + ε_{i} \end{aligned} \tag{1}$$

where $v_{i} = u_{1i} - λ_{i}$, $w_{i} = (x_{i}', λ_{i})'$, $α = (β', τ)'$ and $E[y_{2i}^{*} \mid d_{i} = 1] = x_{i}' β + τ λ_{i}$.
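The inverse Mills' ratio defined above can be checked numerically. The following is a minimal sketch, not the author's implementation: the function names are hypothetical, the standard normal CDF is built from `math.erf`, and the simulation merely compares $ϕ(c)/Φ(c)$ with the simulated mean of $u_{1i}$ given $u_{1i} > -c$ for an index value $c = z_{i}' γ$.

```python
import math
import random


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_mills(index):
    """lambda = pdf(index) / cdf(index), where index stands for z_i' gamma."""
    return normal_pdf(index) / normal_cdf(index)


def simulated_truncated_mean(index, draws=200_000, seed=0):
    """Monte Carlo estimate of E[u | u > -index] for u ~ N(0, 1)."""
    rng = random.Random(seed)
    kept = [u for u in (rng.gauss(0.0, 1.0) for _ in range(draws)) if u > -index]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Illustrative index values only; they are not taken from the paper.
    for index in (-1.0, 0.0, 1.0):
        print(index, inverse_mills(index), simulated_truncated_mean(index))
```

The analytic and simulated values should agree up to simulation noise, which is the property that justifies adding $λ_{i}$ as a regressor in the second step.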

Meijer and Wansbeek (2007) [10] embed the two-step Heckman sample selection estimator in a GMM framework and demonstrate that the estimation can be based on

$$\begin{aligned} h_{1,1i}(θ_{1}) &\equiv \frac{(d_{i} - p_{i}) ϕ_{i}}{p_{i}(1 - p_{i})} z_{i} && (k_{1} \times 1) \\ h_{1,2i}(θ_{1}) &\equiv d_{i} w_{i} (y_{i} - w_{i}' α) = d_{i} w_{i} (τ v_{i} + ε_{i}) && ((k_{2} + 1) \times 1) \\ h_{1,3i}(θ_{1}) &\equiv d_{i} [(y_{i} - w_{i}' α)^{2} - φ_{2,i}] = d_{i} [(ε_{i} + τ v_{i})^{2} - φ_{2,i}] && (1 \times 1) \end{aligned}$$

where $ϕ_{i} = ϕ(z_{i}' γ)$, $θ_{1} = (γ', β', τ, σ)'$ and $φ_{k,i} = E[(τ v_{i} + ε_{i})^{k} \mid d_{i} = 1]$, $k = 2, 3, 4$. Note that there are as many parameters as moment conditions, so the model is just-identified.

The first set of moment equations is based on $\bar{h}_{1,1}(θ_{1}) = \frac{1}{n} \sum_{i=1}^{n} h_{1,1i}(θ_{1})$ and refers to the score of the Probit model. Since these moment conditions do not include the parameters entering $h_{1,2i}$ and $h_{1,3i}$ (i.e., $β', τ, σ$) and are exactly identified, estimation can proceed in steps. In the first step, one solves $\frac{1}{n} \sum_{i=1}^{n} h_{1,1i}(\hat{γ}) = 0$, and in the second step one solves the sample moment condition $\bar{h}_{1,2}(θ_{1}) = \frac{1}{n} \sum_{i=1}^{n} h_{1,2i}(\hat{γ}', \hat{β}', \hat{τ}) = 0$ using the estimate $\hat{γ}$ obtained in the first step. This leads to the two-step Heckman estimator, which first estimates a Probit model, inserts the estimated inverse Mills’ ratio $\hat{λ}_{i}$ as an additional regressor in the outcome equation and applies OLS. Lastly, from $\bar{h}_{1,3}(θ_{1}) = \frac{1}{n} \sum_{i=1}^{n} h_{1,3i}(\hat{γ}', \hat{β}', \hat{τ}, \hat{σ}) = 0$ one can obtain an estimator of $σ^{2}$.

As Meijer and Wansbeek (2007) [10] remark, a rough and simple test for normality can be based on two additional moment conditions that allow comparing the third and fourth moments of the estimated residuals of the outcome equation, $y_{i} - w_{i}' \hat{α}$, with their theoretical counterparts based on the truncated normal distribution. These moment conditions use

$$\begin{aligned} h_{2,1i}(θ_{1}, θ_{2}) &\equiv d_{i} [(y_{i} - w_{i}' α)^{3} - φ_{3,i} - ξ] = d_{i} [(τ v_{i} + ε_{i})^{3} - φ_{3,i} - ξ] && (1 \times 1) \\ h_{2,2i}(θ_{1}, θ_{2}) &\equiv d_{i} [(y_{i} - w_{i}' α)^{4} - φ_{4,i} - κ] = d_{i} [(τ v_{i} + ε_{i})^{4} - φ_{4,i} - κ] && (1 \times 1) \end{aligned}$$

Thereby, $θ_{2} = (ξ, κ)$ denotes additional parameters that are zero under normality. More importantly, under $H_{0}$ the expectations $φ_{k,i}$ can be derived recursively from the moments of the truncated normal distribution as shown in the Appendix (see also Meijer and Wansbeek, 2007, pp. 45–46) [10]. In general, these moments depend on the parameters $θ_{1}$ and, especially, on the inverse Mills’ ratio $λ_{i}$ and the parameter τ.

To detect violations of the normality assumption, one can test $H_{0}: ξ = 0$ and $κ = 0$ vs. $H_{1}: ξ \neq 0$ and/or $κ \neq 0$. Although this hypothesis checks the third and fourth moments of the disturbances of the two-step outcome Equation (1), it can only be true if $φ_{3,i}$ and $φ_{4,i}$ are the correct expected values. Therefore, the test additionally requires the moment conditions $E[h_{1,1i}] = 0$ and $E[h_{1,2i}] = 0$ to hold so that the parameters of both the selection equation and the outcome equation are consistently estimated. The present hypothesis is somewhat more restrictive than that tested, e.g., in Montes-Rojas (2011) [9], who emphasizes that the Heckman two-step estimator is robust to distributional misspecification if (i) the marginal distribution of $u_{1i}$ is normal and (ii) $E[u_{2i} \mid u_{1i}] = τ u_{1i}$, i.e., the conditional expectation is linear.³

In addition, $H_{0}$ also requires the absence of heteroskedasticity (see Meijer and Wansbeek, 2007, p. 46) [10]. To give an example, assume that $u_{2i} = τ u_{1i} + ε_{i}$ and that $u_{1i}$ and $u_{2i}$ are bivariate normal, but the variances of $ε_{i}$ differ across i and are given as $σ_{i}^{2} - τ^{2}$ (see also DGP6 in the Monte Carlo set-up below and the excess kurtosis of DGP6 in Table 1 below). Then, it follows that

$$φ_{k,i} = \sum_{j=0}^{k} \binom{k}{j} E[ε_{i}^{k-j}] \, τ^{j} ψ_{j,i},$$

where $ψ_{k,i} \equiv E[v_{i}^{k} \mid d_{i} = 1] = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} μ_{j,i} λ_{i}^{k-j}$ and $μ_{j,i} = E[u_{1i}^{j} \mid u_{1i} > - z_{i}' γ]$ (see the Appendix). In this case, we have $E[ε_{i}^{2}] = σ_{i}^{2} - τ^{2}$ and $E[ε_{i}^{4}] = 3(σ_{i}^{2} - τ^{2})^{2}$, while the corresponding odd moments are zero. Hence, under heteroskedasticity $φ_{4,i}$ differs from that obtained under $H_{0}$, which assumes $E[ε_{i}^{2}] = σ^{2} - τ^{2}$ and $E[ε_{i}^{4}] = 3(σ^{2} - τ^{2})^{2}$, and the population moment condition $E[h_{2,2i}(θ_{1}, θ_{2})] = 0$ is violated. Thus, a test based on these moments should also be able to detect heteroskedasticity, although not in the most efficient way.
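Why heteroskedasticity shows up in the fourth moment can be illustrated in the simplest case. The sketch below assumes $τ = 0$ purely for illustration, so that the outcome disturbance reduces to $ε_{i}$, and writes $s_{i}^{2}$ for the unit-specific variance of $ε_{i}$ and $\bar{s}^{2} = \frac{1}{n} \sum_{i} s_{i}^{2}$ for its cross-sectional average (notation introduced here, not in the original). With $E[ε_{i}^{4}] = 3 s_{i}^{4}$ for each unit, the kurtosis of the pooled disturbances is

$$\frac{\frac{1}{n} \sum_{i=1}^{n} 3 s_{i}^{4}}{\left( \frac{1}{n} \sum_{i=1}^{n} s_{i}^{2} \right)^{2}} = 3 \left( 1 + \frac{\frac{1}{n} \sum_{i=1}^{n} (s_{i}^{2} - \bar{s}^{2})^{2}}{(\bar{s}^{2})^{2}} \right) \geq 3,$$

with equality only if all $s_{i}^{2}$ coincide. Any variation in the variances therefore produces excess kurtosis relative to the homoskedastic normal benchmark, which is what the fourth-moment condition picks up.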

Applying a pseudo-score LM test (Newey and West, 1987 [13]; Hall, 2005 [14]) in this GMM framework leads to a $χ^{2}(2)$ test statistic that can be calculated easily. In order to derive the LM test statistic, define $\bar{h}(θ) = \frac{1}{n} \sum_{i=1}^{n} h_{i}(θ)$ and $\bar{Ψ}(θ) = \frac{1}{n} \sum_{i=1}^{n} h_{i}(θ) h_{i}(θ)'$, where $h_{i}(θ) \equiv (h_{1,1i}', h_{1,2i}', h_{1,3i}, h_{2,1i}, h_{2,2i})'$. It is assumed that $Ψ_{0} = plim_{n \to \infty} \bar{Ψ}(θ_{0})$ exists and is positive definite (and hence invertible). Under standard assumptions, it holds that

$$\begin{aligned} n^{1/2} \, \bar{h}(θ_{0}) &\overset{d}{\to} N(0, Ψ_{0}) \\ n^{1/2} (\hat{θ} - θ_{0}) &\overset{d}{\to} N(0, A_{0}) \end{aligned}$$

where the subscript 0 indicates that $H_{0}$ is assumed. Thereby, $A_{0} = G_{0}^{-1} Ψ_{0} (G_{0}^{-1})'$ and $G_{0}$ is the probability limit of $\bar{G}(θ_{0}) = \frac{1}{n} \sum_{i=1}^{n} \left. \frac{\partial h_{i}(θ)}{\partial θ'} \right|_{θ = θ_{0}}$. Note that $\bar{G}(θ_{0})$ is invertible as the model is just-identified.

Under $H_{0}$ the moment conditions $E[h_{2,i}(θ_{1}, θ_{2})]$, with $h_{2,i}(θ_{1}, θ_{2}) \equiv (h_{2,1i}, h_{2,2i})'$, referring to the third and fourth moments of the outcome equation are zero at $ξ = 0$ and $κ = 0$, and the separability result in Ahn and Schmidt (1995, Section 4) [15] can be applied. Denoting the restricted estimates under $H_{0}$ by a tilde, using the invertibility of $\bar{G}(\tilde{θ})$ and the partitioned inverse of $Ψ_{n}(\tilde{θ}) = E[\bar{Ψ}(\tilde{θ})]$, the pseudo-score LM test statistic can be derived as (see the Appendix for details):

$$LM(\tilde{θ}) = n \, \bar{h}_{2}(\tilde{θ})' \left( Ψ_{n,22}(\tilde{θ}) - Ψ_{n,21}(\tilde{θ}) Ψ_{n,11}(\tilde{θ})^{-1} Ψ_{n,12}(\tilde{θ}) \right)^{-1} \bar{h}_{2}(\tilde{θ})$$

Thereby, $\bar{h}_{2}(θ) = \frac{1}{n} \sum_{i=1}^{n} h_{2,i}(θ)$ and we use $\bar{h}_{1}(\tilde{θ}) = \frac{1}{n} \sum_{i=1}^{n} h_{1,i}(\tilde{θ}) = 0$, where $h_{1,i}(θ) = (h_{1,1i}', h_{1,2i}', h_{1,3i})'$, as well as the partitioned inverse (see the Appendix) with blocks

$$\begin{aligned} Ψ_{n,11}(θ) &= \frac{1}{n} \begin{bmatrix} Z' V Z & 0 & 0 \\ * & W_{1}' Σ_{1} W_{1} & \sum_{d_{i} = 1} w_{i} φ_{3,i} \\ * & * & \sum_{d_{i} = 1} (φ_{4,i} - φ_{2,i}^{2}) \end{bmatrix} \\ Ψ_{n,22}(θ) &= \frac{1}{n} \sum_{i=1}^{n} \begin{bmatrix} p_{i} (φ_{6,i} - φ_{3,i}^{2}) & p_{i} (φ_{7,i} - φ_{3,i} φ_{4,i}) \\ p_{i} (φ_{7,i} - φ_{3,i} φ_{4,i}) & p_{i} (φ_{8,i} - φ_{4,i}^{2}) \end{bmatrix} \\ Ψ_{n,12}(θ) &= \frac{1}{n} \sum_{i=1}^{n} \begin{bmatrix} 0 & 0 \\ p_{i} w_{i} φ_{4,i} & p_{i} w_{i} φ_{5,i} \\ p_{i} (φ_{5,i} - φ_{2,i} φ_{3,i}) & p_{i} (φ_{6,i} - φ_{4,i} φ_{2,i}) \end{bmatrix} \end{aligned}$$

where $V = diag\left( \frac{ϕ_{1}^{2}}{p_{1}(1 - p_{1})}, \ldots, \frac{ϕ_{n}^{2}}{p_{n}(1 - p_{n})} \right)$, $p_{i} = P(d_{i} = 1)$, $Z_{n \times k_{1}} = (z_{1}, \ldots, z_{n})'$, $W_{n \times (k_{2} + 1)} = (w_{1}, \ldots, w_{n})'$, and $Σ = diag(φ_{2,1}, \ldots, φ_{2,n})$. $Σ_{1}$ is obtained from Σ by deleting all rows and columns referring to $d_{i} = 0$, and $W_{1}$ is obtained from $W$ in the same way. $Ψ_{n}(θ)$ can be consistently estimated by plugging in $\tilde{θ}$. In addition, Meijer and Wansbeek (2007) [10] show that one can substitute $d_{i}$ for $p_{i}$ so that only information on the observed units is necessary. Note, however, that the summation runs over all observations (zeros and ones in $d_{i}$).
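Once the blocks of $Ψ_{n}$ and the averaged test moments $\bar{h}_{2}$ are available, the statistic is a quadratic form in a Schur complement. The following is a minimal sketch in plain Python, not the author's Stata ado-file: the function names and the tiny numeric inputs are hypothetical, and matrices are nested lists so that no external library is assumed.

```python
def mat_mul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose of a nested-list matrix."""
    return [list(row) for row in zip(*a)]


def mat_sub(a, b):
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting."""
    size = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def lm_statistic(n, h2_bar, psi11, psi12, psi22):
    """n * h2' (psi22 - psi21 psi11^{-1} psi12)^{-1} h2, with psi21 = psi12'."""
    psi21 = transpose(psi12)
    schur = mat_sub(psi22, mat_mul(mat_mul(psi21, inverse(psi11)), psi12))
    column = [[v] for v in h2_bar]
    quad = mat_mul(mat_mul(transpose(column), inverse(schur)), column)[0][0]
    return n * quad


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formula.
    print(lm_statistic(100, [0.1], [[2.0]], [[1.0]], [[2.0]]))
```

In an application `psi11` would have dimension $k_{1} + k_{2} + 2$ and `psi22` dimension 2; the scalar blocks above only keep the example readable.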

Under standard assumptions it follows that under $H_{0}$ we have $LM(\tilde{θ}) \overset{d}{\to} χ^{2}(2)$ (see Newey and West, 1987, pp. 781–782 [13] and Theorems 5.6 and 5.7 in Hall, 2005 [14]). In the absence of sample selection ($τ = 0$) it holds that $φ_{3,i} = φ_{5,i} = 0$, while $φ_{2,i} = σ^{2}$ and $φ_{4,i} = 3 σ^{4}$, and the LM test statistic reverts to that of Jarque and Bera (1980) [11].
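For the no-selection benchmark, the textbook Jarque-Bera statistic and its $χ^{2}(2)$ p-value are easy to compute by hand. The sketch below is an illustration under that special case only, with hypothetical function names; it uses the standard form $n (S^{2}/6 + (K - 3)^{2}/24)$ and the fact that the survival function of a $χ^{2}$ variable with two degrees of freedom is $e^{-x/2}$, so it applies equally to any statistic with a $χ^{2}(2)$ limit.

```python
import math


def jarque_bera(residuals):
    """Textbook Jarque-Bera statistic from sample skewness S and kurtosis K."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)


def chi2_df2_pvalue(statistic):
    """Upper-tail probability of a chi-square(2) variable: exp(-x / 2)."""
    return math.exp(-statistic / 2.0)


if __name__ == "__main__":
    # Hypothetical residuals, used only to show the calling convention.
    sample = [-1.2, 0.4, 0.1, -0.3, 2.5, -0.8, 0.6, -0.2, 0.9, -1.0]
    stat = jarque_bera(sample)
    print(stat, chi2_df2_pvalue(stat))
```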

3. Monte Carlo Simulation

Monte Carlo simulations may shed light on the performance of the proposed LM test in finite samples. The simulation is based on a design that has been used previously by van der Klaauw and Koning (1993) [8] and Montes-Rojas (2011) [9], but includes a few modifications. The simulated model is specified as

$$\begin{aligned} y_{1i}^{*} &= - 1 z_{1i} + 1 x_{2i} - 1 + u_{1i} \\ y_{2i}^{*} &= 0.5 x_{1i} - 0.5 x_{2i} + 1 + u_{2i} \end{aligned}$$

for $ρ \in \{- 0.8, - 0.4, 0.4, 0.8\}$ and $σ^{2} \in \{0.25, 1\}$. The explanatory variables $x_{1i}$, $x_{2i}$ and $z_{1i}$ are generated as $iid \, N(0, 3)$, $N(0, 3)$ and $U(- 3, 3)$, respectively. With respect to the disturbances $u_{1i}$ and $u_{2i}$, the following data generating processes are considered. Note that DGP1-DGP3 imply $Var[u_{1i}] = 1$ and $Var[u_{2i}] = 0.25$. In contrast, van der Klaauw and Koning (1993) [8] and Montes-Rojas (2011) [9] consider the case with $Var[u_{2i}] = 5$ and thus obtain less precise estimates of the slope parameters of the outcome equation.

DGP1:
$(u_{1 i}, u_{2 i}) \sim i i d N (0, [\begin{matrix} 1 & 0.5 ρ \\ 0.5 ρ & 0.25 \end{matrix}])$
DGP2:
$ε_{1 i} \sim t (10), ε_{2 i} \sim t (10), ε_{1 i}$ and $ε_{2 i}$ being independent.
$u_{1 i} = ε_{1 i} {(\frac{10}{8})}^{- 1 / 2}$
$u_{2 i} = σ {(1 + ρ^{2})}^{1 / 2} {(\frac{10}{8})}^{- 1 / 2} ε_{2 i} + ρ σ u_{1 i}$
The degrees of freedom are set to 10 to guarantee that the moments up to order 4 exist.
DGP3:
$ε_{1 i} \sim χ^{2} (20), ε_{2 i} \sim χ^{2} (30)$
$u_{1 i} = (ε_{1 i} - 20) / \sqrt{40} - 20$
$u_{2 i} = σ {(1 + ρ^{2})}^{1 / 2} (ε_{2 i} - 30) / \sqrt{60} + ρ σ u_{1 i}$
DGP4:
$ε_{1 i} \sim N (0, 1), ε_{2 i} \sim χ^{2} (30)$ and are independent.
$u_{1 i} = ε_{1 i}$
$u_{2 i} = σ {(1 + ρ^{2})}^{1 / 2} (ε_{2 i} - 30) / \sqrt{60} + ρ σ u_{1 i}$
DGP5:
$ε_{1 i} \sim χ^{2} (20), ε_{2 i} \sim N (0, 1)$ and are independent.
$u_{1 i} = (ε_{1 i} - 20) / \sqrt{40} - 1$
$u_{2 i} = σ {(1 + ρ^{2})}^{1 / 2} ε_{2 i} + ρ σ u_{1 i}$
DGP6:
$ε_{1 i} \sim N (0, 1), ε_{2 i} \sim N (0, 0.25), ε_{1 i}$ and $ε_{2 i}$ being independent.
$c_{i} = 1 + e^{\frac{x_{1 i}}{\sqrt{3}}} (e^{- \frac{1}{2}} - 1) {(e^{1} - 1)}^{- \frac{1}{2}}$
$u_{1 i} = ε_{1 i}$
$u_{2 i} = σ {(1 + ρ^{2})}^{1 / 2} \sqrt{c_{i}} ε_{2 i} + 2 ρ u_{1 i}$
DGP7:
$ε_{1 i} \sim N (0, 1), ε_{2 i} \sim N (0, 1), ε_{1 i}$ and $ε_{2 i}$ being independent.
$c_{i} = 1 + e^{\frac{x_{2 i}}{\sqrt{3}}} (e^{- \frac{1}{2}} - 1) {(e^{1} - 1)}^{- \frac{1}{2}}$
$u_{1 i} = ε_{1 i} \sqrt{c_{i}}$
$u_{2 i} = σ {(1 + ρ^{2})}^{1 / 2} ε_{2 i} + σ ρ u_{1 i}$
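As an illustration of the reference design, the bivariate normal disturbances of DGP1 can be drawn from two independent standard normals. This is a sketch under assumptions, not the author's simulation code: the function names are hypothetical, and the construction $u_{2i} = σ(ρ u_{1i} + \sqrt{1 - ρ^{2}} \, e_{2i})$ with $σ = 0.5$ is simply one standard way to reproduce the covariance matrix stated for DGP1 (variances 1 and 0.25, covariance $0.5 ρ$).

```python
import math
import random


def draw_dgp1(n, rho, sigma=0.5, seed=0):
    """Draw n pairs (u1, u2) with Var(u1)=1, Var(u2)=sigma^2, Cov=sigma*rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        u1 = e1
        u2 = sigma * (rho * u1 + math.sqrt(1.0 - rho ** 2) * e2)
        pairs.append((u1, u2))
    return pairs


def sample_moments(pairs):
    """Return (variance of u1, variance of u2, covariance) of the draws."""
    n = len(pairs)
    mean1 = sum(u1 for u1, _ in pairs) / n
    mean2 = sum(u2 for _, u2 in pairs) / n
    var1 = sum((u1 - mean1) ** 2 for u1, _ in pairs) / n
    var2 = sum((u2 - mean2) ** 2 for _, u2 in pairs) / n
    cov = sum((u1 - mean1) * (u2 - mean2) for u1, u2 in pairs) / n
    return var1, var2, cov


if __name__ == "__main__":
    for rho in (-0.8, -0.4, 0.4, 0.8):
        print(rho, sample_moments(draw_dgp1(20_000, rho)))
```

The printed moments should be close to 1, 0.25 and $0.5 ρ$, matching the first row of Table 1 for the variances.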

DGP1 serves as a reference to assess the size of the pseudo-score LM test. The second DGP deviates from the bivariate normal in terms of a higher kurtosis, while DGP3 exhibits both higher skewness and kurtosis than the normal. DGP4 allows for deviation from normality in the outcome equation, while keeping the normality assumption in the selection equation. DGP5 reverses this pattern: the disturbances of the outcome equation are normal and those of the selection equation are not. DGP6 and DGP7 introduce heteroskedasticity in either the outcome or the selection equation, respectively. In the latter two cases, the variances of $u_{1i}$ and $u_{2i}$ are normalized to an average of 1 and 0.25, respectively. Note that the explanatory variables are held fixed in repeated samples.

Overall, four experiments are considered for these DGPs. In the baseline Experiment 1 (first row of graphs in the figures), 37% of the data remain unobserved and, in the absence of sample selection, the implied $R^{2}$ amounts to $1 - \frac{0.25}{1.75} = 0.86$ using $Var(u_{2i}) = 0.25$ and $Var(y_{2i}^{*}) = 1.75$. Experiment 2 (second row of graphs) analyzes the performance of the Heckman two-step estimator under a weaker exclusion restriction, assuming $z_{1i} \sim iid \, U(- 1, 1)$ so that $Var(z_{1i}) = 1/3$. Experiment 3 (third row of graphs) sets the constant of the outcome equation to zero so that 49% instead of 37% of the units are unobserved. Lastly, Experiment 4 (fourth row of graphs) considers a weaker fit in the outcome equation, setting $Var(u_{2i}) = 1$ so that in the absence of sample selection we have $R^{2} = 0.43$.
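The baseline value of $R^{2}$ in Experiment 1 can be traced from the simulated outcome equation. The short calculation below assumes that the second argument of $N(0, 3)$ denotes the variance (in line with $Var(z_{1i}) = 1/3$ for $U(-1, 1)$) and that $x_{1i}$, $x_{2i}$ and $u_{2i}$ are mutually uncorrelated:

$$Var(y_{2i}^{*}) = 0.5^{2} \cdot 3 + 0.5^{2} \cdot 3 + 0.25 = 1.75, \qquad R^{2} = 1 - \frac{0.25}{1.75} \approx 0.86 .$$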

Table 1 summarizes the average variance, skewness and kurtosis of the generated disturbances $u_{1i}$ and $u_{2i}$ under Experiment 1. In DGP2-DGP7, depending on ρ, the average kurtosis of $u_{2i}$ varies between 3.00 and 5.68, while the kurtosis of $u_{1i}$ lies between 3.07 and 3.58 in DGP5. In the other DGPs the kurtosis of $u_{1i}$ is held constant, taking values of 2.99 (DGPs 1, 4 and 6), 3.97 (DGP2), 3.58 (DGP3) and 5.71 (DGP7), respectively. The skewness coefficient of the generated disturbances is zero for all DGPs except for DGP3, with corresponding values of 0.63 ($u_{1i}$) and −0.21 to 0.43 ($u_{2i}$), and DGP5, where the skewness of $u_{1i}$ varies between 0.14 and 0.63.

Table 1. Variance, Skewness and Kurtosis of the simulated disturbances.

DGP	ρ	Variance ($u_{1}$)	Skewness ($u_{1}$)	Kurtosis ($u_{1}$)	Variance ($u_{2}$)	Skewness ($u_{2}$)	Kurtosis ($u_{2}$)
1	all	1.00	0.00	2.99	0.25	0.00	2.99
2	−0.8	1.00	0.00	3.97	0.25	0.00	3.52
2	−0.4	1.00	0.00	3.97	0.25	0.00	3.70
2	0.0	1.00	0.00	3.97	0.25	0.00	3.96
2	0.4	1.00	0.00	3.97	0.25	0.00	3.70
2	0.8	1.00	0.00	3.97	0.25	0.00	3.52
3	−0.8	1.00	0.63	3.58	0.25	−0.21	3.29
3	−0.4	1.00	0.63	3.58	0.25	0.35	3.28
3	0.0	1.00	0.63	3.58	0.25	0.51	3.38
3	0.4	1.00	0.63	3.58	0.25	0.43	3.28
3	0.8	1.00	0.63	3.58	0.25	0.43	3.29
4	−0.8	1.00	0.00	2.99	0.25	0.11	3.05
4	−0.4	1.00	0.00	2.99	0.25	0.39	3.27
4	0.0	1.00	0.00	2.99	0.25	0.51	3.38
4	0.4	1.00	0.00	2.99	0.25	0.39	3.27
4	0.8	1.00	0.00	2.99	0.25	0.11	3.04
5	−0.8	1.00	0.63	3.58	0.25	0.00	2.99
5	−0.4	1.00	0.63	3.58	0.25	0.00	2.99
5	0.0	1.00	0.63	3.58	0.25	0.00	2.99
5	0.4	1.00	0.63	3.58	0.25	0.00	2.99
5	0.8	1.00	0.63	3.58	0.25	0.00	2.99
6	−0.8	1.00	0.00	2.99	0.25	0.00	3.34
6	−0.4	1.00	0.00	2.99	0.25	0.00	4.89
6	0.0	1.00	0.00	2.99	0.25	0.00	5.68
6	0.4	1.00	0.00	2.99	0.25	0.00	4.89
6	0.8	1.00	0.00	2.99	0.25	0.00	3.35
7	−0.8	0.99	0.00	5.71	0.25	0.00	4.11
7	−0.4	0.99	0.00	5.71	0.25	0.00	3.06
7	0.0	0.99	0.00	5.71	0.25	0.00	2.99
7	0.4	0.99	0.00	5.71	0.25	0.00	3.06
7	0.8	0.99	0.00	5.71	0.25	0.00	4.11

Following Davidson and MacKinnon (1998) [16], size and power are analyzed in terms of size-discrepancy and size-power curves. The former is based on the empirical cumulative distribution function of the p-values, $p_{r}$, defined as $F(q) = \frac{1}{R} \sum_{r=1}^{R} I(p_{r} \leq q)$, where R is the number of Monte Carlo replications. The size-discrepancy curves are defined as plots of $F(q) - q$ against q under the assumption that $H_{0}$ holds and DGP1 is the correct one. In addition, one can use a Kolmogorov-Smirnov test to see whether $F(q) - q$ differs significantly from 0 (see Davidson and MacKinnon, 1998, p. 11) [16]. The size-power curves plot power against size, i.e., $F_{H_{1}}(q)$ against $F_{H_{0}}(q)$. In both plots $q \in [0, 0.15]$ and the step size is 0.001. An important feature of this procedure is that it avoids size adjustments of the power curves if the tests reject too often under $H_{0}$.
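The two curves just described are simple functions of the simulated p-values. The following is a minimal sketch with hypothetical function names and made-up p-values; it only assumes that the p-values of the R replications under $H_{0}$ and under an alternative are available as lists.

```python
def empirical_cdf(p_values, q):
    """F(q): share of replications with a p-value not exceeding q."""
    return sum(1 for p in p_values if p <= q) / len(p_values)


def q_grid(q_max=0.15, step=0.001):
    """Grid of nominal sizes from 0 to q_max in steps of `step`."""
    n_steps = round(q_max / step)
    return [i * step for i in range(n_steps + 1)]


def size_discrepancy_curve(p_null, q_max=0.15, step=0.001):
    """Points (q, F(q) - q) computed from p-values simulated under H0."""
    return [(q, empirical_cdf(p_null, q) - q) for q in q_grid(q_max, step)]


def size_power_curve(p_null, p_alt, q_max=0.15, step=0.001):
    """Points (F_H0(q), F_H1(q)): actual size on the x-axis, power on the y-axis."""
    return [(empirical_cdf(p_null, q), empirical_cdf(p_alt, q))
            for q in q_grid(q_max, step)]


if __name__ == "__main__":
    # Made-up p-values for illustration; real input would come from the simulation.
    p_null = [0.03, 0.20, 0.45, 0.61, 0.88, 0.97]
    p_alt = [0.001, 0.004, 0.02, 0.04, 0.12, 0.30]
    print(size_discrepancy_curve(p_null)[50])
    print(size_power_curve(p_null, p_alt)[50])
```

Plotting power against the actual rather than the nominal size is what removes the need for a separate size adjustment.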

Figure 1 exhibits the size-discrepancy plots for Experiments 1–4 and sample sizes $n = 500, 1000, 2000$. The plots show that the pseudo-score LM test is properly sized for $ρ = - 0.4$ and $ρ = 0.4$ in all experiments, while it slightly over-rejects at $ρ = - 0.8$ and $ρ = 0.8$, especially at the small sample size ($n = 500$). For example, at a nominal test size of 0.05 and a sample size of 1000, the size of the LM test is too high by 0.012 (i.e., 1.2 percentage points) at $|ρ| = 0.8$. For $ρ = - 0.4$ and $ρ = 0.4$ the size discrepancy is within the Kolmogorov-Smirnov 5% confidence bounds of ±0.0096 for p-values smaller than 0.1. A similar result has also been mentioned in Montes-Rojas (2011) [9] in the case of robust LM and $C(α)$ tests. A weaker exclusion restriction, setting $Var(z_{1i}) = 1/3$ in Experiment 2, increases the size discrepancy at high absolute values of ρ (row 2 of Figure 1), but hardly affects the size of the test at $|ρ| = 0.4$, where the size discrepancy remains within the confidence bounds. Increasing the share of unobserved values to 0.49 (Experiment 3, row 3 of Figure 1) hardly affects the size discrepancy. Lastly, Experiment 4 (last row of Figure 1) shows that a weaker fit ($Var(u_{2i}) = 1$) does not result in a larger size distortion as compared to the baseline in the first row of Figure 1. As one would expect, a larger number of observations generally enhances the performance of the LM test (see the last column in Figure 1). However, the large-sample approximation improves relatively slowly with sample size under a weak exclusion restriction at high absolute values of ρ (see the second row of graphs in Figure 1).

Figure 2, Figure 3 and Figure 4 present the size-power plots of the pseudo-score LM test for DGPs 2–3, 4–5 and 6–7, respectively. In general and in line with the literature, for all DGPs referring to the alternative hypothesis we observe lower power of the pseudo-score LM test at high absolute values of ρ, but especially so at $ρ = - 0.8$. If the distribution of the disturbances of the outcome equation exhibits both skewness and excess kurtosis (DGP3), the simulated power of the pseudo-score LM test is higher than under a symmetric distribution with fatter tails than the normal distribution (DGP2), except at $ρ = - 0.8$. Furthermore, for DGP3 the power is generally lower at $ρ = - 0.8$ as compared to large positive values ($ρ = 0.8$), which reflects differences in the skewness of the distribution of $u_{2i}$ with respect to ρ (see Table 1).

Figure 3 illustrates the power of the pseudo-score LM test under non-normality in either the outcome (DGP4) or the selection equation (DGP5), but not in both. Under DGP4 the pseudo-score LM test exhibits high power at intermediate absolute values of ρ, while at high absolute values of ρ the power tends to be lower as the weight of $u_{1i}$ (which is assumed to be normal) is higher in the disturbances of the outcome equation. In the case of DGP5 we see the reverse pattern: deviations from normality are only detected at high absolute values of ρ. Actually, under DGP5 the test has no power at all at $ρ = 0$, since in this case there is no effect of the truncation of $u_{1i}$ and the disturbances of the outcome equation are normal. This result is found in all four experiments considered.

Figure 4 presents the size-power plots of DGP6 and DGP7, which refer to heteroskedasticity. DGP6 allows for heteroskedasticity in the outcome equation and DGP7 in the selection equation. The size-power curves indicate that the pseudo-score LM test is also able to detect this type of deviation from the model assumptions, as heteroskedasticity translates into pronounced excess kurtosis of the disturbances of the outcome equation. For DGP6 this is the case at medium to low values of $|ρ|$. DGP7 introduces heteroskedasticity in the Probit selection model. In this case, the LM test exhibits power at high absolute values of ρ, but has virtually no power at $ρ = - 0.4$ and $0.4$. The reason is that the nominal kurtosis of $u_{2i}$ is hardly affected (amounting to 3.06, see Table 1) and the bias of the Mills’ ratio and of the estimated coefficients of the outcome equation, especially that of the Mills’ ratio, turns out to be small in comparison.

Figure 1. Size-discrepancy plot.

Figure 2. Size power plot, DGP1-DGP3, n = 1000.

Figure 3. Size power plot, DGP1, DGP4 and DGP5, n = 1000.

Figure 4. Size power plot, DGP1, DGP6 and DGP7, n = 1000.

Comparing the first and second rows of graphs in Figure 2, Figure 3 and Figure 4 indicates that not much power is lost with the weaker exclusion restriction. A higher share of unobserved units tends to slightly reduce the power of the LM test, as one would expect (see the graphs in row 3 vs. row 1 in Figure 2, Figure 3 and Figure 4). Comparing the first and the last rows in Figure 2, Figure 3 and Figure 4 indicates that a weaker fit (i.e., $Var(u_{2i})$ is increased from 0.25 to 1) does not result in a significant loss of power. Lastly, as expected, a larger sample size improves the power of the pseudo-score LM test across the board.⁴

4. Conclusions

Using Meijer and Wansbeek’s (2007) [10] GMM approach for two-step estimators of the Heckman sample selection model, this paper introduces a pseudo-score LM test to check the assumption of normality and homoskedasticity of the disturbances, a prerequisite for the consistency of this estimator. The GMM-based pseudo-score LM test is easy to calculate and similar to the widely used Jarque and Bera (1980) [11] LM test. Indeed, in the absence of sample selection it reverts to their LM test statistic. In particular, the test checks whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. Under $H_{0}$, normal disturbances of the selection equation and the absence of heteroskedasticity in both the outcome and the selection equation are additionally required.

Monte Carlo simulations show good performance of the pseudo-score LM test for samples of size 1000 or larger and a powerful exclusion restriction. However, in line with other tests of the normality assumption of the Heckman sample selection model proposed in the literature the pseudo-score LM test tends to be oversized, although only slightly, if the correlation of the disturbances of the selection and the outcome equation is high in absolute value or if the exclusion restrictions are weak. Hence, this test can be recommended for sample sizes of 1000 or larger.

Acknowledgments

I am very grateful to Tom Wansbeek and two anonymous referees for detailed and constructive comments on an earlier draft. A Stata ado-file for this test is available at: http://homepage.uibk.ac.at/ c43236/publications.html.

Appendix

Deriving $E [{(τ v_{i} + ε_{i})}^{k} | d = 1] :$

Let

Z \sim N (0, 1)

and consider

μ_{k} (a_{i}) = E [Z^{k} | Z > a_{i}], k = 1, 2 . . . .

The derivation the moments of

Z^{k} | Z > a_{i}

uses the following recursive formula (Meijer and Wansbeek, 2007, p. 45) [10]:

\begin{matrix} μ_{0} (a_{i}) & = & 1 \\ μ_{1} (a_{i}) & = & λ_{i} \\ μ_{k} (a_{i}) & = & (k - 1) μ_{k - 2} (a_{i}) + a_{i}^{k - 1} λ_{i}, k \geq 2 \end{matrix}

Setting

a_{i} = - z_{i}^{'} γ

and abbreviating

μ_{k} (a_{i}) = μ_{k, i},

one obtains

ψ_{k, i} \equiv E [v_{i}^{k} | d_{i} = 1] = \sum_{j = 0}^{k} {(- 1)}^{j} (\binom{k}{j}) μ_{j, i} λ_{i}^{k - j}

and based on these results one can calculate the moments of

{(τ v_{i} + ε_{i})}^{k}

as

\begin{matrix} φ_{k, i} & \equiv & E [{(τ v_{i} + ε_{i})}^{k} | d = 1] = E [\sum_{j = 0}^{k} (\binom{k}{j}) ε_{i}^{k - j} {(τ v_{i})}^{j} | d = 1] \\ = & \sum_{j = 0}^{k} (\binom{k}{j}) μ_{ε, k - j} τ^{j} ψ_{j, i} \end{matrix}

where

μ_{ε, k - j} \equiv E [ε_{i}^{k - j}] .
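The binomial combination above is straightforward to evaluate once the two sets of moments are available. The sketch below is illustrative only: the function name `phi_moment` and the numerical values are hypothetical, and the caller supplies both $ψ_{j, i}$ and $μ_{ε, m}$ as plain lists indexed by order.

```python
from math import comb

def phi_moment(k, tau, psi, mu_eps):
    """phi_k = sum_{j=0}^{k} C(k, j) * mu_eps[k - j] * tau**j * psi[j].

    psi[j]    : E[v^j | d = 1] for j = 0..k, with psi[0] = 1
    mu_eps[m] : E[eps^m]       for m = 0..k, with mu_eps[0] = 1
    """
    return sum(comb(k, j) * mu_eps[k - j] * tau ** j * psi[j] for j in range(k + 1))

if __name__ == "__main__":
    # Assumed illustrative values: Var(eps) = 2.0, tau = 0.5, psi_2 = 0.6.
    psi = [1.0, 0.0, 0.6]      # psi_1 = 0 because v has conditional mean zero
    mu_eps = [1.0, 0.0, 2.0]   # eps has mean zero
    print(phi_moment(2, 0.5, psi, mu_eps))  # sigma_eps^2 + tau^2 * psi_2
```

For $k = 2$ the sum collapses to $μ_{ε, 2} + τ^{2} ψ_{2, i}$, since the cross term carries $μ_{ε, 1} = 0$.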

Pseudo-score LM test:

Denoting the GMM-estimates under $H_{0}$ by $\tilde{θ}$, the pseudo-score LM test can be written as (see Hayashi, 2000, pp. 491–493 [17]; Newey and West, 1987, p. 780 [13]; and Hall, 2005, p. 162 [14]):^5

L M = n {\bar{h}}^{'} (\tilde{θ}) Ψ_{n} {(\tilde{θ})}^{- 1} {\bar{G}}^{'} (\tilde{θ}) {({\bar{G}}^{'} (\tilde{θ}) Ψ_{n} {(\tilde{θ})}^{- 1} \bar{G} (\tilde{θ}))}^{- 1} \bar{G} (\tilde{θ}) Ψ_{n} {(\tilde{θ})}^{- 1} \bar{h} (\tilde{θ})

where $Ψ_{n} (\tilde{θ}) = E [\bar{Ψ} (\tilde{θ})]$ is a consistent estimator of $Ψ_{0}$ under $H_{0}$. Using the fact that $\bar{G} (\tilde{θ})$ is invertible yields the LM test statistic as

L M = n {\bar{h}}^{'} (\tilde{θ}) Ψ_{n} {(\tilde{θ})}^{- 1} \bar{h} (\tilde{θ})

which can be further simplified using the partitioned inverse

L M = n {\bar{h}}_{2} {(\tilde{θ})}^{'} {(Ψ_{n, 22} (\tilde{θ}) - Ψ_{n, 21} (\tilde{θ}) Ψ_{n, 11}^{- 1} (\tilde{θ}) Ψ_{n, 12} (\tilde{θ}))}^{- 1} {\bar{h}}_{2} (\tilde{θ})

since

{\bar{h}}_{1} (\tilde{θ}) = 0 .
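The equivalence of the full quadratic form and its partitioned-inverse version can be verified numerically. The sketch below assumes NumPy is available and uses a randomly generated positive definite matrix in place of $Ψ_{n} (\tilde{θ})$; the dimensions, the values of $\bar{h}_{2}$, and the function names are all hypothetical and carry no empirical meaning.

```python
import numpy as np

def lm_full(n, h, Psi):
    """LM = n * h' Psi^{-1} h, using the complete moment vector h."""
    return float(n * h @ np.linalg.solve(Psi, h))

def lm_partitioned(n, h2, Psi11, Psi12, Psi22):
    """LM = n * h2' (Psi22 - Psi21 Psi11^{-1} Psi12)^{-1} h2 with Psi21 = Psi12'."""
    schur = Psi22 - Psi12.T @ np.linalg.solve(Psi11, Psi12)
    return float(n * h2 @ np.linalg.solve(schur, h2))

# Illustrative inputs only: a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
m1, m2, n = 5, 2, 1000
A = rng.normal(size=(m1 + m2, m1 + m2))
Psi = A @ A.T + (m1 + m2) * np.eye(m1 + m2)
h2 = np.array([0.03, -0.05])
h = np.concatenate([np.zeros(m1), h2])  # h1 = 0 at the restricted estimates

full = lm_full(n, h, Psi)
part = lm_partitioned(n, h2, Psi[:m1, :m1], Psi[:m1, m1:], Psi[m1:, m1:])
print(full, part)  # the two values agree up to rounding error
```

Working with the partitioned form means only a $2 \times 2$ system involving $\bar{h}_{2}$ has to be solved once the Schur complement is formed.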

Variance of moments:

Under fairly general conditions (see Amemiya, 1985, Section 3.4) [18],

\lim_{n \to \infty} E [\bar{Ψ} (θ)] = \operatorname{plim}_{n \to \infty} \bar{Ψ} (θ)

and in the formulas for the asymptotic covariance matrix, one can replace $\bar{Ψ} (θ)$ by its expectation. Note that $Ψ_{n} (θ_{0}) = E [\bar{Ψ} (θ_{0})]$ can be estimated consistently in the usual way by $Ψ_{n} (\tilde{θ})$. To obtain the estimate $Ψ_{n} (\tilde{θ})$, we partition $Ψ_{n} (θ)$ in accordance with $\bar{h} (θ) = {({\bar{h}}_{1} {(θ)}^{'}, {\bar{h}}_{2} {(θ)}^{'})}^{'}$ as

Ψ_{n} (θ) = [\begin{matrix} Ψ_{n, 11} (θ) & Ψ_{n, 12} (θ) \\ Ψ_{n, 12} {(θ)}^{'} & Ψ_{n, 22} (θ) \end{matrix}]

Using

\begin{matrix} h_{1, i} (θ) h_{1, i} {(θ)}^{'} = \\ [\begin{matrix} (d_{i} - p_{i}) \frac{ϕ_{i} z_{i}}{p_{i} (1 - p_{i})} \\ d_{i} w_{i} (τ v_{i} + ε_{i}) \\ d_{i} [{(τ v_{i} + ε_{i})}^{2} - φ_{2, i}] \end{matrix}] [\begin{matrix} \frac{(d_{i} - p_{i}) ϕ_{i} z_{i}}{p_{i} (1 - p_{i})} & d_{i} w_{i}^{'} (τ v_{i} + ε_{i}) & d_{i} [{(τ v_{i} + ε_{i})}^{2} - φ_{2, i}] \end{matrix}] = \\ [\begin{matrix} {(d_{i} - p_{i})}^{2} {(\frac{ϕ_{i}}{p_{i} (1 - p_{i})})}^{2} z_{i} z_{i}^{'} & \frac{d_{i} (d_{i} - p_{i}) ϕ_{i}}{p_{i} (1 - p_{i})} z_{i} w_{i}^{'} (τ v_{i} + ε_{i}) & \frac{(d_{i} - p_{i}) ϕ_{i}}{p_{i} (1 - p_{i})} d_{i} [{(τ v_{i} + ε_{i})}^{2} - φ_{2, i}] z_{i} \\ * & d_{i} w_{i} w_{i}^{'} {(τ v_{i} + ε_{i})}^{2} & d_{i} [{(τ v_{i} + ε_{i})}^{3} - φ_{2, i} (τ v_{i} + ε_{i})] w_{i} \\ * & * & d_{i} {[{(τ v_{i} + ε_{i})}^{2} - φ_{2, i}]}^{2} \end{matrix}] \end{matrix}

one obtains for the off-diagonal elements:

\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} E [(d_{i} - p_{i}) \frac{ϕ_{i}}{p_{i} (1 - p_{i})} d_{i} (τ v_{i} + ε_{i}) z_{i} w_{i}^{'}] & = & 0 \\ \frac{1}{n} \sum_{i = 1}^{n} E [\frac{(d_{i} - p_{i}) ϕ_{i}}{p_{i} (1 - p_{i})} d_{i} [{(τ v_{i} + ε_{i})}^{2} - φ_{2, i}] z_{i}] & = & 0 \\ \frac{1}{n} \sum_{i = 1}^{n} E [d_{i} [{(ε_{i} + τ v_{i})}^{3} - φ_{2, i} (τ v_{i} + ε_{i})] w_{i}] & = & \frac{1}{n} \sum_{i = 1}^{n} p_{i} φ_{3, i} w_{i} \end{matrix}

Some of the explanatory variables summarized in $w_{i}$ may not be observed when $d_{i} = 0$. However, one can use the reasoning in Meijer and Wansbeek (2007) [10] and establish

\begin{matrix} \operatorname{plim}_{n \to \infty} \frac{1}{n} (W_{1}^{'} W_{1}) - \lim_{n \to \infty} \frac{1}{n} W^{'} Π W & = & \operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i = 1}^{n} d_{i} w_{i} w_{i}^{'} - \lim_{n \to \infty} \frac{1}{n} \sum_{i = 1}^{n} p_{i} w_{i} w_{i}^{'} = 0 \\ \operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i = 1}^{n} (d_{i} - p_{i}) φ_{k, i} w_{i} & = & 0, k = 1, \ldots, 8 \\ \operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i = 1}^{n} (d_{i} - p_{i}) φ_{k, i} φ_{l, i} & = & 0, k, l = 1, \ldots, 4 . \end{matrix}

Here, $Π = \operatorname{diag} (p_{1}, \ldots, p_{n})$ and $W_{1}$ is derived from $W$ by skipping all rows with $d_{i} = 0$. Hence, one can use

Ψ_{n, 11} (θ) = \frac{1}{n} [\begin{matrix} Z^{'} V Z & 0 & 0 \\ * & W_{1}^{'} Σ_{1} W_{1} & \sum_{d_{i} = 1} w_{i} φ_{3, i} \\ * & * & \sum_{d_{i} = 1} (φ_{4, i} - φ_{2, i}^{2}) \end{matrix}]

where

V = \operatorname{diag} (\frac{ϕ_{1}^{2}}{p_{1} (1 - p_{1})}, \ldots, \frac{ϕ_{n}^{2}}{p_{n} (1 - p_{n})}), W_{n \times k_{2}} = {(w_{1}, \ldots, w_{n})}^{'},

and

Σ = \operatorname{diag} (φ_{2, 1}, \ldots, φ_{2, n}) .

$Σ_{1}$ is obtained from $Σ$ by deleting all rows and columns referring to $d_{i} = 0$, and similarly for $W_{1}$. Similar arguments yield, at $ξ = κ = 0$,

Ψ_{n, 22} (θ) = \frac{1}{n} \sum_{i = 1}^{n} [\begin{matrix} p_{i} (φ_{6, i} - φ_{3, i}^{2}) & p_{i} (φ_{7, i} - φ_{3, i} φ_{4, i}) \\ p_{i} (φ_{7, i} - φ_{3, i} φ_{4, i}) & p_{i} (φ_{8, i} - φ_{4, i}^{2}) \end{matrix}]

and

Ψ_{n, 12} (θ) = \frac{1}{n} \sum_{i = 1}^{n} [\begin{matrix} 0 & 0 \\ p_{i} w_{i} φ_{4, i} & p_{i} w_{i} φ_{5, i} \\ p_{i} (φ_{5, i} - φ_{2, i} φ_{3, i}) & p_{i} (φ_{6, i} - φ_{4, i} φ_{2, i}) \end{matrix}]
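As an illustration of how the block $Ψ_{n, 22}$ is assembled from the moments $φ_{k, i}$, the following sketch (assuming NumPy; the function name and array layout are hypothetical) evaluates its three distinct entries. The check uses the special case without selection, $p_{i} = 1$ and standard normal moments, where the block reduces to the familiar raw variances 15 and 96 of the third and fourth powers.

```python
import numpy as np

def psi_22(phi, weights):
    """Assemble the 2 x 2 block Psi_{n,22}.

    phi     : (n, 9) array with phi[i, k] = phi_{k,i} for k = 0..8
    weights : (n,) array of p_i (layout and names are assumptions of this sketch)
    """
    phi = np.asarray(phi, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = phi.shape[0]
    v33 = w @ (phi[:, 6] - phi[:, 3] ** 2)          # third-moment condition
    v34 = w @ (phi[:, 7] - phi[:, 3] * phi[:, 4])   # cross term
    v44 = w @ (phi[:, 8] - phi[:, 4] ** 2)          # fourth-moment condition
    return np.array([[v33, v34], [v34, v44]]) / n

# Special case: no selection (p_i = 1) and N(0, 1) moments of order 0..8.
normal_moments = np.array([1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 15.0, 0.0, 105.0])
phi = np.tile(normal_moments, (4, 1))
block = psi_22(phi, np.ones(4))
print(block)
```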

Again, we can insert $d_{i}$ for $p_{i}$. Applying the formula for the partitioned inverse yields the simplification of the pseudo-score LM test statistic:

\begin{matrix} L M & = & n {\bar{h}}^{'} (\tilde{θ}) Ψ_{n} {(\tilde{θ})}^{- 1} \bar{h} (\tilde{θ}) \\ & = & n [0, {\bar{h}}_{2} {(\tilde{θ})}^{'}] {[\begin{matrix} Ψ_{n, 11} (\tilde{θ}) & Ψ_{n, 12} (\tilde{θ}) \\ Ψ_{n, 21} (\tilde{θ}) & Ψ_{n, 22} (\tilde{θ}) \end{matrix}]}^{- 1} [\begin{matrix} 0 \\ {\bar{h}}_{2} (\tilde{θ}) \end{matrix}] \\ & = & n {\bar{h}}_{2} {(\tilde{θ})}^{'} {(Ψ_{n, 22} (\tilde{θ}) - Ψ_{n, 21} (\tilde{θ}) Ψ_{n, 11} {(\tilde{θ})}^{- 1} Ψ_{n, 12} (\tilde{θ}))}^{- 1} {\bar{h}}_{2} (\tilde{θ}) \end{matrix}

which is asymptotically distributed as $χ^{2} (2)$ under $H_{0}$.
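Because the reference distribution has two degrees of freedom, its upper-tail probability has the closed form $\exp (- x / 2)$, so a p-value for a computed statistic needs no statistical library. A minimal sketch with hypothetical function names:

```python
import math

def chi2_2df_pvalue(lm):
    """Upper-tail probability of a chi-square(2) variable: P(X > lm) = exp(-lm / 2)."""
    return math.exp(-0.5 * lm)

def chi2_2df_critical(alpha):
    """Critical value c with P(X > c) = alpha, i.e. c = -2 * log(alpha)."""
    return -2.0 * math.log(alpha)

if __name__ == "__main__":
    lm = 7.2  # illustrative value, not a result from the paper
    print(chi2_2df_pvalue(lm), chi2_2df_critical(0.05))
```

At the 5% level the critical value is about 5.99, so normality would be rejected for any LM statistic above that threshold.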

Conflicts of Interest

The author declares no conflict of interest.

References

1. S.T. Yen, and J. Rosinski. “On the marginal effects of variables in the log-transformed sample selection models.” Econ. Lett. 100 (2008): 4–8.
2. K.E. Staub. “A causal interpretation of extensive and intensive margin effects in generalized Tobit models.” Rev. Econ. Stat. 96 (2014): 371–375.
3. W.K. Newey. “Two-step series estimation of sample selection models.” Econom. J. 12 (2009): 217–229.
4. C.L. Skeels, and F. Vella. “A Monte Carlo investigation of the sampling behavior of conditional moment tests in Tobit and Probit models.” J. Econom. 92 (1999): 275–294.
5. D.M. Drukker. “Bootstrapping a conditional moments test for normality after Tobit estimation.” Stata J. 2 (2002): 125–139.
6. A.K. Bera, C.M. Jarque, and L.-F. Lee. “Testing the normality assumption in limited dependent variable models.” Int. Econ. Rev. 25 (1984): 563–578.
7. L.-F. Lee. “Tests for the bivariate normal distribution in econometric models with selectivity.” Econometrica 52 (1984): 843–863.
8. B. Van der Klaauw, and R.H. Koning. “Testing the normality assumption in the sample selection model with an application to travel demand.” J. Bus. Econ. Stat. 21 (2003): 31–42.
9. G.V. Montes-Rojas. “Robust misspecification tests for the Heckman’s two-step estimator.” Econom. Rev. 30 (2011): 154–172.
10. E. Meijer, and T. Wansbeek. “The sample selection model from a method of moments perspective.” Econom. Rev. 26 (2007): 25–51.
11. C. Jarque, and A. Bera. “Efficient tests for normality, homoskedasticity and serial independence of regression residuals.” Econ. Lett. 6 (1980): 255–259.
12. J.J. Heckman. “Sample selection bias as a specification error.” Econometrica 47 (1979): 153–161.
13. W.K. Newey, and K.D. West. “Hypothesis testing with efficient method of moments estimation.” Int. Econ. Rev. 28 (1987): 777–787.
14. A.R. Hall. Generalized Method of Moments. Oxford, UK: Oxford University Press, 2005.
15. S.C. Ahn, and P. Schmidt. “A separability result for GMM estimation, with applications to GLS prediction and conditional moment tests.” Econom. Rev. 14 (1995): 19–34.
16. R. Davidson, and J.G. MacKinnon. “Graphical methods for investigating the size and power of hypothesis tests.” Manch. Sch. 66 (1998): 1–26.
17. F. Hayashi. Econometrics. Princeton, NJ, USA; Oxford, UK: Princeton University Press, 2000.
18. T. Amemiya. Advanced Econometrics. Cambridge, MA, USA: Harvard University Press, 1985.

^1. An example is the estimation of gravity models of bilateral trade flows with missing and/or zero trade. Here, the assumption of bivariate normality turns out to be important for deriving comparative static results with respect to changes in the external and internal margin of trade following Yen and Rosinski (2008) [1] and Staub (2014) [2].
^2. There is also work available that proposes normality tests for the Tobit model (see Skeels and Vella, 1999 [4] and Drukker, 2002 [5]).
^3. Specifically, Montes-Rojas (2011) [9] mentions the case where $u_{1 i} \sim N (0, 1)$, $u_{2 i} = τ u_{1 i} + ε_{i}$, and $u_{1 i}$ and $ε_{i}$ are independent, but $ε_{i}$ does not follow a normal distribution. In $φ_{3, i}$ and $φ_{4, i}$, the moments $E [ε_{i}^{k}]$ are left unrestricted and estimated from the residuals of the second-stage outcome equation.
^4. The corresponding figures for a larger sample size of n = 2000 are available upon request from the author.
^5. Newey and West (1987) [13] propose to use the unrestricted estimator $\bar{Ψ} (\hat{θ})$, a route that is not followed here.

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
