Sample selection models attempt to correct for non-randomly selected data in a two-level hierarchy. At the first level, a binary selection equation determines whether a particular observation is available at the second level (the outcome equation). If the non-random selection mechanism induced by the selection equation is ignored, the coefficient estimates in the outcome equation may be severely biased. When the selection mechanism leads to many censored observations, few data are available for estimating the outcome equation parameters, which gives rise to computational difficulties. In this context, the main reference is Greene (2008), who extends the results obtained by Manski and Lerman (1977) and develops an estimator that requires knowledge of the true proportion of occurrences in the outcome equation. We develop a method that exploits the advantages of response-based sampling schemes in binary response models with sample selection, while relaxing the assumption that this proportion is known. Estimation is based on a weighted version of Heckman's likelihood, where the weights take the sampling design into account. In a simulation study, we found that, for the outcome equation, the results obtained with our estimator are comparable to Greene's in terms of mean square error. Moreover, in a real data application, our estimator is preferable in terms of the percentage of correct predictions.
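To make the idea of a weighted selection likelihood concrete, the sketch below evaluates a weighted log-likelihood for a probit outcome equation observed only when a probit selection equation is positive, with bivariate normal errors of correlation rho (|rho| < 1 assumed). Each observation contributes its log-likelihood multiplied by a sampling weight. The weights are treated as given inputs; how the paper constructs them from the sampling design is not reproduced here, and the function names (`bvn_cdf`, `weighted_loglik`) are hypothetical. The bivariate normal CDF is computed with Simpson's rule so the example needs only the standard library; this is an illustrative choice, not the estimator's actual implementation.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a: float, b: float, rho: float, n: int = 2000) -> float:
    """P(U <= a, V <= b) for standard bivariate normal (U, V) with correlation rho.

    Uses Phi2(a, b; rho) = int_{-inf}^{a} phi(t) * Phi((b - rho*t) / sqrt(1 - rho^2)) dt,
    truncated at t = -10 and evaluated with composite Simpson's rule (n even).
    Requires |rho| < 1.
    """
    lower = -10.0
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for k in range(n + 1):
        t = lower + k * h
        f = norm_pdf(t) * norm_cdf((b - rho * t) / s)
        weight = 1 if k in (0, n) else (4 if k % 2 == 1 else 2)
        total += weight * f
    return total * h / 3.0


def dot(u, v) -> float:
    return sum(p * q for p, q in zip(u, v))


def weighted_loglik(gamma, beta, rho, Z, X, s, y, w) -> float:
    """Weighted log-likelihood of a probit model with probit sample selection.

    Selection: s_i = 1 if z_i'gamma + u_i > 0.
    Outcome:   y_i = 1 if x_i'beta + e_i > 0, observed only when s_i = 1.
    w_i is a sampling weight supplied by the caller (hypothetical here).
    """
    ll = 0.0
    for zi, xi, si, yi, wi in zip(Z, X, s, y, w):
        zg = dot(zi, gamma)
        if si == 0:
            li = norm_cdf(-zg)              # P(s = 0)
        else:
            xb = dot(xi, beta)
            if yi == 1:
                li = bvn_cdf(zg, xb, rho)   # P(s = 1, y = 1)
            else:
                li = bvn_cdf(zg, -xb, -rho) # P(s = 1, y = 0)
        ll += wi * math.log(li)
    return ll


# Usage with a tiny made-up dataset (intercept plus one regressor in each equation).
Z = [[1.0, 0.2], [1.0, -1.5], [1.0, 0.8]]
X = [[1.0, 0.5], [1.0, 0.0], [1.0, -0.3]]
s = [1, 0, 1]
y = [1, 0, 0]          # y[1] is unused because s[1] = 0
w = [1.0, 1.0, 1.0]
print(weighted_loglik([0.1, 0.9], [0.2, 0.6], 0.3, Z, X, s, y, w))
```

Setting all weights to 1 recovers the ordinary (unweighted) selection likelihood; doubling a weight counts that observation twice, which is how a weighted likelihood can compensate for over- or under-sampled response categories.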
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.