# A Concentrated, Nonlinear Information-Theoretic Estimator for the Sample Selection Model


## Abstract


## 1. Introduction and Basic Model

#### 1.1. The Basic Sample Selection Model

If ${y}_{2h}^{*}>{y}_{1h}^{*}$, then ${y}_{1h}=1$ and we observe the market value, ${y}_{2h}={y}_{2h}^{*}$. Otherwise, ${y}_{1h}=0$ and ${y}_{2h}=0$.

The model is specified in terms of the covariates (**x**): ${\mathbf{x}}_{1h}$ and ${\mathbf{x}}_{2h}$ are ${K}_{1}$- and ${K}_{2}$-dimensional vectors, ${\beta}_{1}$ and ${\beta}_{2}$ are ${K}_{1}$- and ${K}_{2}$-dimensional vectors of unknowns, and “t” stands for “transpose”. This model can be expressed in terms of ${\beta}_{1}$ and ${\beta}_{2}$. Typically the researcher is interested primarily in ${\beta}_{2}$.
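The selection mechanism above can be sketched in a short simulation. This is only an illustration: the covariates, coefficients, and normally distributed errors below are all hypothetical, and the IT estimator developed in this paper does not itself require normality.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K1, K2 = 1000, 3, 3

# Hypothetical design matrices and coefficients (for illustration only)
X1 = rng.normal(size=(N, K1))          # covariates x_1h
X2 = rng.normal(size=(N, K2))          # covariates x_2h
beta1 = np.array([0.5, -0.3, 0.2])
beta2 = np.array([1.0, 0.4, -0.2])

eps1 = rng.normal(size=N)
eps2 = rng.normal(size=N)

y1_star = X1 @ beta1 + eps1            # latent y*_1h
y2_star = X2 @ beta2 + eps2            # latent market value y*_2h

selected = y2_star > y1_star           # selection rule: y*_2h > y*_1h
y1 = selected.astype(int)              # y_1h = 1 if selected, else 0
y2 = np.where(selected, y2_star, 0.0)  # y_2h observed only when selected

print(f"fraction observed: {y1.mean():.2f}")
```

Only the selected subsample carries information about ${\beta}_{2}$, which is the source of the selection bias the estimator must correct.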

## 2. The Information-Theoretic Estimator

We wish to estimate ${\gamma}_{1}$ and ${\gamma}_{2}$ in ${y}_{1}^{*}={A}_{1}{\gamma}_{1}$ and ${y}_{2}^{*}={A}_{2}{\gamma}_{2}$, where the dependent variable is censored and where ${\gamma}_{1}=\left(\begin{array}{c}{\beta}_{1}\\ {\epsilon}_{1}\end{array}\right)$, ${A}_{1}=[{X}_{1}\text{\hspace{0.17em}}I]$, ${\gamma}_{2}=\left(\begin{array}{c}{\beta}_{2}\\ {\epsilon}_{2}\end{array}\right)$ and ${A}_{2}=[{X}_{2}\text{\hspace{0.17em}}I]$. We formulate the censored model (5)–(7) in the following way.
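The reformulation ${y}_{i}^{*}={A}_{i}{\gamma}_{i}$ simply stacks the unknown coefficients and the noise into one vector. A minimal numerical check of this construction (with made-up dimensions and values) is:

```python
import numpy as np

# y* = A gamma with A = [X  I]: the unknowns gamma stack the
# coefficients beta and the noise terms eps into one vector.
N, K = 6, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(N, K))
beta = np.array([0.5, -1.0])
eps = rng.normal(size=N)

A = np.hstack([X, np.eye(N)])        # A = [X  I], shape (N, N + K)
gamma = np.concatenate([beta, eps])  # gamma = (beta, eps)

y_star = A @ gamma
assert np.allclose(y_star, X @ beta + eps)
print(y_star.shape)
```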

The set ${C}_{i,s}$ is an auxiliary closed, convex set used to model the a priori constraints on the β’s. Similarly, the closed convex set ${C}_{i,n}$ is part of the specification of the “physical” nature of the noise and contains all possible realizations of ${\epsilon}_{i}$. We view the coordinates ${\varsigma}_{i}\text{ in }{C}_{i,s}$ and ${\nu}_{i}\text{ in }{C}_{i,n}$ as values of random variables distributed according to some probability measure $d{P}_{i}({\xi}_{i})\equiv d{P}_{i}({\varsigma}_{i},{\nu}_{i})$ such that their expectations (E) equal the unknown ${\beta}_{i}$ and ${\epsilon}_{i}$, respectively.

The measure ${Q}_{s}$ is just a mathematical construct used to transform the estimation problem into a variational problem. The measure ${Q}_{n}$, however, could be viewed as the probability measure describing the statistical nature of the noise. The estimation of the noise involves a tilting of this prior measure.

The estimator minimizes the divergence between the priors ${Q}_{i}$ and the post-data (posteriors) ${P}_{i}$. This is just the continuous version of the Kullback–Leibler information divergence measure, also known as relative entropy (see [11,12,13]). Since the data are naturally divided into observed and unobserved parts, we divide the indices into two subsets: J and its complement ${J}^{c}$ in {1, 2, …, N}. Next, rewrite the data (3)–(4), ${y}_{1}^{*}={A}_{1}{\gamma}_{1}$ and ${y}_{2}^{*}={A}_{2}{\gamma}_{2}$, in partitioned form: the blocks ${B}_{1}$ and ${B}_{2}$ correspond to the rows of the matrices ${A}_{i}$ (i = 1, 2) labeled by the indices for which observations are available. For the indices in J the values ${y}_{2}$ are observed and ${y}_{2h}^{*}>{y}_{1h}^{*}$, whereas for the indices in ${J}^{c}$ all we know is that ${\overline{y}}_{1h}^{*}>{\overline{y}}_{2h}^{*}$.
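The “tilting of the prior measure” and the resulting relative entropy can be illustrated in the discrete case. The support points and the Lagrange multiplier below are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(P||Q) = sum p log(p/q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Uniform prior over 5 support points, exponentially tilted by a
# hypothetical multiplier lam (as produced by the dual problem)
support = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
prior = np.full(5, 0.2)
lam = 0.7
post = prior * np.exp(lam * support)
post /= post.sum()                    # tilted (posterior) probabilities

print(kl_divergence(post, prior))     # > 0: tilting moves mass away from the prior
print(kl_divergence(prior, prior))    # 0.0: divergence vanishes iff P = Q
```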

**Theorem 2.1.**

**Proof:**

## 3. Large Sample Properties

**Proposition 3.1.**

- a) ${\beta}_{iN}^{*}\stackrel{D}{\to}{\beta}_{i}$ as $N\to \infty $, for i = 1, 2.
- b) $\sqrt{N}\left({\beta}_{iN}^{*}-{\beta}_{i}\right)\stackrel{D}{\to}N(0,{V}_{i})$ as $N\to \infty $, for i = 1, 2.

## 4. Analytic Examples

#### 4.1. Normal Priors

#### 4.2. Gamma Priors

Consider the matrix ${A}_{1}$ defined as ${A}_{1}=\left({X}_{1}\text{\hspace{0.17em}}I\right)=\left(\begin{array}{c}{B}_{1}\\ {\overline{B}}_{1}\end{array}\right)=\left(\begin{array}{cc}{D}_{1}&{I}_{1}\\ {\overline{D}}_{1}&{\overline{I}}_{1}\end{array}\right)$. Note that $\left(\begin{array}{c}{D}_{1}\\ {\overline{D}}_{1}\end{array}\right)$ splits ${X}_{1}$ and $\left(\begin{array}{c}{I}_{1}\\ {\overline{I}}_{1}\end{array}\right)$ splits the N × N identity matrix to match the splitting of ${X}_{1}$. The concentrated entropy function follows from this partition.

#### 4.3. Bernoulli Priors

## 5. Empirical Example

| | OLS | 2-Step | AP | IT-GME | IT-Normal | IT-Bernoulli |
|---|---|---|---|---|---|---|
| Constant | 1.073 | 1.771 | NA | 1.038 | 1.049 | 1.068 |
| Education | 0.055 | 0.043 | 0.044 | 0.054 | 0.056 | 0.055 |
| Experience | 0.038 | 0.023 | 0.038 | 0.038 | 0.038 | 0.038 |
| Experience Squared | –0.001 | –0.0005 | –0.001 | –0.001 | –0.001 | –0.001 |
| Rural | 0.214 | 0.268 | 0.332 | 0.210 | 0.215 | 0.214 |
| Central City | –0.170 | –0.091 | –0.171 | –0.186 | –0.166 | –0.169 |
| Enrolled in School | –0.290 | –0.471 | –0.190 | –0.301 | –0.283 | –0.288 |
| λ | | –0.461 | | | | |
| R² | 0.355 | 0.376 | NA | 0.343 | 0.355 | 0.354 |
| MSPE | 0.157 | 0.135 | NA | 0.147 | 0.144 | 0.144 |

The ${R}^{2}$ and Mean Squared Prediction Error (MSPE) for each model are presented as well. All IT estimators outperform the other estimators in terms of predicting selection [17]. The estimated return to education is about 5% across all estimation methods, but it is statistically significantly different from zero only for the OLS and the IT estimators. Although all estimators yield parameter estimates of the same magnitude and sign, only the OLS and the three reported IT estimates are statistically significantly different from zero in most cases.
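For reference, the two fit measures reported in the table can be computed as follows. The vectors below are made-up illustrations, not the wage data used in the paper:

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error over the sample."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = np.array([1.0, 2.0, 3.0, 4.0])       # hypothetical outcomes
yhat = np.array([1.1, 1.9, 3.2, 3.8])    # hypothetical predictions
print(mspe(y, yhat))        # ≈ 0.025
print(r_squared(y, yhat))   # ≈ 0.98
```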

## 6. Conclusion

## Acknowledgement

## References and Notes

1. Heckman, J. Sample selection bias as a specification error. *Econometrica* **1979**, 47, 153–161.
2. Manski, C.F. Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. *J. Econom.* **1985**, 27, 313–333.
3. Cosslett, S.R. Distribution-free maximum likelihood estimator of the binary choice model. *Econometrica* **1983**, 51, 765–782.
4. Han, A.K. Non-parametric analysis of a generalized regression model: The maximum rank correlation estimator. *J. Econom.* **1987**, 35, 303–316.
5. Ahn, H.; Powell, J.L. Semiparametric estimation of censored selection models with a nonparametric selection mechanism. *J. Econom.* **1993**, 58, 3–29.
6. Golan, A.; Moretti, E.; Perloff, J.M. A small sample estimation of the sample selection model. *Econom. Rev.* **2004**, 23, 71–91.
7. Golan, A.; Judge, G.; Miller, D. *Maximum Entropy Econometrics: Robust Estimation with Limited Data*; John Wiley & Sons: New York, NY, USA, 1996.
8. Golan, A.; Judge, G.; Perloff, J.M. Recovering information from censored and ordered multinomial response data. *J. Econom.* **1997**, 79, 23–51.
9. Golan, A. An information theoretic approach for estimating nonlinear dynamic models. *Stud. Nonlinear Dyn. Econom.* **2003**, 7, 2.
10. Maddala, G.S. *Limited-Dependent and Qualitative Variables in Econometrics*; Cambridge University Press: Cambridge, UK, 1983.
11. Kullback, S. *Information Theory and Statistics*; John Wiley & Sons: New York, NY, USA, 1959.
12. Kullback, S.; Leibler, R.A. On information and sufficiency. *Ann. Math. Statist.* **1951**, 22, 79–86.
13. Gokhale, D.V.; Kullback, S. *The Information in Contingency Tables*; Marcel Dekker: New York, NY, USA, 1978.
14. Golan, A.; Gzyl, H. An information theoretic estimator for the linear model. Working paper, **2008**.
15. Due to the small size of the data and the large proportion of censored observations, none of the maximum likelihood methods would converge with all standard software.
16. For a discussion of the data, detailed analyses of the different estimators, and a detailed discussion of the AP [5] application, see GMP [6]. The 2-step and AP estimates are taken from that paper, Table 8.
17. To keep the table simple, and since these specific results are not of interest here, they are not presented.

## Appendix

**Proof of Theorem 2.1.**

The inner inf is taken over the product measures $d{P}_{1}\text{\hspace{0.17em}}d{P}_{2}$: the inf is over $({P}_{1},{P}_{2})$, and the outer sup is over the **η**’s in the region indicated within the $\{\cdot \}$. The basic idea here is to replace the inequalities appearing in problem (10) with equalities, using the fact that ${y}_{2i}>{y}_{1i}$ on the observed sample. With this step, the equivalent dual (unconstrained) model of the primal problem (A.1) is obtained.

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an Open Access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Golan, A.; Gzyl, H.
A Concentrated, Nonlinear Information-Theoretic Estimator for the Sample Selection Model. *Entropy* **2010**, *12*, 1569-1580.
https://doi.org/10.3390/e12061569
