A Semi-Parametric Approach to the Oaxaca–Blinder Decomposition with a Continuous Group Variable and Self-Selection

This paper presents an extension of the Oaxaca–Blinder decomposition to continuous groups using a semiparametric approach known as the varying coefficient model. To account for potential self-selection into the continuum of groups, the use of inverse Mills ratios is extended following the literature on endogenous selection. The flexibility of this methodology may allow detecting heterogeneity when analyzing endogenous dose-treatment effects, as well as correcting for endogeneity when analyzing heterogeneous partial effects across the continuous group variable. For illustration, the methodology is used to revisit the impact of body weight on wages, using body mass index (BMI) as the continuum of groups, finding evidence that body weight has a negative but decreasing impact on wages for both white men and white women.


INTRODUCTION
Since the seminal papers by Blinder (1973) and Oaxaca (1973), many studies have used what is known as the Oaxaca–Blinder (OB) decomposition to analyze differences in outcomes between two well-defined groups. Such differences are characterized as functions of differences in characteristics (composition effect) and differences in the coefficients associated with those characteristics (wage structure effect). Subsequent research extended OB decomposition analysis to nonlinear models and to distributional statistics other than the mean, and proposed strategies to identify the model when some of the underlying assumptions do not hold (see Fortin, Lemieux, and Firpo [2011] for a review of these methodological extensions).
While the OB decomposition can be applied directly to scenarios with naturally discrete groups (e.g., union and nonunion workers, men and women, whites and nonwhites), the application of OB-type decompositions to a continuum of comparison groups is not standard. Ñopo (2008) and Ulrick (2012) have proposed extensions of the standard OB decomposition that allow for a continuous group variable, using ad hoc parametric approximations. Neither strategy, however, deals with the scenario where the conditional independence assumption does not hold, as is the case when individuals self-select into groups based on unobservables (endogenous membership).
The purpose of this paper is to propose a strategy to extend the OB decomposition to a continuous group variable using a semiparametric approach known as varying coefficient models (Hastie and Tibshirani 1993). To account for endogenous selection, I draw on a generalization of the Heckman selection model (Heckman 1979; Lee 1978; Li and Racine 2007; Vella 1998). This strategy can be useful for analyzing heterogeneous dose-treatment effects when endogeneity in the form of self-selection is expected. For example, in the context of labor market outcomes, the methodology can be used to analyze the impact of smoking and smoking intensity on wages (Hotchkiss and Pitts 2013), obesity and body mass index (BMI) on wages (Cawley 2004), or training duration on employment probabilities (Kluve et al. 2011).
The rest of the paper is structured as follows. Section 2 describes the basic OB decomposition analysis in the presence of self-selection/endogenous membership. Section 3 introduces the use of a generalized selection term, here called the generalized inverse Mills ratio (GIMR), when individuals self-select into more than two ordered groups. Section 4 describes the use of varying coefficient models in the implementation of an OB-type decomposition. Section 5 provides an example of the implementation of the methodology by revisiting the wage penalty of obesity based on the research of Cawley (2004). Section 6 concludes.

THE OB DECOMPOSITION WITH SELECTION: BASICS
In the standard OB approach, the goal is to analyze how differences in observed characteristics and in the returns to these characteristics contribute to the average difference in outcomes between two groups. For the appropriate identification of the OB decomposition, the strategy requires that the potential outcomes can be estimated using two well-specified linear models with exogenous membership into each group. This ensures that the distribution of the errors is orthogonal to group membership.
In many instances, however, the assumption of membership exogeneity is likely to be violated if individuals self-select into a specific group (e.g., the treated group).[1] When this happens, the conditional distribution of the errors is no longer independent of group membership, ruling out the identification strategy of the standard decomposition approach.
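As described in Heckman (1979), this endogenous selection can be considered an omitted variable problem that can be corrected by modeling the selection process and using this information to add a correction term to the model specification.[2] This strategy requires the estimation of a three-equation model, described as follows:

$$ y_{Ai} = X_i'\beta_A + \varepsilon_{Ai} \quad \text{if } B_i = 0 \tag{1a} $$
$$ y_{Bi} = X_i'\beta_B + \varepsilon_{Bi} \quad \text{if } B_i = 1 \tag{1b} $$
$$ B_i^* = Z_i'\gamma + v_i, \qquad B_i = \mathbf{1}[B_i^* > 0] \tag{1c} $$

where $B_i^*$ is the latent propensity of individual $i$ to be part of group B, and $Z_i$ is a vector of variables related to individuals' membership that may include variables not included in $X_i$.[3] If we assume that $(\varepsilon_{Ai}, \varepsilon_{Bi}, v_i)$ are jointly normally distributed,

$$ \begin{pmatrix} \varepsilon_{Ai} \\ \varepsilon_{Bi} \\ v_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_A^2 & \sigma_{AB} & \sigma_{Av} \\ \sigma_{AB} & \sigma_B^2 & \sigma_{Bv} \\ \sigma_{Av} & \sigma_{Bv} & 1 \end{pmatrix} \right) \tag{2} $$

the model can be estimated using full information maximum likelihood (FIML) or a two-step procedure (heckit). The latter involves including estimates of the inverse Mills ratio (IMR), also known as the selection correction term, in the main outcome model, based on the information coming from the selection equation. Specifically, for this setup, the IMRs are defined as follows:

$$ \lambda_{Bi} = \frac{\phi(Z_i'\gamma)}{\Phi(Z_i'\gamma)} \tag{3} $$
$$ \lambda_{Ai} = -\frac{\phi(Z_i'\gamma)}{1 - \Phi(Z_i'\gamma)} \tag{4} $$

where $\phi(\cdot)$ stands for the standard normal density function and $\Phi(\cdot)$ for the standard normal cumulative distribution function. The parameters $\gamma$ can be obtained by estimating equation (1c) using a probit model, while consistent estimates of equations (1a) and (1b) can be obtained using ordinary least squares (OLS) by including the corresponding IMR as an explanatory variable:

$$ y_{Ai} = X_i'\beta_A + \theta_A \hat\lambda_{Ai} + e_{Ai} \tag{5a} $$
$$ y_{Bi} = X_i'\beta_B + \theta_B \hat\lambda_{Bi} + e_{Bi} \tag{5b} $$

where $\theta_A = \sigma_{Av}$ and $\theta_B = \sigma_{Bv}$. In this setting, an estimate of the adjusted outcome gap after controlling for selection can be written as follows:

$$ \hat\Delta = \left(\bar y_B - \hat\theta_B \bar{\hat\lambda}_B\right) - \left(\bar y_A - \hat\theta_A \bar{\hat\lambda}_A\right) = \bar X_B'\hat\beta_B - \bar X_A'\hat\beta_A \tag{6} $$

which can be used to implement any of the standard OB decompositions, depending on the assumptions about the counterfactual wage structure.[4] As described in Fortin, Lemieux, and Firpo (2011), outcome differences accounted for by differences in the coefficients (structure effect) can be interpreted as the treatment effect of membership, after adjusting for differences in observed characteristics and endogenous selection.

[1] Fortin, Lemieux, and Firpo (2011) provide other scenarios where the conditional independence assumption might be violated.
[2] This strategy has been used in the framework of the OB decomposition in terms of a switching regression model with unknown selection. See, for example, Lee (1978).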
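The two ingredients of the two-step approach can be sketched in Python, assuming the probit index $Z_i'\hat\gamma$ and the outcome coefficients have already been estimated. The function names (`inverse_mills`, `ob_twofold`) are hypothetical, the normal density and distribution functions are built from `math.erf` to avoid a SciPy dependency, and the twofold split uses group B's coefficients as the counterfactual wage structure, which is only one of the possible reference choices.

```python
import math

import numpy as np


def norm_pdf(x):
    """Standard normal density."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def inverse_mills(index, in_group_b):
    """IMR given the probit index Z'gamma:
    phi/Phi for members of group B, -phi/(1 - Phi) for members of group A."""
    pdf, cdf = norm_pdf(index), norm_cdf(index)
    return np.where(in_group_b, pdf / cdf, -pdf / (1.0 - cdf))


def ob_twofold(xbar_a, xbar_b, beta_a, beta_b):
    """Split the selectivity-corrected gap Xbar_B'beta_B - Xbar_A'beta_A into
    composition and structure effects, using group B's structure as reference.
    The beta vectors exclude the IMR coefficients theta."""
    xbar_a, xbar_b = np.asarray(xbar_a, float), np.asarray(xbar_b, float)
    beta_a, beta_b = np.asarray(beta_a, float), np.asarray(beta_b, float)
    composition = (xbar_b - xbar_a) @ beta_b
    structure = xbar_a @ (beta_b - beta_a)
    return composition, structure


# Hypothetical inputs: the first element of each vector is the constant.
lam = inverse_mills([0.0, 0.0], [True, False])
comp, struct = ob_twofold([1.0, 10.0], [1.0, 12.0], [1.0, 0.05], [1.2, 0.06])
print(lam, comp, struct)
```

At a zero index the two IMRs are about 0.798 and -0.798, and the hypothetical gap of 0.42 splits into 0.12 (composition) and 0.30 (structure).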

GENERALIZED SAMPLE SELECTION
In the model described above, we assume that the only information known about the selection process is that individuals are members of one of two groups (A or B).As discussed in Vella (1998), the grouping variable may contain additional information, such as intensity, that can be used to obtain a better approximation of the selection correction term, even if the interest remains in analyzing differences between two groups.
As before, consider a model where we observe a continuous characteristic ($D_i$) for each individual, which can be used to broadly classify individuals into groups A and B (a dichotomization of the groups). This characteristic could be the number of hours worked per week, the number of cigarettes smoked in a month, or the weeks of training before reentering the labor force, among others. The selection process and outcome equations can be described as follows:

$$ y_{Ai} = X_i'\beta_A + \varepsilon_{Ai} \quad \text{if } i \in A \tag{7a} $$
$$ y_{Bi} = X_i'\beta_B + \varepsilon_{Bi} \quad \text{if } i \in B \tag{7b} $$
$$ D_i = Z_i'\gamma + v_i \tag{7c} $$

Many authors have proposed alternatives for the estimation of these types of selection models, using both parametric and semiparametric strategies (see Li and Racine [2007, sec. 10.3] and Vella [1998]). In general, following the approach proposed by Heckman (1979), these methodologies suggest that to obtain consistent estimators of the parameters ($\beta_A$, $\beta_B$), one should include an approximation of the selection bias term as a control in the main regression model. In this paper I concentrate on three methodologies that assume the overall distribution of D is observed, with extensions to scenarios where D is partially observed. Vella (1998) discusses the estimation of models such as the one described above and suggests that a feasible strategy is to estimate the selection process (equation [7c]) as a tobit model if D has a censored distribution. Without loss of generality, assuming D is censored at zero (i.e., $D_i = \max(0, Z_i'\gamma + v_i)$), the selection correction term, or IMR, is defined as:

$$ \lambda_i^* = \begin{cases} D_i - Z_i'\gamma & \text{if } D_i > 0 \\ -\sigma \dfrac{\phi(Z_i'\gamma/\sigma)}{1 - \Phi(Z_i'\gamma/\sigma)} & \text{if } D_i = 0 \end{cases} \tag{8} $$

where $\sigma$ is the standard deviation of $v_i$. These terms are often called generalized residuals. Note that when D is not censored, equation (7c) can be estimated using standard OLS, and the IMRs are simply the OLS residuals.

[4] For example, assuming counterfactual wages are given by the wage structure observed in group B, the components of the decomposition would be given by $\hat\Delta = (\bar X_B - \bar X_A)'\hat\beta_B + \bar X_A'(\hat\beta_B - \hat\beta_A)$, where $\bar X_A'(\hat\beta_B - \hat\beta_A)$ can be interpreted as a treatment effect under the conditional independence assumption.
Including these residuals in the main models is equivalent to the control function approach to endogeneity (Wooldridge 2015).
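The equivalence between adding first-stage OLS residuals and instrumental variables can be checked numerically. The sketch below simulates hypothetical data with a single uncensored intensity variable D and one instrument z, and compares the coefficient on D from the control function regression with the 2SLS coefficient. With a linear first stage the two coincide exactly, because the first-stage residuals are orthogonal to the fitted values; all names and parameter values are illustrative assumptions.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Hypothetical data: D is endogenous because its shock v also enters y.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                 # instrument (excluded from y)
v = rng.normal(size=n)                 # shock in the intensity equation
d = 0.8 * z + v                        # uncensored intensity D = Z'gamma + v
u = 0.5 * v + rng.normal(size=n)       # outcome error correlated with v
y = 1.0 - 0.3 * d + u                  # true effect of D is -0.3

ones = np.ones(n)
Z = np.column_stack([ones, z])
v_hat = d - Z @ ols(d, Z)              # first-stage OLS residuals

b_naive = ols(y, np.column_stack([ones, d]))          # ignores endogeneity
b_cf = ols(y, np.column_stack([ones, d, v_hat]))      # control function
b_2sls = ols(y, np.column_stack([ones, d - v_hat]))   # 2SLS via fitted D
print(b_naive[1], b_cf[1], b_2sls[1])
```

The naive slope is biased toward zero here, while the control function and 2SLS slopes are identical up to floating-point error and close to -0.3.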
As Vella (1998) and Li and Racine (2007) describe, using this correction term provides estimations that are more stable and efficient than using the standard IMR (which assumes dichotomous grouping).However, similar to the analysis of endogenous variables, an instrumental variable is required to identify the coefficients of the selection correction term and the treatment intensity (D).
An alternative method described in Vella (1998) is one where the selection process corresponds to a setting with discrete but ordered selection rules. If we assume that $\tilde D_i$ is a discretized transformation or classification of $D_i$ (i.e., $\tilde D_i \in \{1, \dots, J\}$), and that $D_{ji}^*$ is the latent propensity of an individual to be part of a group above $j$, then the selection process can be written as:

$$ D_{ji}^* = Z_i'\gamma_j + v_i, \qquad j = 1, \dots, J-1 \tag{9a} $$
$$ \tilde D_i = j \iff D_{j-1,i}^* > 0 \ \text{and} \ D_{ji}^* \le 0, \qquad D_{0i}^* \equiv +\infty, \ D_{Ji}^* \equiv -\infty \tag{9b} $$

Note that equation (9b) is a different way of writing the selection model described in Vella (1998), where all coefficients in $\gamma_j$ are permitted to vary. Also note that all latent propensities are affected by the same shock ($v_i$). Under the parallel lines assumption (Williams 2016), an ordered probit model (O-probit) can be used to estimate this model, where only the constant is allowed to vary across equations.
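Similar to the binary-group case, the outcome equations can be consistently estimated using OLS by including a selection correction term, which for the selection rule described by equations (9a) and (9b) takes the form:

$$ y_{gi} = X_i'\beta_g + \theta_g \hat\lambda_i^* + e_{gi}, \quad g \in \{A, B\}, \qquad \hat\lambda_i^* = \frac{\phi(Z_i'\hat\gamma_{j-1}) - \phi(Z_i'\hat\gamma_j)}{\Phi(Z_i'\hat\gamma_{j-1}) - \Phi(Z_i'\hat\gamma_j)} \ \text{ for } \tilde D_i = j \tag{10} $$

where $\hat\lambda_i^*$ is the GIMR (Vella 1998), with the conventions $\phi(\pm\infty) = 0$, $\Phi(+\infty) = 1$, and $\Phi(-\infty) = 0$. Here the term […] with the estimated latent index, and the estimator will be poorly identified.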
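Under the parallel lines assumption, writing the O-probit cutpoints as $\mu_1 < \dots < \mu_{J-1}$ (notation introduced here), the correction for an observation in category $j$ is the truncated-normal mean $E[v_i \mid \mu_{j-1} < Z_i'\gamma + v_i \le \mu_j]$. A minimal sketch, assuming the index and cutpoints come from an already fitted O-probit; the function name and arguments are hypothetical. With two categories it reproduces the binary IMRs, which is a useful sanity check.

```python
import math

import numpy as np


def norm_pdf(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def ordered_gimr(category, index, cutpoints):
    """E[v | mu_{j-1} < Z'gamma + v <= mu_j] for standard normal v.

    category:  integer group j in 1..J for each observation
    index:     estimated latent index Z'gamma for each observation
    cutpoints: the J - 1 estimated thresholds mu_1 < ... < mu_{J-1}
    """
    mu = np.concatenate(([-np.inf], np.asarray(cutpoints, float), [np.inf]))
    category = np.asarray(category, dtype=int)
    index = np.asarray(index, dtype=float)
    lower = mu[category - 1] - index  # standardized lower bound for v
    upper = mu[category] - index      # standardized upper bound for v
    return (norm_pdf(lower) - norm_pdf(upper)) / (norm_cdf(upper) - norm_cdf(lower))


# Two categories with cutpoint 0 reproduce phi/Phi and -phi/(1 - Phi) at index 0.
binary = ordered_gimr([2, 1], [0.0, 0.0], [0.0])
# A middle band symmetric around the index gives a zero correction.
middle = ordered_gimr([2], [0.0], [-1.0, 1.0])
print(binary, middle)
```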
As described in Chernozhukov, Fernández-Val, and Melly (2013), there are more flexible alternatives for the estimation of the selection model, which allow all parameters in $\gamma$ to vary with D by estimating separate models for each threshold in D. This can be done using independent models (Foresi and Peracchi 1995) or using simultaneous models such as the generalized O-probit model (Terza 1985). Both alternatives, however, impose a great computational burden and may produce unrealistic predicted probabilities as the number of groups (J) increases.[6]

Borrowing from the literature on distributional regressions (Chernozhukov, Fernández-Val, and Melly 2013), the last alternative suggested here is to use global distributional regressions to characterize the cumulative distribution of the selection variable conditional on the covariates, $F(D_i \mid Z_i)$. This can be done using a fractional probit model that takes the form:

$$ F(D_i \mid Z_i) = \Phi(Z_i'\gamma) \tag{11} $$

Empirically, this model can be estimated by substituting $F(D_i \mid Z_i)$ with the sample estimator of the unconditional cumulative distribution, $\hat F(D_i) = n^{-1}\sum_{k=1}^{n} \mathbf{1}(D_k \le D_i)$, or some other approximation of it.[7] In this case, the corresponding GIMR takes the form: […] (12)

Once the corresponding selection correction terms have been estimated, and the average wage gap has been corrected for the selection term, the OB decomposition can be implemented in the standard way, using equation (6). In this framework, the structure effect can be interpreted as the average treatment effect of moving from the untreated to the treated group.
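The fractional probit step can be sketched as a Bernoulli quasi-maximum-likelihood fit with the empirical CDF of D as the dependent variable. The sketch assumes NumPy only, uses Fisher scoring (a choice made here for brevity, not necessarily the estimator used in the paper), and stops at the index $Z_i'\hat\gamma$; the GIMR built from that index is not reproduced. The function names are hypothetical.

```python
import math

import numpy as np


def norm_pdf(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def empirical_cdf(d):
    """F_hat(D_i) = (1/n) * sum_k 1(D_k <= D_i)."""
    d = np.asarray(d, dtype=float)
    return np.searchsorted(np.sort(d), d, side="right") / d.size


def fractional_probit(y, Z, n_iter=100, tol=1e-10):
    """Bernoulli quasi-ML estimate of gamma in E[y | Z] = Phi(Z'gamma),
    for a fractional outcome y in [0, 1], using Fisher scoring."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        index = Z @ gamma
        cdf = np.clip(norm_cdf(index), 1e-12, 1.0 - 1e-12)
        pdf = norm_pdf(index)
        score = Z.T @ (pdf * (y - cdf) / (cdf * (1.0 - cdf)))
        info = Z.T @ (Z * (pdf ** 2 / (cdf * (1.0 - cdf)))[:, None])
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


# Noiseless check: the fractional outcome is exactly Phi(0.2 + 0.8 z).
z = np.linspace(-2.0, 2.0, 201)
Z = np.column_stack([np.ones_like(z), z])
gamma_hat = fractional_probit(norm_cdf(0.2 + 0.8 * z), Z)
print(gamma_hat)
```

In the application, `y` would be `empirical_cdf(bmi)` and `Z` would hold the instruments and controls; ties in D share one value of $\hat F$.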

VARYING COEFFICIENT MODELS AND HETEROGENEITY OF THE TREATMENT EFFECT
The previous section described the construction of sample selection correction terms that use information on the intensity of the treatment/selection variable to obtain the GIMR, which can be used to implement an OB decomposition comparing any two groups. A simple generalization of the OB structure that accounts for a continuum of groups can be written as:

$$ y_i = X_i'\beta(D_i) + \varepsilon_i \tag{13} $$

where $\beta(D_i)$ is a vector of parameters that vary with the grouping variable (D). Including the GIMR term in the model to obtain consistent estimates through OLS yields a model that can be written as:

$$ y_i = X_i'\beta(D_i) + \theta(D_i)\,\hat\lambda_i^* + e_i \tag{14} $$

where $X_i$ is a vector that includes the constant and the explanatory variables, and $\hat\lambda_i^*$ is the estimate of the GIMR for person $i$.[8] In principle, as stated in Ulrick (2012), with enough information it is possible to estimate all the parameters in the above equation for any value of $D$ by estimating separate models on the subsample with that value of $D$. However, in most applications, the number of observations with a specific value of $D$ may be insufficient to provide reliable estimates of the coefficients $\beta(d)$ and $\theta(d)$. Borrowing from the literature on nonparametric econometrics, feasible estimates of these parameters can be obtained using an extension of local regression estimation known as varying coefficient models (Hastie and Tibshirani 1993; Li and Racine 2007).[9] Using this strategy, one imposes no restrictions on the coefficients other than their being smooth and differentiable in $D$.
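This method expands on the use of kernel local smoothing regressions, allowing for a flexible parameterization of the outcome model in equation (14). Writing $W_i = (X_i', \hat\lambda_i^*)'$ and $\delta(D_i) = (\beta(D_i)', \theta(D_i))'$, the coefficients at a point of interest $d$ are obtained by solving

$$ \min_{b_0,\, b_1} \sum_{i=1}^{n} \left[ y_i - W_i'b_0 - W_i'b_1 (D_i - d) \right]^2 K\!\left(\frac{D_i - d}{h}\right) \tag{15} $$

which is equivalent to minimizing the weighted squared errors of the model, with weights given by the kernel function, $K(\cdot)$, and the bandwidth, $h$. As discussed in Hastie and Tibshirani (1993), to reduce problems with boundary bias, the recommendation is to use a local linear approximation, $\delta(D_i) \cong \delta(d) + \delta'(d)(D_i - d)$, so that $\hat\delta(d) = \hat b_0$ and $\hat\delta'(d) = \hat b_1$. The constant component of these coefficients, $\hat\delta(d)$, represents the local effect that a variable (X) has on the outcome (y) in the neighborhood of $d$. This can be used to implement the OB decomposition for the selectivity-corrected outcome between any two particular groups, depending on the assumptions regarding the reference group (Fortin, Lemieux, and Firpo 2011).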
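A minimal NumPy sketch of the local linear estimator at a single point d, assuming a Gaussian kernel (the kernel used in the application) and treating the GIMR as just another column of the regressor matrix. The function name `local_linear_vc` is hypothetical. Stacking $W_i$ and $W_i(D_i - d)$ turns the kernel-weighted minimization into one weighted least squares problem, and the first block of coefficients is $\hat\delta(d)$.

```python
import numpy as np


def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def local_linear_vc(y, W, D, d, h):
    """Local linear varying-coefficient estimate at D = d.

    W: (n, k) regressors (constant, X, and the GIMR as a regular column).
    Returns delta_hat(d), the level part of the local linear approximation
    delta(D_i) ~ delta(d) + delta'(d) (D_i - d).
    """
    sw = np.sqrt(gaussian_kernel((D - d) / h))   # sqrt of kernel weights
    R = np.hstack([W, W * (D - d)[:, None]])      # levels and local slopes
    coef, *_ = np.linalg.lstsq(R * sw[:, None], y * sw, rcond=None)
    return coef[: W.shape[1]]


# Hypothetical noiseless data whose coefficients are linear in D.
rng = np.random.default_rng(1)
n = 400
D = rng.uniform(15.0, 40.0, n)                    # e.g., BMI
x = rng.normal(size=n)
W = np.column_stack([np.ones(n), x])
y = (0.5 + 0.02 * D) + (1.0 + 0.05 * D) * x
delta_25 = local_linear_vc(y, W, D, d=25.0, h=2.0)
print(delta_25)
```

Because the simulated coefficients are exactly linear in D and there is no noise, the fit recovers $\delta(25) = (1.0, 2.25)$; with real data one would loop over a grid of d values.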

Bandwidth Selection and Standard Errors
An important aspect of the estimation of varying coefficients is the choice of the bandwidth $h$. Larger bandwidths help reduce the variance of the estimated parameters but increase the bias. In contrast, smaller bandwidths can reduce the bias at the cost of higher variance.[10] While there are a few suggestions in the literature regarding the choice of bandwidths (see, for example, Hoover et al. [1998]), a leave-one-out crossvalidation procedure using a single smoothing parameter ($h$) for all explanatory variables is used here. This implies choosing $h$ so that it minimizes the following expression:

$$ CV(h) = \frac{1}{n}\sum_{i=1}^{n} \left[ y_i - X_i'\hat\beta_{-i}(D_i; h) - \hat\theta_{-i}(D_i; h)\,\hat\lambda_i^* \right]^2 \omega(D_i) \tag{16} $$

where $\hat\beta_{-i}$ and $\hat\theta_{-i}$ are the leave-one-out estimated coefficients for a given bandwidth ($h$) at the point of interest ($D_i$), and $\omega(\cdot)$ is a weight function that serves to avoid difficulties of slow convergence caused by the sparse distribution of D. Because the bandwidth does not affect the calculation of the GIMR, $\hat\lambda_i^*$ is treated as exogenous in the computation of the crossvalidation criterion.
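The crossvalidation criterion can be sketched as a grid search over candidate bandwidths, assuming a Gaussian kernel and treating a pre-estimated GIMR as a fixed column of the regressor matrix, as the text does. The data, the grid, and the function names are hypothetical; the weight $\omega$ is set to zero outside the 1st to 99th percentiles of D.

```python
import numpy as np


def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def local_linear_vc(y, W, D, d, h):
    """Local linear varying-coefficient estimate delta_hat(d)."""
    sw = np.sqrt(gaussian_kernel((D - d) / h))
    R = np.hstack([W, W * (D - d)[:, None]])
    coef, *_ = np.linalg.lstsq(R * sw[:, None], y * sw, rcond=None)
    return coef[: W.shape[1]]


def cv_score(y, W, D, h, omega):
    """Weighted leave-one-out criterion for a single bandwidth h."""
    n = len(y)
    resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # drop observation i
        delta = local_linear_vc(y[keep], W[keep], D[keep], D[i], h)
        resid[i] = y[i] - W[i] @ delta
    return np.mean(omega * resid ** 2)


# Hypothetical data; the last column of W plays the role of the GIMR.
rng = np.random.default_rng(2)
n = 150
D = rng.uniform(15.0, 40.0, n)
x = rng.normal(size=n)
gimr = rng.normal(size=n)
W = np.column_stack([np.ones(n), x, gimr])
y = np.sin(D / 4.0) + (1.0 + 0.03 * D) * x + 0.2 * gimr + rng.normal(0.0, 0.3, n)

# omega = 0 for the top and bottom 1 percent of D
lo, hi = np.quantile(D, [0.01, 0.99])
omega = ((D >= lo) & (D <= hi)).astype(float)

grid = [0.5, 1.0, 2.0, 4.0, 8.0]
scores = [cv_score(y, W, D, h, omega) for h in grid]
h_star = grid[int(np.argmin(scores))]
print(h_star)
```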
In the present context, the analytical estimation of the standard errors of varying coefficient models with selection can be cumbersome to implement. Under the assumption that the selection term is fixed and exogenous, Li and Racine (2007) provide expressions for the asymptotic distribution of the local linear estimator of varying coefficient models.[11] However, because the model described above is based on a two-step estimation process, the standard errors need additional adjustments (Heckman 1979).
Because of the added complexity, a more feasible method, albeit computationally intensive, is to use paired bootstrap standard errors. The benefits of this strategy have been discussed in Yatchew (2003) and Keele (2008), and, more recently, its application has been formally discussed in Cattaneo and Jansson (2018) in the framework of kernel regressions. The procedure can be described as follows:
Step 1. Draw a random paired bootstrap sample $\{(y_i, X_i, Z_i, D_i)\}$ from the original sample.
Step 2. Estimate the selection correction term $\hat\lambda_i^*$ using any of the methods presented in section 3.
Step 3. Estimate the coefficients of the outcome models at all points of interest d, based on the bootstrap sample, using local kernel regressions.
Step 4. Estimate the decomposition components for the group(s) of interest.
Step 5. Repeat steps 1 through 4 B times to obtain the empirical distributions of the aggregate and detailed decomposition components.
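The five steps can be sketched as a loop that resamples whole clusters (here, individuals observed in several survey years) with replacement. In this sketch, assumed for illustration, Steps 2 to 4 are collapsed into a placeholder `statistic` callable that here computes only a raw mean gap; in practice it would re-estimate the GIMR, the local regressions, and the decomposition on the resampled rows. All names and data are hypothetical.

```python
import numpy as np


def cluster_bootstrap(cluster_ids, statistic, n_boot=999, seed=0):
    """Paired bootstrap that resamples whole clusters with replacement and
    re-evaluates statistic(row_indices) on each resample."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(cluster_ids)
    rows = [np.flatnonzero(cluster_ids == c) for c in clusters]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.integers(0, len(clusters), size=len(clusters))  # Step 1
        idx = np.concatenate([rows[k] for k in picked])
        draws[b] = statistic(idx)                                      # Steps 2-4
    return draws                                                       # Step 5


# Hypothetical panel: 200 people observed 3 times each.
rng = np.random.default_rng(42)
person = np.repeat(np.arange(200), 3)
group = np.repeat(rng.integers(0, 2, 200), 3)        # time-invariant group
log_wage = 2.0 + 0.1 * group + rng.normal(0.0, 0.5, person.size)


def gap_statistic(idx):
    """Placeholder for Steps 2-4: a raw mean gap between the two groups."""
    y, g = log_wage[idx], group[idx]
    return y[g == 1].mean() - y[g == 0].mean()


draws = cluster_bootstrap(person, gap_statistic, n_boot=500)
se = draws.std(ddof=1)
ci_low, ci_high = np.percentile(draws, [2.5, 97.5])
print(se, ci_low, ci_high)
```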
In the next section, an application of this semiparametric strategy is presented, revisiting the main results of Cawley (2004), with BMI used as the continuum of groups for the decomposition analysis.

APPLICATION: REVISITING THE IMPACT OF OBESITY ON WAGES
Several studies have found that body weight is negatively correlated with wages, in particular for white women (Cawley 2004; Sabia and Rees 2012; Averett 2011; Fikkan and Rothblum 2012).
The most common explanations for this negative correlation are that obesity lowers wages by reducing productivity and increasing discrimination; that low wages may cause obesity through unhealthy eating habits associated with lower income; or that unobserved factors simultaneously cause higher body weights and lower wages. In his review of the literature, Cawley (2004) criticizes the robustness of various strategies that have been followed in the literature to analyze the relationship between body weight and wages, and suggests the application of an instrumental variable approach to better capture the causal relationship between BMI and wages.

[11] See Li and Racine (2007, sec. 9.3.2) for further details.
Using data from the National Longitudinal Survey of Youth (NLSY) for the years 1981 to 2000, Cawley (2004) provides estimates of the impact of BMI and weight on wages, using siblings' BMI, sex, and age as instruments for own BMI.[12] Correcting for reporting errors in weight and height, the evidence from his preferred model suggests that the negative effect of higher BMI on wages is statistically significant only for white women, with no statistically significant effect for other groups.
For the illustration of the proposed methodology, BMI will be considered the continuous group variable used to analyze wage gaps in relation to body weight. Due to the higher demands that the methodology imposes on the data, some changes to the data definitions and model specifications are introduced. To compare the results with the instrumental variable approach used in Cawley (2004), I first replicate the original results and present various estimates showing how sensitive the results are to changes in the variables' definitions and model specifications. Second, I briefly describe the specific OB decomposition approach used for the present example, given that no natural comparison group exists. Finally, I provide estimates from the semiparametric decomposition approach under the preferred assumptions.

Replication and Variable Definition Changes
Cawley (2004) estimates instrumental variable models for six different demographic groups based on gender and race, using measures of BMI that are corrected for self-reporting error[13] as the main explanatory variable, and siblings' BMI, age, and sex as instrumental variables.
Using standard errors clustered at the individual level and sampling weights, he reports that BMI has a negative impact on wages for all gender and race groups, but that it is statistically significant only for white women. Replications of these results are provided in table 1, pooling blacks and Hispanics together as nonwhites. According to these results, an increase in BMI of one point would translate into a 1.5 percent reduction in wages for white women, with no statistically significant impact for other groups.
Because of the multiple steps involved in the semiparametric methodology proposed here, the original model specification required some adjustments.[14] First, sampling weights are excluded from the analysis, so that clustered bootstrapped standard errors can be applied directly. Second, in the original replication files, Cawley (2004) […] Reestimating the results using the same specifications as Cawley (2004) and incorporating the changes described above shows that the conclusions are robust to the model and sample specification changes, with small changes in the point estimates (see table 1). From here forward, the replication will focus on the estimates for white men and women only, since the results for nonwhite groups are small and not statistically significant.
[14] See the appendix for a complete set of results and intermediate steps for the data and model specification changes.
Since the methodology proposed here uses various options for the estimation of the GIMR (a control function approach), I test the sensitivity of the results from Cawley (2004) in the restricted sample by reestimating the model with the GIMR included in the specification. This is equivalent to adjusting for endogeneity using a control function approach (Wooldridge 2015).
For the estimation of the O-probit model, two options for the dependent variable are used: one that categorizes BMI into 10 groups of equal size (OP1), and one that categorizes BMI into 10 groups of equal width (OP2). For the fractional probit model (FP), the empirical cumulative distribution of BMI is used as the dependent variable. In addition to using the siblings' data as instruments, second-order interactions are also included as instruments to account for further nonlinear effects. The results with clustered bootstrapped standard errors are shown in table 2. As expected, the results using the OLS residuals are identical to those of the standard instrumental variable approach, with marginal changes when interactions are added as instruments. For the rest of the models, however, the results are somewhat different. Using alternative methods for the estimation of the GIMR shows that the impact of BMI is statistically significant at conventional levels for both white men and women. Furthermore, the impact of BMI is slightly larger for men, while it declines somewhat for women. It will be shown later that BMI does have a negative and significant impact on wages for men, but only over a restricted segment of the BMI distribution. For the rest of the paper, linear and quadratic terms of the instrumental variables will be used to account for nonlinear effects and identification, and the decomposition will be implemented using the OLS GIMR.
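The two O-probit dependent variables differ only in where the category edges fall: quantiles of BMI for equal-size groups (OP1) and evenly spaced values between the minimum and maximum for equal-width groups (OP2). A small sketch with hypothetical function names, coding categories 1 to J as an ordered model expects; with tied values at an edge, the equal-size groups can be only approximately equal.

```python
import numpy as np


def equal_size_groups(d, n_groups=10):
    """Categories 1..n_groups with (roughly) equal counts, cut at quantiles."""
    d = np.asarray(d, dtype=float)
    edges = np.quantile(d, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    return np.searchsorted(edges, d, side="right") + 1


def equal_width_groups(d, n_groups=10):
    """Categories 1..n_groups covering equal ranges between min and max."""
    d = np.asarray(d, dtype=float)
    edges = np.linspace(d.min(), d.max(), n_groups + 1)[1:-1]
    return np.searchsorted(edges, d, side="right") + 1


# With a long right tail, the two rules disagree sharply.
skewed = [0.0, 1.0, 2.0, 3.0, 100.0]
print(equal_size_groups(skewed, 2), equal_width_groups(skewed, 2))
```

On this skewed example the equal-size rule gives categories [1, 1, 2, 2, 2], while the equal-width rule puts only the outlier in the top category, [1, 1, 1, 1, 2].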

Oaxaca Decomposition Approach and Implementation
In order to implement an OB decomposition in the present framework, it is necessary to define an appropriate comparison (baseline or untreated) group to analyze wage gaps across BMI. A common approach is to use individuals with a "healthy" BMI level as the baseline group and compare the results against all other groups (overweight and underweight). Following this premise, people with a BMI considered healthy (between 18.5 and 25) are used as the comparison group.
They represent approximately 48 percent of white men and 62 percent of white women. Using this reference group, the OB decomposition is obtained by estimating the following equations:

$$ y_i = X_i'\beta_H + \pi\,(BMI_i - \overline{BMI}) + \theta_H\,\hat\lambda_i^* + e_i, \quad 18.5 \le BMI_i < 25 \tag{18a} $$
$$ y_i = X_i'\beta(BMI_i) + \theta(BMI_i)\,\hat\lambda_i^* + e_i \tag{18b} $$

The first equation is estimated using a sample of the comparison group only, whereas the second is estimated using kernel local linear regressions, as described in section 4.1. Notice that both equations include the GIMR variable to adjust for sample selection, and that BMI is also included in equation (18a) to control for any impact it may have on wages even within the healthy weight group.[15] This variable is centered at the mean, so the average BMI serves as the reference point for estimating the constant.
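For the implementation of the OB decomposition, a threefold decomposition of the selectivity-corrected wage gap at each point $d$ is used, with the following formulas:

$$ \text{Total gap:} \quad \hat\Delta(d) = \bar X(d)'\hat\beta(d) - \bar X_H'\hat\beta_H $$
$$ \text{Composition effect:} \quad \hat\Delta_X(d) = [\bar X(d) - \bar X_H]'\hat\beta_H $$
$$ \text{Wage structure effect:} \quad \hat\Delta_S(d) = \bar X_H'[\hat\beta(d) - \hat\beta_H] $$
$$ \text{Interaction effect:} \quad \hat\Delta_{XS}(d) = [\bar X(d) - \bar X_H]'[\hat\beta(d) - \hat\beta_H] $$

where $\bar X(d)$ is the local linear predicted mean of the variables $X$ at $BMI = d$, $\bar X_H$ is the vector of average characteristics for people with a healthy BMI, and $\hat\beta_H$ and $\hat\beta(d)$ are the coefficients corresponding to the comparison group and to people with a BMI around $d$.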
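A short sketch of the threefold split at a single BMI point, with a check that the three components add up to the total selectivity-corrected gap. The function name and all numbers are hypothetical; in the application, `xbar_d` would be the local linear predicted mean at d, `beta_d` the varying coefficients from equation (18b), and `beta_h` the comparison-group coefficients, all excluding the GIMR coefficient.

```python
import numpy as np


def threefold(xbar_d, xbar_h, beta_d, beta_h):
    """Threefold OB decomposition at one point d against the healthy group H.
    Vectors include the constant and exclude the GIMR coefficient."""
    xbar_d, xbar_h = np.asarray(xbar_d, float), np.asarray(xbar_h, float)
    beta_d, beta_h = np.asarray(beta_d, float), np.asarray(beta_h, float)
    dx, db = xbar_d - xbar_h, beta_d - beta_h
    return {
        "total": xbar_d @ beta_d - xbar_h @ beta_h,
        "composition": dx @ beta_h,
        "structure": xbar_h @ db,
        "interaction": dx @ db,
    }


# Hypothetical values for (constant, years of schooling).
parts = threefold(xbar_d=[1.0, 13.0], xbar_h=[1.0, 12.0],
                  beta_d=[1.4, 0.085], beta_h=[1.5, 0.08])
print(parts)
```

Here the total gap of 0.045 log points splits into 0.08 (composition), -0.04 (structure), and 0.005 (interaction).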
As described in section 4.1, the bandwidth for the kernel regressions is selected separately for white men and white women using a crossvalidation procedure. The specification in equation (18b) and the OLS GIMR are used as a benchmark for the estimation of the optimal bandwidths, which are used for all models, even if the GIMR is estimated through other methods. To reduce the impact of sparse areas of the BMI distribution on the bandwidth selection, two approaches were taken. The first is to set $\omega(D_i) = 0$ for observations that fall in the top and bottom 1 percent of the distribution. The second is to use a strictly monotonic transformation of BMI, specifically its cumulative distribution G(BMI), as the running variable for the local linear regressions, avoiding the sparse distribution problem.[16] Using this transformation is similar to using a varying bandwidth, since more information will be used in areas where the data are more sparsely distributed, and it can also be compared to the use of k-nearest-neighbor estimators. All models are estimated using Gaussian kernel functions. Table 3 provides the optimal bandwidths obtained from the crossvalidation criterion for both men and women in the sample.
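The effect of running the regressions on G(BMI) can be illustrated with the empirical CDF: a fixed window on the G scale holds roughly the same number of observations everywhere, much like a k-nearest-neighbor rule, whereas a fixed window in BMI units does not. The sketch uses a hypothetical, deterministic right-skewed set of BMI values and a hypothetical function name.

```python
import numpy as np


def ecdf_transform(d):
    """G(d_i) = share of observations with D <= d_i; tied values share one G."""
    d = np.asarray(d, dtype=float)
    return np.searchsorted(np.sort(d), d, side="right") / d.size


# Hypothetical right-skewed BMI values between 15 and 60 (no ties).
bmi = 15.0 * 4.0 ** np.linspace(0.0, 1.0, 1000)
g = ecdf_transform(bmi)

# Observations inside a fixed window on each scale, low vs. high end.
in_g_low = np.sum(np.abs(g - 0.2) <= 0.05)
in_g_high = np.sum(np.abs(g - 0.8) <= 0.05)
in_bmi_low = np.sum(np.abs(bmi - 20.0) <= 2.0)
in_bmi_high = np.sum(np.abs(bmi - 50.0) <= 2.0)
print(in_g_low, in_g_high, in_bmi_low, in_bmi_high)
```

The two G windows contain about 100 observations each, while the BMI window around 20 contains more than twice as many observations as the one around 50.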

Aggregate Decomposition Results
Figure 1 plots the total selectivity-corrected wage gap across the BMI for men and women, comparing people at all points of the BMI distribution with those in the comparison group.The panels on the right provide the estimates that use the original BMI variable for the semiparametric regression, while the panels on the left show the estimates using the transformed variable, G(BMI).The darker and lighter regions show the 90 percent and 95 percent confidence intervals constructed using a clustered bootstrap procedure with 1,000 repetitions.For men and women, the displayed gaps are provided for the relevant range of BMI, which excludes the top and bottom 1 percent of the distribution.
According to the estimates, the selectivity-corrected wage gap for men and women exhibits an inverse-U shape with respect to BMI. For women, I estimate a negative but not statistically significant wage gap at all points of the BMI distribution. Based on the semiparametric estimation that relies on the transformed BMI data, women at the top of the BMI distribution earn, on average, 8 percent less than a woman with an average BMI, which is significant only at the 10 percent level. The results based on kernel regressions with the original distribution of BMI are qualitatively similar but less precise, especially for women with high and low BMI scores.

[16] In principle, this transformation should have no effect on the estimation of the semiparametric model: if $m(d) = E[y \mid D = d]$ and $G$ is a strictly monotone transformation, then $E[y \mid D = d] = E[y \mid G(D) = G(d)]$.
In the case of men, the results suggest that those with a BMI above 23 exhibit a positive and statistically significant wage gap compared to the average. The largest positive gap (17 percent) is observed for men with a BMI around 27, but it declines steadily for men with higher BMI and becomes statistically insignificant for men with a BMI above 31. Men with a BMI below 22 show a negative wage gap as large as 32 percent (based on the original variable distribution).
Similar to the results for women, the estimates for men at the top of the BMI distribution are less precise when using the original BMI for the semiparametric regression. Because the results using the transformed variable are more precise than the alternative, the rest of the paper will focus on these estimates alone.[17]

As in the standard OB analysis, the total wage gap reported in figure 1 is not an adequate measure of the wage gap driven by differences in BMI, because it reflects differences in characteristics (composition effect), in coefficients (wage structure effect), or a combination of both. In figure 2, I provide the semiparametric estimates of these three components for men and women, using the kernel regression based on the transformed data and the OLS GIMR. According to the estimates, the composition (or endowment) effect plays a large and statistically significant role in explaining the wage gaps across BMI. Its magnitude, which is larger for men than for women, increases monotonically with BMI, but at a decreasing rate. Across the distribution of BMI, differences in characteristics explain a wage gap that ranges from -20 percent to 21 percent for men and from -12.6 percent to 13.7 percent for women, when comparing people with a BMI of 18 and 40, respectively, to people with a healthy BMI. This implies that white men and women with a higher BMI have, on average, better endowments, which translate into higher wages.
The most important component of the decomposition is the wage structure effect. This effect can be interpreted as the treatment effect of BMI on wages after controlling for observed characteristics and endogenous selection. The first thing to notice, consistent with Cawley (2004), is that the wage structure effect for women shows a monotonically decreasing trend with respect to BMI across the whole distribution. However, unlike the linear specification in Cawley (2004), the results suggest that the impact of BMI on wages is negative and nonlinear. The estimates show a steady decline in the wage structure component among women with a BMI between 18 and 30, with a wage gap that goes from 6.3 percent for women with a BMI of 18 to -12 percent for women with a BMI of 30.
In comparison, only marginal changes in the wage gaps are observed above and below these thresholds.
For men, the effect of BMI on wages shows a different pattern. The results are less precise, and no statistically significant differences across BMI levels are observed. Setting aside the low precision of the estimates, the wage structure effect for men shows an inverse-U shape with respect to BMI. Compared to men with a BMI of 25, for whom the point estimate of the wage gap is 2.2 percent, the wage premium declines at the lower and higher ends of the BMI distribution. Men at the top of the BMI distribution are estimated to have a wage gap of -15 percent, while men at the bottom face a wage gap of -3 percent. This may explain why the instrumental variable estimates for men (see tables 1 and 2) are negative but not statistically significant.
The last component of the decomposition is the interaction effect, which accounts for the fact that average wages differ because both coefficients and characteristics differ across groups. For both men and women, the interaction effect becomes more negative at higher BMI. In the case of men, the interaction effect is never statistically significant, whereas for women it is statistically significant at conventional levels and accounts for up to -9 percentage points of the total wage gap for women with a high BMI.

Revisiting Cawley (2004): Partial Effect of BMI on Wages
One of the conclusions in Cawley (2004) is that a one-standard-deviation increase in body weight (roughly 32 pounds), or equivalently a 5.5-point increase in BMI, is associated with a 9 percent drop in wages.[18] This is a linear extrapolation of the estimates from Cawley's preferred model, which suggests that a one-point increase in BMI is associated with a wage reduction of 1.7 percent (5.5 × 1.7 = 9.35).
While the results provided above cannot be directly compared with these findings, the delta method can be used to obtain partial effects that are directly comparable with Cawley's results. Figure 3 shows the estimated change in the wage structure effect as a function of BMI (its derivative with respect to BMI) and compares it with the effect based on the instrumental variable approach.¹⁹ As described in table 2, the instrumental variable estimates suggest that BMI has a negative impact on wages: a one-point increase in BMI is associated with 1.5 percent lower wages for women and 1.2 percent lower wages (not statistically significant) for men. The partial effects estimated with the semiparametric OB decomposition (Figure 3) suggest that the effect is negative, nonlinear, and statistically significant for both men and women.
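The delta method approximates the variance of a smooth function g of estimated parameters as ∇g'Σ∇g. The Python sketch below is a generic illustration, not the paper's code: it assumes, hypothetically, that the partial effect at a BMI value d is approximated by a central difference of the wage structure effect estimated at d - h and d + h, with a known covariance matrix for those two estimates. The function names and all numbers are invented.

```python
import math

def numerical_gradient(g, theta, eps=1e-6):
    """Central-difference gradient of a scalar function g at theta."""
    grad = []
    for k in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[k] += eps
        dn[k] -= eps
        grad.append((g(up) - g(dn)) / (2 * eps))
    return grad

def delta_method(g, theta, cov):
    """Return g(theta) and its delta-method standard error sqrt(grad' cov grad)."""
    grad = numerical_gradient(g, theta)
    n = len(theta)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))
    return g(theta), math.sqrt(var)

# Hypothetical wage structure effects at d - h and d + h (h = 0.5 BMI points).
h = 0.5
ws = [0.010, -0.015]
cov = [[1e-4, 5e-5],
       [5e-5, 1e-4]]

def partial_effect(theta):
    """Central-difference slope of the wage structure effect with respect to BMI."""
    return (theta[1] - theta[0]) / (2 * h)

pe, se = delta_method(partial_effect, ws, cov)
print(f"partial effect: {pe:.4f}, s.e.: {se:.4f}")
```

Because this particular g is linear, the delta method is exact here; for nonlinear transformations it is a first-order approximation, and a resampling scheme such as the clustered bootstrap used for table 2 is an alternative.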
The marginal effect of BMI on the wage structure for women with a BMI between 20 and 25 is larger than the linear instrumental variable estimate. The largest estimated partial effect indicates that a one-point increase in BMI for a woman with a starting BMI of 22.5 is associated with a wage decline of 2.5 percent, roughly 65 percent greater than the instrumental variable estimate of 1.5 percent. The negative impact of a higher BMI is not statistically significant for women with a BMI below 20 or above 29, and its magnitude is below 0.5 percent for women with a BMI below 18 or above 30. Men with a BMI below 25 seem to enjoy a small wage gain as BMI increases, although it is not statistically significant. The wage penalty from a higher BMI becomes statistically significant above a BMI of 27, with the largest wage decline, 2.3 percent at a BMI of 29.5, almost twice as large as the instrumental variable estimate. Although the partial effect on wages decreases in magnitude as BMI increases further, it remains statistically significant throughout the rest of the BMI distribution.

CONCLUSIONS
In this paper, I have presented a methodology for implementing the OB decomposition when the grouping variable is continuous and individuals may select endogenously into groups. This methodology uses a semiparametric approach known as varying coefficient models (Hastie and Tibshirani 1993), which has the advantage of allowing a more flexible parameterization of the coefficients. The use of the generalized inverse Mills ratio (GIMR), also known as the generalized residual, provides a feasible strategy to control for endogenous selection based on the continuous grouping variable. This methodology may prove useful for the analysis of endogenous treatment effects with varying treatment intensity, especially when heterogeneous effects are present.
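A varying coefficient model lets each coefficient be a smooth function of the grouping variable. The Python sketch below (using NumPy) estimates β(d0) at a single BMI value d0 by local linear kernel-weighted least squares, one common estimator for such models. The Gaussian kernel, the bandwidth h, the function names, and the simulated data are assumptions for illustration; the paper's exact objective function, including how the GIMR enters, is not reproduced here.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density used as the kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def local_linear_vcm(y, X, d, d0, h):
    """Local linear estimate of beta(d0) and beta'(d0) in y = X beta(d) + e.

    Solves a kernel-weighted least squares problem around d0 in which each
    coefficient is approximated by beta(d0) + beta'(d0) * (d - d0).
    """
    w = gaussian_kernel((d - d0) / h)               # weights centred at d0
    Z = np.hstack([X, X * (d - d0)[:, None]])       # levels and local slopes
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    k = X.shape[1]
    return coef[:k], coef[k:]

# Simulated, noise-free example: the coefficient on x varies linearly with BMI.
rng = np.random.default_rng(0)
n = 500
d = rng.uniform(18.0, 35.0, n)                      # hypothetical BMI values
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                # a GIMR column could be appended here
y = 1.0 + (0.5 - 0.01 * (d - 25.0)) * x

beta_hat, slope_hat = local_linear_vcm(y, X, d, d0=25.0, h=2.0)
print(beta_hat, slope_hat)  # close to [1.0, 0.5] and [0.0, -0.01]
```

Repeating the call over a grid of d0 values traces out the coefficient functions; in a selection-corrected version, the GIMR would enter as an additional column of X.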
In the application, I revisit the results from Cawley (2004) to evaluate the causal effect of BMI on wages. Using BMI as the endogenous, continuous grouping variable, I apply the proposed methodology, using siblings' BMI, age, and sex, and their interactions, as instruments in estimating the selection correction terms (GIMR) that should correct for the endogeneity of body weight and BMI. As in Cawley (2004), the application does not account for possible self-selection into the labor force driven by body weight.
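For an ordered probit first stage, such as the OP1 and OP2 specifications in table 2, the generalized residual of an individual observed in category j has a closed form: with latent index z'γ, standard normal error u, and thresholds c_{j-1} < c_j, it equals E[u | c_{j-1} - z'γ < u ≤ c_j - z'γ] = (φ(c_{j-1} - z'γ) - φ(c_j - z'γ)) / (Φ(c_j - z'γ) - Φ(c_{j-1} - z'γ)). The Python sketch below assumes the index and cut points have already been estimated; the function names are hypothetical.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal CDF via the error function (handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_gimr(index, cuts, j):
    """Generalized residual E[u | c_{j-1} < index + u <= c_j], u ~ N(0, 1).

    cuts holds the interior thresholds c_1 < ... < c_{J-1}; j runs from 0 to J-1.
    """
    lo = -math.inf if j == 0 else cuts[j - 1]
    hi = math.inf if j == len(cuts) else cuts[j]
    a, b = lo - index, hi - index
    pdf_a = 0.0 if math.isinf(a) else norm_pdf(a)
    pdf_b = 0.0 if math.isinf(b) else norm_pdf(b)
    prob = norm_cdf(b) - norm_cdf(a)          # probability of category j
    return (pdf_a - pdf_b) / prob

# Hypothetical example: three categories, index 0.3, cut points -0.5 and 0.8.
print([round(ordered_probit_gimr(0.3, [-0.5, 0.8], j), 4) for j in range(3)])
```

With a single threshold the expression collapses to the familiar inverse Mills ratios of the binary probit, which is why the GIMR can be viewed as their generalization to many ordered groups.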
The semiparametric OB decomposition shows that the association between BMI and wages is nonlinear. For women with a healthy BMI, the negative impact of BMI on wages may be larger than that reported in Cawley's (2004) original paper, but it is much smaller for women at the top and bottom of the BMI distribution. Furthermore, for men, BMI also has a statistically significant negative association with wages, which was not captured previously because of the weak positive effect of BMI on wages for men with a low BMI.

…, following a joint normal distribution as defined before, with some arbitrary threshold defining group membership, and with equation (7c) representing the equation (or equations) that describe the data-selection process. It is easy to see that this model reverts to the standard switching regression model if a dichotomous (indicator) transformation 1(·) is used in equation (7c).
…it can be considered as the expected value of the correction term for all values of … within the group …. However, if more detailed groups are created and larger samples are available, one should expect … used in the selection equation model, the GIMR will be strongly linear …), modeling the conditional mean … as a linear function of explanatory variables and a selection term in the neighborhood of …. This would, in principle, allow us to obtain estimates of the coefficients for every point of interest (d): … and the function representing the conditional mean of any variable Z in the neighborhood of d. This model can be estimated by minimizing the objective function:

Figure 1. Selectivity Corrected Wage Gap over BMI by Gender

Figure 3. Partial Effect of BMI on the Wage Structure Effect

Figure A1. Kernel Densities of BMI across Race and Sex

Figure A4. Aggregate Semiparametric Decomposition with Kernel Regression Using G(BMI)

Sensitivity to GIMR Estimation Method

… kept sample observations with missing data in the model specification. He did so by replacing missing values with zeros and adding dummy variables indicating whether a variable has missing observations. To reduce the number of explanatory variables in the model, observations with missing information on the general intelligence score, highest grade attained, job tenure, and county employment rate are excluded from the sample, so that the corresponding dummy indicators are dropped from the specification as well. Instead of including both father's and mother's highest degree of education, the two variables are combined into a single variable (parents' highest degree of education). Observations with missing data on both parents are also excluded from the sample. Finally, observations with a BMI below 14 or above 60 are excluded. This reduces the total sample from 44,026 to 40,087 observations.

Table 2. Replication with Restricted Data: Control Function Approach
Clustered bootstrapped standard errors at the individual level in parentheses, using 250 repetitions. OP1 uses a categorical variable that divides the sample into 10 groups of equal size; OP2 uses a categorical variable that divides the sample into 10 groups of equal range. * p<0.1, ** p<0.05, *** p<0.01
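The two categorizations described in the notes can be sketched as follows. This Python sketch assumes that OP1 groups are formed from the ranks of BMI (ties broken by sort order) and that OP2 splits the observed BMI range into ten intervals of equal width; the function names and the sample values are hypothetical.

```python
def equal_size_groups(values, k=10):
    """OP1-style: assign each value to one of k groups of (nearly) equal size by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(values)   # 0 .. k-1, ties broken by sort order
    return groups

def equal_range_groups(values, k=10):
    """OP2-style: split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum falls on the upper edge, so cap the index at k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]

bmi = [17.2, 21.5, 22.0, 24.8, 26.3, 27.9, 30.4, 33.1, 38.6, 52.0]
print(equal_size_groups(bmi), equal_range_groups(bmi))
```

With a right-skewed BMI distribution, the equal-range version leaves the upper groups sparsely populated, while the equal-size version gives every group the same number of observations.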

Table 3. Cross-validated Optimal Bandwidths
CV = cross-validation criterion, the log of the mean squared leave-one-out error.
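The cross-validation criterion in Table 3 can be illustrated with a simpler estimator. The Python sketch below swaps in a Nadaraya-Watson (local constant) regression of an outcome on BMI with a Gaussian kernel, an assumption for illustration rather than the paper's varying coefficient fit. It computes the log of the mean squared leave-one-out error for each candidate bandwidth and selects the minimizer; the function names, bandwidth grid, and simulated data are hypothetical.

```python
import numpy as np

def loo_cv_log_mse(y, d, h):
    """Log of the mean squared leave-one-out error of a Nadaraya-Watson fit."""
    u = (d[:, None] - d[None, :]) / h
    w = np.exp(-0.5 * u ** 2)            # Gaussian kernel; constant cancels in ratio
    np.fill_diagonal(w, 0.0)             # drop observation i from its own fit
    fitted = (w @ y) / w.sum(axis=1)
    return float(np.log(np.mean((y - fitted) ** 2)))

def select_bandwidth(y, d, grid):
    """Return the bandwidth in grid with the smallest CV criterion, and all scores."""
    scores = [loo_cv_log_mse(y, d, h) for h in grid]
    return grid[int(np.argmin(scores))], scores

# Simulated example with a nonlinear relationship between the outcome and BMI.
rng = np.random.default_rng(1)
d = rng.uniform(15.0, 45.0, 300)
y = np.sin(d / 3.0) + rng.normal(scale=0.1, size=300)
grid = [0.5, 1.0, 2.0, 50.0]
h_best, scores = select_bandwidth(y, d, grid)
print(h_best, [round(s, 2) for s in scores])
```

Taking logs does not change which bandwidth minimizes the criterion; it only puts the CV values on a scale that is easier to compare across specifications.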