"Decomposing Wage Distributions Using Recentered Influence Function Regressions," unpublished manuscript

This paper provides a detailed exposition of an extension of the Oaxaca-Blinder decomposition method that can be applied to various distributional measures. The two-stage procedure first divides distributional changes into a wage structure effect and a composition effect using a reweighting method. Second, the two components are further divided into the contribution of each explanatory variable using recentered influence function (RIF) regressions. We illustrate the practical aspects of the procedure by analyzing how the polarization of U.S. male wages between the late 1980s and the mid 2010s was affected by factors such as de-unionization, education, occupations, and industry changes.


Introduction
Distributional issues have attracted a lot of attention in labor economics over the last fifteen years. One important factor behind the resurgence of interest in distributional issues is the large increase in wage inequality in the United States and several other countries. There is also a growing literature looking at wage differentials between subgroups that goes beyond simple mean comparisons. For example, several recent papers such as Albrecht, Björklund and Vroman (2003) look at whether the gender gap is larger in the upper tail than in the lower tail of the wage distribution because of a "glass ceiling". More generally, there is increasing interest in the distributional impacts of various programs or interventions. In all these cases, the key question of economic interest is which factors account for changes (or differences) in distributions. For example, did wage inequality increase because education or other wage-setting factors became more unequally distributed, or because the returns to these factors changed over time?
In response to these important questions, a number of decomposition procedures have been suggested to untangle the sources of changes or differences in wage distributions.
Popular methods used in the wage inequality literature include the "plug-in" procedure of Juhn, Murphy, and Pierce (1993), the reweighting procedure of DiNardo, Fortin, and Lemieux (1996), and, more recently, the quantile-based decomposition method of Machado and Mata (2005). 1 Unfortunately, none of these methods can be used to decompose general distributional measures in the same way means can be decomposed using the conventional Oaxaca-Blinder method.
As is well known, the Oaxaca-Blinder procedure provides a way of 1) decomposing changes or differences in mean wages into a wage structure effect and a composition effect, and 2) further dividing these two components into the contribution of each covariate. The main problem with the above-mentioned decomposition methods is that they cannot be used to divide the composition effect into the contribution of each covariate. So while it is natural to ask to what extent changes in the distribution of education have contributed to the growth in wage inequality, this particular question has not been answered in the literature for lack of suitable decomposition methods. Similarly, for lack of available methods, we do not know the contribution of male-female differences in experience to the male-female difference in median wages. In contrast, this question is straightforward to answer in the case of the mean using an Oaxaca-Blinder decomposition.
1 See also Gosling, Machin and Meghir (2000) for a related quantile decomposition method.
In this paper, we propose a two-stage procedure to perform Oaxaca-Blinder type decompositions on any distributional measure, not only the mean. The first stage of our approach consists of decomposing the distributional statistic of interest into a wage structure and a composition component using a reweighting approach, where the weights are estimated either parametrically or non-parametrically. As in the related program evaluation literature, we show that ignorability and common support are key assumptions required to separately identify the wage structure and composition effects.
Provided that these assumptions are satisfied, the underlying wage setting model can be as general as possible. The idea of the first stage is thus very similar to DiNardo, Fortin, and Lemieux (1996). A first contribution here is to clarify the assumptions required for identification, in the case of other distributional statistics besides the mean, by drawing a parallel with the program evaluation (treatment effect) literature. A related contribution is to provide analytical formulas for the standard errors of the reweighting estimates.
In the second stage, we further divide the wage structure and composition effects into the contribution of each covariate, just as in the usual Oaxaca-Blinder decomposition. This is done using a novel regression-based method proposed by Firpo, Fortin, and Lemieux (2006) to estimate the effect of changes in covariates on any distributional statistic, such as the median, the inter-quartile range, or the Gini coefficient. The idea of the method is to replace the dependent variable by the corresponding recentered influence function (RIF) for the distributional statistic of interest. The influence function is a widely used concept in robust statistics and is easy to compute. As a result, the (recentered) influence function regressions proposed by Firpo, Fortin, and Lemieux (2006) are as easy to estimate as ordinary least squares regressions.
We illustrate how our procedure works in practice by looking at changes in the distribution of male wages in the United States between the late 1980s and the mid 2000s.
This period is quite interesting from a distributional point of view, as inequality increased in the top end of the wage distribution but decreased in the low end, a phenomenon that Autor, Katz and Kearney (2006) have called the polarization of the U.S. labor market. We use our method to investigate the sources of change in wages at different points of the wage distribution by decomposing the changes at various wage quantiles. The results indicate that no single factor appears able to explain the polarization of the labor market. De-unionization accounts for some of the decline in wage inequality at the low end and some of the increase in inequality at the top end. The continuing growth in returns to education, especially above the high school level, is the most important source of growth in top-end inequality, but it cannot explain changes at the low end. Changes in the industrial and occupational structure of the workforce, and in the effect of industry and occupation, explain very little of the changes in inequality. This suggests that explanations such as the growth in high-tech sectors or the wage decline in "routine occupations" (Autor, Levy, and Murnane, 2003) have little impact on changes in the wage distribution once education and other factors are controlled for. 2
The remainder of the paper is organized as follows. Section 2 discusses the decomposition problem and reviews the strengths and weaknesses of existing procedures. The identification of the proposed decomposition procedure is addressed in Section 3. Section 4 discusses estimation and inference, and illustrates how the decomposition methodology works in the case of the mean, the median, and the variance. Section 5 provides an empirical application of the methodology to the changes in the distribution of male wages in the United States between the late 1980s and the mid 2000s.

The Decomposition Problem and Shortcomings of Existing Methods
Before presenting our method in detail, it is useful to first review the case of the mean for which the standard Oaxaca-Blinder method is very well known. To simplify the exposition of the paper, we will work with the case where the outcome variable, Y , is the wage, though our approach can be used for any other outcome variable. The Oaxaca-Blinder method can be used to divide a difference in mean wages between two groups, or overall mean wage gap, into a composition effect linked to differences in covariates between the two groups, and a wage structure effect linked to differences in the return to these covariates between the two groups. The two groups are labelled as t = 0, 1. In the original papers by Oaxaca (1974) and Blinder (1973), the two groups used were either men and women, or blacks and whites. More generally, the two groups can be a control and a treatment group, or similar groups of individuals at two points in time, as in the wage inequality literature.
We first review how the Oaxaca-Blinder decomposition provides a straightforward way of dividing up the contribution of each covariate to both composition and wage structure effects. We then move to the case of more general distributional parameters to point out that existing methods do not provide a way of decomposing differences in these distributional parameters into the contribution of each covariate (to the composition and wage structure effect).

The Mean
We focus on differences in the wage distributions of two groups, 1 and 0. For a worker i, let $Y_{1i}$ be the wage that would be paid in group 1, and $Y_{0i}$ the wage that would be paid in group 0. Since a given individual i is only observed in one of the two groups, we observe either $Y_{1i}$ or $Y_{0i}$, but never both. Therefore, for each i we can define the observed wage, $Y_i$, as
$$Y_i = Y_{1i}\,T_i + Y_{0i}\,(1 - T_i),$$
where $T_i = 1$ if individual i is observed in group 1 and $T_i = 0$ if individual i is observed in group 0. There is also a vector of covariates $X \in \mathcal{X} \subset \mathbb{R}^K$ that we can observe in both groups.
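In the standard Oaxaca-Blinder decomposition, the conditional expectation of Y given X is assumed to be linear, so that
$$Y_{ti} = X_i\,\beta_t + \varepsilon_{ti}, \qquad t = 0, 1,$$
where $E[\varepsilon_{ti} \mid X_i, T = t] = 0$. Define the overall mean wage gap as $\Delta^{\mu}_O$, and consider dividing the overall mean gap into a wage structure effect, $\Delta^{\mu}_S$, and a composition effect, $\Delta^{\mu}_X$. Averaging over X, the mean wage gap $\Delta^{\mu}_O$ can be written as
$$\Delta^{\mu}_O = E[Y_1 \mid T=1] - E[Y_0 \mid T=0] = \left(E[X \mid T=1]\,\beta_1 + E[\varepsilon_1 \mid T=1]\right) - \left(E[X \mid T=0]\,\beta_0 + E[\varepsilon_0 \mid T=0]\right).$$
Under the linearity assumption, $E[\varepsilon_t \mid T=t] = 0$ because $E[\varepsilon_t \mid X, T=t] = 0$, and the expression reduces to
$$\begin{aligned} \Delta^{\mu}_O &= E[X \mid T=1]\,\beta_1 - E[X \mid T=0]\,\beta_0 \\ &= E[X \mid T=1]\,(\beta_1 - \beta_0) + \left(E[X \mid T=1] - E[X \mid T=0]\right)\beta_0 \\ &= \Delta^{\mu}_S + \Delta^{\mu}_X. \end{aligned}$$
The first term on the next to last line of the equation is the wage structure effect, $\Delta^{\mu}_S$, while the second term is the composition effect, $\Delta^{\mu}_X$. Note that the reference group used to compute the wage structure effect here is group 1, though the decomposition could also be performed using group 0 as the reference group. The wage structure and composition effects can also be written as sums over the explanatory variables,
$$\Delta^{\mu}_S = \sum_{k=1}^{K} E[X_k \mid T=1]\,(\beta_{1,k} - \beta_{0,k}), \qquad \Delta^{\mu}_X = \sum_{k=1}^{K} \left(E[X_k \mid T=1] - E[X_k \mid T=0]\right)\beta_{0,k},$$
where $X_k$ and $\beta_{t,k}$ represent the k-th element of X and $\beta_t$, respectively. This provides a simple way of dividing $\Delta^{\mu}_S$ and $\Delta^{\mu}_X$ into the contribution of a single covariate or a group of covariates, as needed.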
Because of the linearity assumption, the Oaxaca-Blinder decomposition is very easy to use in practice. It can be estimated by replacing the parameter vectors β t by their OLS estimates, and replacing the expected value of the covariates E [X | T = t] by the sample averages.
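The estimation step just described can be sketched in a few lines of Python with NumPy. This is a minimal illustration on simulated data, not the authors' code: the data-generating process (schooling and a union dummy, with the coefficient values shown) and the helper names `ols` and `oaxaca_blinder` are hypothetical. Group 1 is the reference group for the wage structure effect, as in the equations above.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients; X is assumed to contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def oaxaca_blinder(X1, y1, X0, y0):
    """Two-fold decomposition with group 1 as the reference for the wage structure effect."""
    b1, b0 = ols(X1, y1), ols(X0, y0)
    xbar1, xbar0 = X1.mean(axis=0), X0.mean(axis=0)
    ws_k = xbar1 * (b1 - b0)          # E[X_k|T=1] (beta_1k - beta_0k)
    comp_k = (xbar1 - xbar0) * b0     # (E[X_k|T=1] - E[X_k|T=0]) beta_0k
    gap = y1.mean() - y0.mean()       # overall mean gap
    return gap, ws_k, comp_k

rng = np.random.default_rng(0)

def simulate(n, educ_mean, union_rate, beta):
    """Hypothetical data: constant, years of schooling, union dummy."""
    educ = rng.normal(educ_mean, 2.0, n)
    union = (rng.random(n) < union_rate).astype(float)
    X = np.column_stack([np.ones(n), educ, union])
    return X, X @ beta + rng.normal(0.0, 0.4, n)

X0, y0 = simulate(5000, 12.5, 0.25, np.array([1.0, 0.06, 0.20]))
X1, y1 = simulate(5000, 13.5, 0.12, np.array([0.9, 0.09, 0.15]))

gap, ws_k, comp_k = oaxaca_blinder(X1, y1, X0, y0)
print("overall gap:", gap)
print("wage structure by covariate:", ws_k, "total:", ws_k.sum())
print("composition by covariate:", comp_k, "total:", comp_k.sum())
```

Because each regression includes a constant, the sample mean of the fitted values equals the sample mean wage in each group, so the two components add up exactly to the overall gap; the constant contributes nothing to the composition effect because its mean is 1 in both groups.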
There are nonetheless some important limitations to the standard Oaxaca-Blinder decomposition. A well-known difficulty discussed by Oaxaca and Ransom (1999) and Gardeazabal and Ugidos (2004) is that the contribution of each covariate to the wage structure effect, $E[X_k \mid T=1]\,(\beta_{1,k} - \beta_{0,k})$, is highly sensitive to the choice of the base group. 3 A second limitation discussed by Barsky et al. (2002) is that the Oaxaca-Blinder decomposition provides consistent estimates of the wage structure and composition effects only under the assumption that the conditional expectation is linear. 4 One possible solution to the problem is to estimate the conditional expectation using non-parametric methods. Another solution proposed by Barsky et al. (2002) is to use a (non-parametric) reweighting approach as in DiNardo, Fortin and Lemieux (1996) to perform the decomposition. The advantage of this solution is that it can be applied to more general distributional statistics. The disadvantage of both of these solutions, however, is that they do not, in general, provide a direct way of further dividing the wage structure and composition effects into the contribution of each covariate. 5
3 Consider, for instance, the contribution of increasing returns to education to changes in mean wages over time in the case where workers are either high school graduates or college graduates. In the case where high school is the base group, $X_{i,k}$ is a dummy variable indicating that the worker is a college graduate, and $\beta_{0,k}$ and $\beta_{1,k}$ are the effects of college on wages in years t = 0 and 1. If returns to college increase over time ($\beta_{1,k} - \beta_{0,k} > 0$), then the contribution of education to the wage structure effect, $\bar{X}_{1,k}(\beta_{1,k} - \beta_{0,k})$, is positive, where $\bar{X}_{1,k}$ is the share of college graduates. If we instead use college as the base group, then $\tilde{X}_{1,k}(\tilde{\beta}_{1,k} - \tilde{\beta}_{0,k})$ is negative, where $\tilde{X}_{1,k}$ represents the share of high school graduates ($\tilde{X}_{1,k} = 1 - \bar{X}_{1,k}$) and $\tilde{\beta}_{t,k}$ represents the effect of high school ($\tilde{\beta}_{t,k} = -\beta_{t,k}$). So whether changes in returns to schooling contribute positively or negatively to the change in mean wages critically depends on the choice of the base group.
4 As we will see later, the problem is that we are trying to estimate a counterfactual mean wage that would prevail if workers in group 1 were paid under the wage structure of group 0. Under the linearity assumption, this is equal to $E[X \mid T=1]\,\beta_0$, a term that appears in both the wage structure and composition effects. The problem is that when linearity does not hold, the counterfactual mean wage is not, in general, equal to $E[X \mid T=1]\,\beta_0$.
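The base-group problem described in footnote 3 is easy to reproduce numerically. In this hypothetical sketch (simulated data; the shares and college premiums are arbitrary assumptions, and the helper name is ours), the same education variable is coded once as a college dummy and once as a high-school dummy. The education term of the wage structure effect changes sign across the two codings, while the total wage structure effect is identical, because the intercept absorbs the difference.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, college_share, college_premium):
    """Hypothetical wages for workers who are either college or high school graduates."""
    college = (rng.random(n) < college_share).astype(float)
    y = 2.0 + college_premium * college + rng.normal(0.0, 0.3, n)
    return college, y

c0, y0 = simulate(4000, 0.25, 0.30)   # group 0: lower return to college
c1, y1 = simulate(4000, 0.35, 0.50)   # group 1: higher return to college

def wage_structure_terms(d1, y1, d0, y0):
    """Per-covariate terms E[X_k|T=1](b_1k - b_0k) for X = [1, d]."""
    X1 = np.column_stack([np.ones_like(d1), d1])
    X0 = np.column_stack([np.ones_like(d0), d0])
    b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
    b0, *_ = np.linalg.lstsq(X0, y0, rcond=None)
    return X1.mean(axis=0) * (b1 - b0)

ws_college_dummy = wage_structure_terms(c1, y1, c0, y0)        # base group: high school
ws_hs_dummy = wage_structure_terms(1 - c1, y1, 1 - c0, y0)     # base group: college

print("education term, college dummy:", ws_college_dummy[1])
print("education term, high-school dummy:", ws_hs_dummy[1])
print("totals:", ws_college_dummy.sum(), ws_hs_dummy.sum())
```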

Other Distributional Statistics
The reweighting procedure proposed by DiNardo, Fortin and Lemieux (1996) provides consistent estimates of the wage structure and composition effects for any distributional statistic of interest under a set of assumptions discussed later in the paper. What this type of procedure does not provide, however, is a general way of further dividing up the contribution of each single covariate to the wage structure and composition effects. One exception is the case of dummy covariates, for which a conditional reweighting procedure can be used. 6 One problem is that this approach is not easily extended to covariates other than dummy variables. Furthermore, with many dummy covariates, one would have to compute a large number of conditional reweighting factors to account for the contribution of each covariate.
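For a dummy covariate such as union status, the conditional reweighting idea mentioned above can be sketched as follows. This is a minimal illustration, assuming a single discrete covariate (three hypothetical education cells) so that $\Pr(U=1 \mid X, t)$ can be estimated by cell frequencies; in practice these probabilities would typically come from a binary-choice model such as a probit or logit. Period-1 workers are reweighted so that the union rate within each cell equals its period-0 value, and the union contribution to the composition effect on the variance is taken as the actual statistic minus its reweighted counterpart. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n, union_by_educ):
    """Hypothetical wages with 3 education cells and a union dummy."""
    educ = rng.integers(0, 3, n)
    union = (rng.random(n) < union_by_educ[educ]).astype(int)
    y = 2.0 + 0.3 * educ + 0.15 * union - 0.1 * union * educ + rng.normal(0.0, 0.4, n)
    return educ, union, y

e0, u0, y0 = simulate(6000, np.array([0.35, 0.30, 0.20]))   # period 0
e1, u1, y1 = simulate(6000, np.array([0.15, 0.12, 0.10]))   # period 1

def union_reweighting_factor(e1, u1, e0, u0, n_cells=3):
    """Weights making period-1 union rates, within each cell, equal to period-0 rates."""
    p0 = np.array([u0[e0 == c].mean() for c in range(n_cells)])   # Pr(U=1 | X, t=0)
    p1 = np.array([u1[e1 == c].mean() for c in range(n_cells)])   # Pr(U=1 | X, t=1)
    psi = np.where(u1 == 1, p0[e1] / p1[e1], (1 - p0[e1]) / (1 - p1[e1]))
    return psi, p0

def weighted_var(y, w):
    m = np.average(y, weights=w)
    return np.average((y - m) ** 2, weights=w)

psi, p0 = union_reweighting_factor(e1, u1, e0, u0)
actual_var = y1.var()
counterfactual_var = weighted_var(y1, psi)       # unionization held at period-0 rates
union_composition_contribution = actual_var - counterfactual_var
print("union contribution to the change in variance:", union_composition_contribution)
```

With many dummies, one such factor would be needed per covariate, which is the practical limitation noted in the text.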
In the case of quantiles, Machado and Mata (2005) propose a decomposition procedure based on (conditional) quantile regression methods. 7 They consider the following regression model for the τth quantile of Y conditional on the covariates X (τ goes from 0 to 1):
$$Q_{\tau}(Y_t \mid X, T=t) = X\,\beta_t(\tau), \qquad t = 0, 1.$$
In principle, running such quantile regressions for all possible quantiles should describe the whole conditional distribution of wages. One can then use the β(τ) estimated for one group to construct a counterfactual distribution for the other group, and then use this counterfactual distribution to compute the overall composition and wage structure effects. 8 Furthermore, if one plugs in the β(τ) pertaining to a single covariate only, it is then possible to estimate the contribution of this covariate to the wage structure effect, as in the Oaxaca-Blinder decomposition.
5 We discuss the case of reweighting in more detail below. In the case where the conditional expectation $E(Y_i \mid X_i, T = t)$ is estimated non-parametrically, a whole different procedure would have to be used to separate the wage structure effect into the contribution of each covariate. For instance, average derivative methods could be used to estimate an effect akin to the β coefficients used in standard decompositions. Unfortunately, these methods are difficult to use in practice, and would not be helpful in dividing up the composition effect into the contribution of each individual covariate.
6 Both DiNardo and Lemieux (1997) and DiNardo, Fortin, and Lemieux (1996) discuss the case where the dummy covariate is union status. DiNardo and Lemieux (1997) compute a "total" effect of unions by contrasting the actual distribution of wages to the distribution among non-union workers reweighted to have the same characteristics as the whole sample of workers. DiNardo, Fortin, and Lemieux (1996) compute the contribution of unions to the composition effect by contrasting actual changes in the wage distribution to changes that would have prevailed if the rate of unionization, conditional on other characteristics, had remained constant over time. The contribution of unions to the wage structure effect can then be obtained by taking the difference between the total contribution of unions and the contribution of unions to the composition effect only.
7 See also Albrecht, van Vuuren and Vroman (2004) for more details on the Machado-Mata procedure.
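The simulation behind the Machado-Mata procedure can be illustrated in a stripped-down case. This sketch assumes a single binary covariate, so that linear quantile regression with a constant is saturated and its fitted values can be taken to be the within-cell sample quantiles; with continuous covariates one would need a genuine quantile regression routine, for example `QuantReg` in statsmodels. The data are simulated and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: one binary covariate (e.g. a college dummy) in each group.
n = 5000
x0 = (rng.random(n) < 0.3).astype(int)
y0 = 1.0 + 0.5 * x0 + rng.normal(0.0, 0.3, n)
x1 = (rng.random(n) < 0.6).astype(int)
y1 = 1.1 + 0.7 * x1 + rng.normal(0.0, 0.4, n)

def machado_mata_counterfactual(n_draws):
    """Simulate wages under group 0's conditional quantiles and group 1's covariates.

    With a constant and a single dummy, x * beta_0(tau) can be taken to be the
    tau-quantile of group-0 wages in the cell X = x (saturated quantile regression).
    """
    tau = rng.uniform(0.0, 1.0, n_draws)          # random quantile indices
    x_draw = rng.choice(x1, size=n_draws)         # covariates drawn from group 1
    cf = np.empty(n_draws)
    for cell in (0, 1):
        mask = x_draw == cell
        cf[mask] = np.quantile(y0[x0 == cell], tau[mask])
    return cf

cf = machado_mata_counterfactual(20000)
composition = np.median(cf) - np.median(y0)       # nu_C - nu_0 for the median
wage_structure = np.median(y1) - np.median(cf)    # nu_1 - nu_C
print("composition:", composition, "wage structure:", wage_structure)
```

Even in this simplest case the procedure needs simulation draws; with many covariates, each draw requires a separately estimated quantile regression, which is the computational burden mentioned in the text.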
There are, however, a number of drawbacks to this procedure. First and foremost, it does not provide a way of dividing up the composition effect into the contribution of each single covariate. 9 Second, it is computationally demanding, as it involves estimating a large number of quantile regressions and conducting large-scale simulations. Third, as in the case of the mean, the decomposition is only consistent if the right functional form is used for the conditional quantiles. Since the right functional form has to be chosen for each and every quantile, making sure that the specification is correct is a very difficult empirical exercise. 10 Other methods have been suggested by Juhn, Murphy, and Pierce (1993), Fortin and Lemieux (1998) and Donald, Green, and Paarsch (2000) to decompose changes in distributional statistics beyond the mean. These procedures have various strengths and weaknesses. Most importantly, they all share the same shortcoming as Machado and Mata (2005) in that they do not provide a way of dividing up the composition effect into the contribution of each individual covariate.
In summary, currently available methods can be used to compute the overall wage structure and composition effects for various distributional statistics. We build on this in the current paper by proposing to estimate these two overall effects using a reweighting procedure. Available methods are much more limited, however, when it comes to further dividing the wage structure and, especially, the composition effect into the contribution of each covariate. The main contribution of the paper is to suggest a simple regression-based procedure to remedy this shortcoming, building on recent work by Firpo, Fortin, and Lemieux (2006).
8 Both Machado and Mata (2005) and Albrecht et al. (2003) use conditional quantile regressions to construct counterfactual unconditional wage distributions. Machado and Mata (2005) draw n numbers at random to choose the quantiles, estimate the conditional quantile coefficients from the first group, then for each quantile draw a random sample from the covariates of the alternate group and generate the counterfactual wages. Albrecht et al. (2003) modify this procedure by choosing quantiles 1 through 99, and by taking 100 draws for each quantile.
9 Machado and Mata (2005) suggest using an unconditional reweighting procedure to compute the contribution of a covariate to the composition effect. For example, in the case of unions, they would suggest reweighting all union (or non-union) observations by a fixed factor. Unfortunately, doing so also changes the distribution of other covariates that are differently distributed in the union and non-union sectors. As a result, the proposed procedure does not provide an estimate of the contribution of unions holding the distribution of other covariates fixed. Doing so would require using a conditional reweighting factor, as in DiNardo, Fortin and Lemieux (1996). As discussed above, it is not clear how this can be implemented when covariates are not dummy variables. Furthermore, if one uses these types of reweighting procedures anyway, then everything that can be computed using the Machado and Mata procedure can also be computed using a (simpler) reweighting procedure.
10 Furthermore, if the correct functional form is not linear, it is then difficult to compute the contribution of each covariate to the wage structure effect, since there is no longer a single β(τ) coefficient associated with a given covariate, as in the linear case.

Wage Structure and Composition Effects
Following the treatment effect literature (Rosenbaum and Rubin, 1983; Heckman, 1990; Heckman and Robb, 1985, 1986), we focus on differences in the wage distributions between two groups, 1 and 0, or the "group effect". Suppose we observe a random sample of $N = N_1 + N_0$ individuals, where $N_1$ and $N_0$ are the numbers of individuals in each group, and we index individuals by $i = 1, \ldots, N$. We define the probability that an individual i is in group 1 as p, whereas the conditional probability that an individual i is in group 1 given X = x is $p(x) = \Pr[T = 1 \mid X = x]$, sometimes simply called the "propensity score".
Wage determination depends on some observed components $X_i$ and on some unobserved components $\varepsilon_i \in \mathbb{R}^m$ through the wage structure functions
$$Y_{1i} = g_1(X_i, \varepsilon_i), \qquad Y_{0i} = g_0(X_i, \varepsilon_i),$$
where the $g_t(\cdot,\cdot)$ are unknown real-valued mappings, $g_t : \mathcal{X} \times \mathbb{R}^m \to \mathbb{R}_+ \cup \{0\}$. As we are not imposing any distributional assumption or specific functional form, writing $Y_1$ and $Y_0$ in this way does not restrict the analysis in any way. We only assume that $(T, X, \varepsilon)$, or equivalently $(Y, T, X)$, have an unknown joint distribution, which is far from restrictive. From observed data on $(Y, T, X)$, we can non-parametrically identify the distributions
$$Y_1 \mid T=1 \overset{d}{\sim} F_1, \qquad Y_0 \mid T=0 \overset{d}{\sim} F_0.$$
Without further assumptions, however, we cannot identify the counterfactual distribution $Y_0 \mid T=1 \overset{d}{\sim} F_C$. The counterfactual distribution $F_C$ is the one that would have prevailed under the wage structure of group 0, but with the distribution of observed and unobserved characteristics of group 1. For the sake of completeness, we also consider the conditional distributions
$$Y_1 \mid X, T=1 \overset{d}{\sim} F_{1|X}, \qquad Y_0 \mid X, T=0 \overset{d}{\sim} F_{0|X}, \qquad Y_0 \mid X, T=1 \overset{d}{\sim} F_{C|X}.$$
We typically analyze the difference in wage distributions between groups 1 and 0 by looking at some functionals of these distributions. Let ν be a functional of the conditional joint distribution of $(Y_1, Y_0) \mid T$, that is, $\nu : \mathcal{F}_\nu \to \mathbb{R}$, where $\mathcal{F}_\nu$ is a class of distribution functions such that $F \in \mathcal{F}_\nu$ if $\nu(F) < +\infty$. The difference in the ν's between the two groups is called here the ν-overall wage gap, which is simply the difference in wages measured in terms of the distributional statistic ν: 11
$$\Delta^{\nu}_O = \nu(F_1) - \nu(F_0) \equiv \nu_1 - \nu_0. \tag{3}$$
We can use the fact that X is potentially unevenly distributed across groups to decompose equation (3) into two parts:
$$\Delta^{\nu}_O = (\nu_1 - \nu_C) + (\nu_C - \nu_0) = \Delta^{\nu}_S + \Delta^{\nu}_X,$$
where the first term, $\Delta^{\nu}_S$, reflects the effect of differences in the "wage structure", which is summarized by the $g_t(\cdot,\cdot)$ functions. Therefore, this first term corresponds to the effect on ν of a change from $g_0(\cdot,\cdot)$ to $g_1(\cdot,\cdot)$, keeping the distribution of $(X, \varepsilon) \mid T=1$ constant.
With no other restrictions, the second term, $\Delta^{\nu}_X$, will correspond to the effect of changes in the distribution of $(X, \varepsilon)$, keeping the "wage structure" $g_0(\cdot,\cdot)$ constant, that is, the effect of changing the distribution from that of $(X, \varepsilon) \mid T=0$ to that of $(X, \varepsilon) \mid T=1$. This is called the composition effect.
If we impose no assumption on the functional form of the $g_t(\cdot,\cdot)$ functions, then the first term of the sum, $\Delta^{\nu}_S$, will reflect changes in the $g_t(\cdot,\cdot)$ functions only if we are able to fix the distribution of observables and unobservables at the distribution prevailing for group 1, that is, the distribution of $(X, \varepsilon) \mid T=1$. As long as $F_C$ is identifiable, we will be able to construct $\nu_C$.
The key point, however, is that the second term of the sum, $\Delta^{\nu}_X$, will not necessarily reflect only changes in the distribution of X. By definition, it reflects changes in the joint distribution of $(X, \varepsilon)$. For $\Delta^{\nu}_X$ to reflect only changes in the distribution of X, ε must be independent of T given X. We will see that this conditional independence assumption is also crucial for the identification of $F_C$ and, therefore, of $\nu_C$.
Note that had we imposed assumptions on (i) the form of $g_1(\cdot,\cdot)$ and $g_0(\cdot,\cdot)$, and (ii) the conditional expectation of ε given X and T, we could have relaxed the conditional independence assumption. This is what we did in section 2.1 when considering the mean.
11 We will sometimes refer to the functional $\nu(F_Z)$ simply as $\nu_Z$. In the Oaxaca-Blinder decomposition discussed earlier, the parameter ν equals the mean (ν = µ) and $\Delta^{\nu}_O$ is the total difference in mean wages.
Our contribution here is to establish conditions for the identification of $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$, where the latter only reflects changes in the distribution of X, (i) for a general ν, (ii) with no functional form assumptions on $g_1(\cdot,\cdot)$ and $g_0(\cdot,\cdot)$, and (iii) with no parametric assumption on the joint distribution of $(Y, T, X)$.
Under the common assumptions of Ignorability and Overlapping Support, we can identify the parameters of interest and be sure that the decomposition terms have the desired interpretation. The ignorability assumption has become popular in empirical research following a series of papers by Rubin and coauthors and by Heckman and coauthors. 12 In the program evaluation literature, this assumption is sometimes called unconfoundedness, and it allows identification of the treatment effect on the treated sub-population.
Assumption 1 [Ignorability]: Let $(T, X, \varepsilon)$ have a joint distribution. For all $x \in \mathcal{X}$, ε is independent of T given X = x.
The Ignorability assumption should be assessed on a case-by-case basis, as it is more plausible in some settings than in others. In our case, it states that the distribution of the unobserved explanatory factors in wage determination is the same across groups 1 and 0, once we condition on a vector of observed components. 13 Now consider the following assumption about the support of the covariate distribution:
Assumption 2 [Overlapping Support]: For all $x \in \mathcal{X}$, $p(x) = \Pr[T = 1 \mid X = x] < 1$. Furthermore, $\Pr[T = 1] > 0$.
The Overlapping Support assumption requires that there be an overlap in observable characteristics across groups, in the sense that there is no value of x in $\mathcal{X}$ that is only observed among individuals in group 1. 14 Under these two assumptions, we are able to identify the counterfactual distribution $Y_0 \mid T=1 \overset{d}{\sim} F_C$. To see how the identification result works, let us first define three relevant weighting functions:
$$\omega_1(T) \equiv \frac{T}{p}, \qquad \omega_0(T) \equiv \frac{1-T}{1-p}, \qquad \omega_C(T, X) \equiv \left(\frac{p(X)}{1-p(X)}\right)\left(\frac{1-T}{p}\right).$$
The first two reweighting functions transform features of the marginal distribution of Y into features of the conditional distribution of $Y_1$ given T = 1, and of $Y_0$ given T = 0.
The third reweighting function transforms features of the marginal distribution of Y into features of the counterfactual distribution of $Y_0$ given T = 1. We are now able to state our first identification result:

Theorem 1 [Inverse Probability Weighting]:
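Under Assumptions 1 and 2:
(i) $F_1(y) = E[\omega_1(T) \cdot \mathbf{1}\{Y \le y\}]$;
(ii) $F_0(y) = E[\omega_0(T) \cdot \mathbf{1}\{Y \le y\}]$;
(iii) $F_C(y) = E[\omega_C(T, X) \cdot \mathbf{1}\{Y \le y\}]$.
Identification of $F_C$ implies identification of $\nu(F_C)$ and therefore of $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$. Furthermore, because of the ignorability assumption, we know that differences between the conditional distributions of $(X, \varepsilon) \mid T=1$ and of $(X, \varepsilon) \mid T=0$ correspond only to differences in the conditional distributions $F_{X|T=1}$ and $F_{X|T=0}$. Thus, $\Delta^{\nu}_X$ will only reflect changes in the distribution of X. We state these results more precisely in the following theorem.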

Theorem 2 [Identification of Wage Structure and Composition Effects]:
Under Assumptions 1 and 2:
(i) $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$ are identifiable from data on $(Y, T, X)$: $\Delta^{\nu}_S = \nu(F_1) - \nu(F_C)$ and $\Delta^{\nu}_X = \nu(F_C) - \nu(F_0)$, where $F_1$, $F_0$, and $F_C$ are given by Theorem 1;
(ii) if $g_1(\cdot,\cdot) = g_0(\cdot,\cdot)$, then $\Delta^{\nu}_S = 0$;
(iii) if $F_{X|T=1} = F_{X|T=0}$, then $\Delta^{\nu}_X = 0$.
In Theorem 2, the identification of $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$ follows from the fact that these quantities can be expressed as functionals of the distributions obtained by weighting the observations by the inverse probabilities of belonging to group 0 or 1 (conditional on X in the case of $\omega_C$), as stated in Theorem 1. Note that non-parametric identification of neither the wage determination functions $g_1(\cdot,\cdot)$ and $g_0(\cdot,\cdot)$ nor the distribution function of ε is necessary for the effects $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$ to be identified. Therefore, methods based on conditional mean restrictions (the Oaxaca-Blinder decomposition approach) and methods based on conditional quantile restrictions (the Machado-Mata approach) rely on stronger identification conditions than needed if we are simply interested in the terms $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$. Part (ii) of Theorem 2 also states that when there are no group differences in the wage determination functions, we should find no wage structure effect, while part (iii) states that if there are no group differences in the distribution of the covariates, there will be no composition effect.
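Theorems 1 and 2 translate directly into a reweighting estimator. The sketch below is a minimal illustration, not the authors' implementation: it assumes a logit model for the propensity score $p(x)$ (a parametric choice that the theorems themselves do not require), uses the median as ν, and applies the weights $\omega_1(T) = T/p$, $\omega_0(T) = (1-T)/(1-p)$ and $\omega_C(T, X) = (p(X)/(1-p(X)))(1-T)/p$ to the pooled sample. Data and helper names are hypothetical; in this simulated design ignorability holds by construction, and the true counterfactual median is $1.2 + 0.07 \times 13.5 = 2.145$.

```python
import numpy as np

def fit_logit(X, t, n_iter=25):
    """Logit maximum likelihood by Newton-Raphson; X must include a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        b += np.linalg.solve(hess, X.T @ (t - p))
    return 1.0 / (1.0 + np.exp(-X @ b))      # fitted propensity scores p(X_i)

def weighted_quantile(y, w, tau):
    """tau-quantile (lower inverse) of the distribution with mass w_i / sum(w) at y_i."""
    order = np.argsort(y)
    cdf = np.cumsum(w[order]) / w.sum()
    return y[order][np.searchsorted(cdf, tau)]

def reweighting_decomposition(y, t, X, nu):
    """Delta_S = nu_1 - nu_C and Delta_X = nu_C - nu_0, using the omega weights."""
    p = t.mean()                             # sample analogue of Pr[T = 1]
    px = fit_logit(X, t)                     # parametric (logit) estimate of p(x)
    w1 = t / p
    w0 = (1.0 - t) / (1.0 - p)
    wc = (px / (1.0 - px)) * (1.0 - t) / p
    nu1, nu0, nuc = nu(y, w1), nu(y, w0), nu(y, wc)
    return nu1 - nuc, nuc - nu0, (nu1, nu0, nuc)

# Hypothetical pooled sample: group 1 is more educated and has a different wage structure.
rng = np.random.default_rng(4)
n = 8000
t = (rng.random(n) < 0.5).astype(float)
educ = rng.normal(12.0 + 1.5 * t, 2.0)
eps = rng.normal(0.0, 1.0, n)
y = np.where(t == 1, 1.0 + 0.10 * educ + 0.5 * eps, 1.2 + 0.07 * educ + 0.4 * eps)
X = np.column_stack([np.ones(n), educ])

median = lambda yy, ww: weighted_quantile(yy, ww, 0.5)
d_s, d_x, (nu1, nu0, nuc) = reweighting_decomposition(y, t, X, median)
print("nu_1, nu_0, nu_C:", nu1, nu0, nuc)
print("wage structure:", d_s, "composition:", d_x)
```

Replacing `median` by any other function of a weighted sample (a different quantile, a variance, a Gini coefficient) gives the corresponding decomposition without changing the weights.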

The RIF Regressions
One key contribution of the paper, as discussed in section 2, is to further divide the wage structure and composition effects into the contribution of each individual covariate. To do so, we use the method proposed by Firpo, Fortin and Lemieux (2006, FFL hereafter) to compute partial effects of changes in the distribution of covariates on a given functional of the distribution of $Y_t \mid T$. The method works by providing a linear approximation to a non-linear functional of the distribution. More precisely, by retaining the leading term of a von Mises (1947) expansion, FFL approximate such non-linear functionals by expectations, which are linear functionals of the distribution. This approximation allows one to apply the law of iterated expectations to the distributional statistic of interest and thus to compute approximate partial effects of a single covariate on the functional being approximated.
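The details of the method are summarized as follows. Consider again a general functional ν = ν(F). Recall the definition of the influence function (Hampel, 1974), IF, introduced as a measure of the robustness of ν to outlier data when F is replaced by the empirical distribution:
$$IF(y; \nu, F) = \lim_{\epsilon \downarrow 0} \frac{\nu(F_{\epsilon,y}) - \nu(F)}{\epsilon} = \left.\frac{\partial \nu(F_{\epsilon,y})}{\partial \epsilon}\right|_{\epsilon = 0},$$
where $F_{\epsilon,y} = (1-\epsilon)F + \epsilon\,\delta_y$, $0 \le \epsilon \le 1$, and where $\delta_y$ is a distribution that only puts mass at the value y.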
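The definition can be checked numerically for a functional whose influence function is known in closed form. In this sketch (our own illustration; helper names are hypothetical), F is the empirical distribution of a simulated sample, ν is the variance, and the derivative with respect to ε is approximated by a one-sided finite difference. The result should match the standard influence function of the variance, $IF(y; \sigma^2) = (y - \mu)^2 - \sigma^2$, up to an error of order ε.

```python
import numpy as np

def variance_functional(values, probs):
    """nu(F) = Var_F(Y) for a discrete distribution F with support `values`."""
    mu = np.dot(probs, values)
    return np.dot(probs, (values - mu) ** 2)

def gateaux_influence(nu, values, probs, y, eps=1e-6):
    """Finite-difference approximation of IF(y; nu, F), perturbing F towards the
    point mass delta_y: F_eps = (1 - eps) F + eps delta_y."""
    mixed_values = np.append(values, y)
    mixed_probs = np.append((1.0 - eps) * probs, eps)
    return (nu(mixed_values, mixed_probs) - nu(values, probs)) / eps

rng = np.random.default_rng(5)
sample = rng.lognormal(mean=2.5, sigma=0.5, size=1000)   # F = empirical distribution
probs = np.full(sample.size, 1.0 / sample.size)
mu, sigma2 = sample.mean(), sample.var()

checks = []
for y in (5.0, 12.0, 40.0):
    numeric = gateaux_influence(variance_functional, sample, probs, y)
    analytic = (y - mu) ** 2 - sigma2        # closed-form IF of the variance
    checks.append((y, numeric, analytic))
    print(f"y={y:5.1f}  finite difference={numeric:10.4f}  closed form={analytic:10.4f}")
```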
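To simplify notation, write $IF(y; \nu, F) = IF(y; \nu)$. It can be shown that, by definition,
$$\int IF(y; \nu)\, dF(y) = 0.$$
We use a recentered version of the influence function, $RIF(y; \nu) = \nu(F) + IF(y; \nu)$, whose expectation yields the original ν:
$$\int RIF(y; \nu)\, dF(y) = \int \left(\nu(F) + IF(y; \nu)\right) dF(y) = \nu(F).$$
Letting $\nu_t = \nu(F_t)$ and $\nu_C = \nu(F_C)$, we can therefore write the distributional statistics $\nu_1$, $\nu_0$, and $\nu_C$ as the expectations
$$\nu_t = E[RIF(Y_t; \nu_t) \mid T=t], \quad t = 0, 1, \qquad \nu_C = E[RIF(Y_0; \nu_C) \mid T=1].$$
Using the law of iterated expectations, the distributional statistics can also be expressed in terms of expectations of the conditional recentered influence functions:
$$\nu_t = E\big[E[RIF(Y_t; \nu_t) \mid X, T=t] \,\big|\, T=t\big], \qquad \nu_C = E\big[E[RIF(Y_0; \nu_C) \mid X, T=1] \,\big|\, T=1\big].$$
Letting the so-called RIF-regressions be written as
$$m^{\nu}_t(x) = E[RIF(Y_t; \nu_t) \mid X=x, T=t], \qquad m^{\nu}_C(x) = E[RIF(Y_0; \nu_C) \mid X=x, T=1],$$
it follows that $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$ can be rewritten as
$$\Delta^{\nu}_S = E[m^{\nu}_1(X) \mid T=1] - E[m^{\nu}_C(X) \mid T=1], \qquad \Delta^{\nu}_X = E[m^{\nu}_C(X) \mid T=1] - E[m^{\nu}_0(X) \mid T=0].$$
In general, there is no particular reason to expect the conditional expectations $m^{\nu}_t(X)$ and $m^{\nu}_C(X)$ to be linear in X. As a matter of convenience and comparability with Oaxaca-Blinder decompositions, it is nonetheless useful to consider the case of the linear specification. To be more precise, consider the linear projections (indexed by L)
$$m^{\nu}_{t,L}(x) = x\,\gamma^{\nu}_t, \qquad m^{\nu}_{C,L}(x) = x\,\gamma^{\nu}_C,$$
where
$$\gamma^{\nu}_t = \left(E[X'X \mid T=t]\right)^{-1} E[X'\,RIF(Y_t; \nu_t) \mid T=t], \qquad \gamma^{\nu}_C = \left(E[X'X \mid T=1]\right)^{-1} E[X'\,RIF(Y_0; \nu_C) \mid T=1].$$
As is well known, even though linear projections are only an approximation to the true conditional expectation, the expected approximation error is zero when X includes a constant, so that
$$E[X \mid T=t]\,\gamma^{\nu}_t = E[m^{\nu}_t(X) \mid T=t] = \nu_t, \qquad E[X \mid T=1]\,\gamma^{\nu}_C = E[m^{\nu}_C(X) \mid T=1] = \nu_C.$$
We can thus rewrite $\Delta^{\nu}_S$ and $\Delta^{\nu}_X$ as
$$\Delta^{\nu}_S = E[X \mid T=1]\left(\gamma^{\nu}_1 - \gamma^{\nu}_C\right),$$
$$\Delta^{\nu}_X = E[X \mid T=1]\,\gamma^{\nu}_C - E[X \mid T=0]\,\gamma^{\nu}_0, \tag{8}$$
which generalizes the Oaxaca-Blinder decomposition to any distributional statistic through the projection of its recentered influence function onto the covariates. Note here that under the additional assumption that $m^{\nu}_{t,L}(\cdot) = m^{\nu}_t(\cdot)$ and $m^{\nu}_{C,L}(\cdot) = m^{\nu}_C(\cdot)$, that is, if the conditional expectation is indeed linear in x, then $\gamma^{\nu}_0 = \gamma^{\nu}_C$. In the case of the mean (ν = µ), it then follows that the equations above reproduce exactly the Oaxaca-Blinder decomposition.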
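RIF-regressions are simple to compute once the RIF is available. This sketch uses two standard closed forms: for the τ-quantile, $RIF(y; q_\tau) = q_\tau + (\tau - \mathbf{1}\{y \le q_\tau\})/f_Y(q_\tau)$, and for the variance, $RIF(y; \sigma^2) = (y - \mu)^2$. The density $f_Y(q_\tau)$ is estimated with a Gaussian kernel and a normal-reference bandwidth, which is our assumption rather than a choice made in the text; the data and helper names are hypothetical. The OLS coefficients are sample analogues of the projection coefficients $\gamma^{\nu}$.

```python
import numpy as np

def kernel_density_at(y, point):
    """Gaussian kernel estimate of f_Y(point) with a normal-reference (Silverman) bandwidth."""
    h = 1.06 * y.std() * y.size ** (-0.2)
    z = (point - y) / h
    return np.exp(-0.5 * z ** 2).sum() / (y.size * h * np.sqrt(2.0 * np.pi))

def rif_quantile(y, tau):
    """RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau)."""
    q = np.quantile(y, tau)
    return q + (tau - (y <= q).astype(float)) / kernel_density_at(y, q)

def rif_variance(y):
    """RIF(y; sigma^2) = sigma^2 + IF(y; sigma^2) = (y - mu)^2."""
    return (y - y.mean()) ** 2

def rif_regression(X, rif):
    """Linear projection of the RIF on X (X must include a constant); returns gamma."""
    gamma, *_ = np.linalg.lstsq(X, rif, rcond=None)
    return gamma

# Hypothetical log-wage data with schooling and a union dummy.
rng = np.random.default_rng(6)
n = 4000
educ = rng.normal(13.0, 2.0, n)
union = (rng.random(n) < 0.2).astype(float)
y = 1.0 + 0.08 * educ + 0.1 * union + rng.normal(0.0, 0.3 + 0.02 * np.abs(educ - 13.0))
X = np.column_stack([np.ones(n), educ, union])

rif90 = rif_quantile(y, 0.9)
gamma90 = rif_regression(X, rif90)
gamma_var = rif_regression(X, rif_variance(y))
print("RIF-OLS coefficients, 90th quantile:", gamma90)
print("RIF-OLS coefficients, variance:", gamma_var)
```

Because X includes a constant, the mean of the fitted values equals the mean of the RIF, which in turn equals the statistic itself up to sampling error; this is the sample counterpart of the zero expected approximation error used above.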

Interpreting the Decomposition
We have just shown that, under a linearity assumption, the decomposition based on RIF-regressions looks very much like standard Oaxaca-Blinder decomposition. We now go beyond this simple analogy to define more explicitly what we mean by the contribution of each single covariate to the wage structure and composition effect.

Composition Effects
FFL show that RIF-regression estimates can be used either to estimate the effect of a "small change" in the distribution of X on ν, or to provide a first-order approximation to the effect of a larger change in the distribution of X on ν. The latter effect, which FFL call a "policy effect", is what concerns us here. In fact, the composition effect $\Delta^{\nu}_X$ corresponds exactly to FFL's policy effect, where the "policy" consists of changing the distribution of X from its value at T = 0 to its value at T = 1 (holding the wage structure constant).
For the sake of simplicity, we continue with the linear specification introduced in Section 3.2. As it turns out, FFL show that, in the case of quantiles, using a linear specification for RIF-regressions generally yields very similar estimates to more flexible methods allowing for non-linearities. 16 We nonetheless discuss below the consequences of the linearity assumption for the interpretation of the results.
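An explicit link with the results of FFL concerning policy effects is obtained by rewriting equation (8) as
$$\Delta^\nu_X = \big(E[X \mid T=1] - E[X \mid T=0]\big)^\top \gamma^\nu_0 + R^\nu, \tag{9}$$
where $R^\nu = E[X \mid T=1]^\top (\gamma^\nu_C - \gamma^\nu_0)$. The first term in equation (9) is now similar to the standard Oaxaca-Blinder type composition effect, and can be rewritten in terms of the contribution of each covariate as
$$\big(E[X \mid T=1] - E[X \mid T=0]\big)^\top \gamma^\nu_0 = \sum_{k=1}^K \big(E[X_k \mid T=1] - E[X_k \mid T=0]\big)\, \gamma^\nu_{0,k}.$$
Each component of this equation can be interpreted as the "policy effect" of changing the distribution of one covariate from its T = 0 to T = 1 level, holding the distribution of the other covariates unchanged.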
The second term in equation (9), $R^\nu$, is the approximation error linked to the fact that FFL's regression-based procedure only provides a first-order approximation to the composition effect $\Delta^\nu_X$. In practice, it can be estimated as the difference between the reweighting estimate of the composition effect, $\nu_C - \nu_0$, and the estimate of $(E[X \mid T=1] - E[X \mid T=0])^\top \gamma^\nu_0$ obtained using the RIF-regression approach. When the latter approach provides an accurate (first-order) approximation of the composition effect, the error should be small. Looking at the magnitude of the error thus provides a specification test of FFL's regression-based procedure.
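The logic of this specification check fits in a few lines of Python. This is a minimal sketch assuming the reweighting estimates $\nu_C$ and $\nu_0$ and the group-0 RIF-regression coefficients are already available; the function name and all numbers are hypothetical, with X = (constant, union, college):

```python
def composition_detail(nu_c, nu_0, xbar1, xbar0, gamma0):
    """Split the reweighting composition effect nu_C - nu_0 into per-covariate terms
    (E[X_k|T=1] - E[X_k|T=0]) * gamma0_k plus the approximation error R."""
    contributions = [(m1 - m0) * g for m1, m0, g in zip(xbar1, xbar0, gamma0)]
    delta_x = nu_c - nu_0                       # reweighting estimate of the composition effect
    remainder = delta_x - sum(contributions)    # R: what the first-order approximation misses
    return contributions, remainder


# Hypothetical inputs: X = (constant, union, college); the constant contributes nothing.
xbar1 = [1.0, 0.15, 0.30]
xbar0 = [1.0, 0.25, 0.22]
gamma0 = [2.50, 0.20, 0.45]
contributions, remainder = composition_detail(2.52, 2.50, xbar1, xbar0, gamma0)
print(contributions, remainder)   # union about -0.020, college about 0.036, R about 0.004
```

A remainder that is large relative to $\nu_C - \nu_0$ would signal that the linear, first-order approximation is poor for the problem at hand.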
Note that using a linear specification for the RIF-regression instead of a general functional form simply changes the interpretation of the specification error $R^\nu$ by adding an error component linked to the fact that a potentially incorrect specification may be used for the RIF-regression. We nonetheless suggest using the linear specification in practice for three reasons. First, we get an approximation error anyway, since FFL's procedure only gives a first-order approximation to the impact of "large" changes in the distribution of X. Second, the linear specification does not affect the overall estimates of the wage structure and composition effects that are obtained using the reweighting procedure. Third, using a linear specification has the advantage of providing a much simpler interpretation of the decomposition, as in the Oaxaca-Blinder decomposition. Our suggestion is thus to use the linear specification but also to look at the size of the specification error to make sure that the FFL approach provides an accurate enough approximation for the problem at hand.

Wage Structure Effect
The wage structure effect in equation (7), $\Delta^\nu_S = E[X \mid T=1]^\top (\gamma^\nu_1 - \gamma^\nu_C)$, already looks very much like the usual wage structure effect in a standard Oaxaca-Blinder decomposition. One difference relative to the usual Oaxaca-Blinder decomposition is that the coefficient $\gamma^\nu_C$ (the regression coefficient when the group 0 data is reweighted to have the same distribution of X as group 1) is used instead of $\gamma^\nu_0$ (the unadjusted regression coefficient for group 0). The reason for using $\gamma^\nu_C$ instead of $\gamma^\nu_0$ is that the difference $\gamma^\nu_1 - \gamma^\nu_C$ solely reflects differences between the wage structures $g_1(\cdot)$ and $g_0(\cdot)$, while the difference $\gamma^\nu_1 - \gamma^\nu_0$ may be contaminated by differences in the distribution of X between the two groups.
This will happen, for example, in the case of the mean when the linear regression model is only an approximation of an underlying non-linear conditional expectation, as in Barsky et al. (2002). 17 So while our reweighting method for dividing the overall wage gap into a wage structure and a composition effect is similar to the approach suggested by Barsky et al. (2002), we also suggest an approach, based on estimating the regression coefficient γ ν C in the reweighted sample, to divide up the contribution of each individual covariate as in a standard Oaxaca-Blinder decomposition.
In other words, using $\gamma^\nu_C$ instead of $\gamma^\nu_0$ allows us to deal with one of the two limitations of Oaxaca-Blinder decompositions discussed in Section 2. The other limitation of standard decompositions mentioned in that section is that the contribution of each covariate to the wage structure effect is sensitive to the choice of a base group. This problem also affects our proposed decomposition method, and there is, unfortunately, no simple solution to it. To see this, rewrite the wage structure effect as
$$\Delta^\nu_S = \nu_1 - \nu_C = (\nu_1 - \nu_{B1}) - (\nu_C - \nu_{BC}) + (\nu_{B1} - \nu_{BC}),$$
where $\nu_{B1}$ is the distributional statistic in an arbitrary "base group" under the wage structure $g_1(\cdot, \cdot)$, while $\nu_{BC}$ is the distributional statistic for the same base group under the wage structure $g_0(\cdot, \cdot)$. The term $\nu_1 - \nu_{B1}$ represents the "policy effect" of changing the distribution of X from its value in the base group to its T = 1 value under the wage structure $g_1(\cdot, \cdot)$, while $\nu_C - \nu_{BC}$ represents the corresponding policy effect under the wage structure $g_0(\cdot, \cdot)$. Since there is no dispersion in X in a base group of workers with similar characteristics, switching to the actual distribution of X will typically result in more wage dispersion. The overall wage structure effect is, thus, equal to the difference in the dispersion-enhancing effect under $g_1(\cdot, \cdot)$ and $g_0(\cdot, \cdot)$, respectively, plus a "residual" difference in the distributional statistic in the base group, $\nu_{B1} - \nu_{BC}$. Unless this residual change is invariant to the choice of the base group, the contribution of each covariate to the wage structure will be sensitive to the choice of base group.
17 We also show in the examples below that this problem is even more likely to arise in the case of distributional statistics other than the mean, such as quantiles.
This last point is easier to see in the case of the linear specification, where the wage structure effect is given by
$$\Delta^\nu_S = (\gamma^\nu_{1,1} - \gamma^\nu_{C,1}) + \sum_{k=2}^K E[X_k \mid T=1]\, (\gamma^\nu_{1,k} - \gamma^\nu_{C,k}),$$
where $\gamma^\nu_{1,1} - \gamma^\nu_{C,1}$ is the difference in the intercepts of the model (the first element of the vector of covariates X is the constant). This difference corresponds to the residual difference in the special case where the base group consists of individuals with $X_k = 0$, for k = 2, ..., K. In the more general case of a base group defined by $X_k = x_{kB}$, we instead have
$$\Delta^\nu_S = \sum_{k=2}^K \big(E[X_k \mid T=1] - x_{kB}\big)(\gamma^\nu_{1,k} - \gamma^\nu_{C,k}) + \Big[(\gamma^\nu_{1,1} - \gamma^\nu_{C,1}) + \sum_{k=2}^K x_{kB}\, (\gamma^\nu_{1,k} - \gamma^\nu_{C,k})\Big].$$
Both the residual difference (the last term on the right-hand side of the equation) and the wage structure effect associated with a given covariate k, $(E[X_k \mid T=1] - x_{kB}) \cdot (\gamma^\nu_{1,k} - \gamma^\nu_{C,k})$, thus depend on the choice of the base group. As a result, great care must be taken in interpreting this particular aspect of the decomposition. We illustrate this sensitivity to the choice of the base group in the empirical example of Section 5.
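The dependence on the base group can be made concrete with a short Python sketch. It assumes the coefficient vectors $\gamma^\nu_1$ and $\gamma^\nu_C$ are given, with the constant as the first element; the covariates (union, college), the numbers, and the function name are hypothetical. The total wage structure effect is the same for both base groups, but its split between covariate terms and the residual is not:

```python
def wage_structure_detail(xbar1, gamma1, gamma_c, x_base):
    """Split Delta_S = xbar1'(gamma1 - gamma_c) into covariate terms measured relative
    to a base group x_base (covariates only, no constant) and a base-group residual."""
    diff = [g1 - gc for g1, gc in zip(gamma1, gamma_c)]
    covariate_terms = [(m - b) * d for m, b, d in zip(xbar1[1:], x_base, diff[1:])]
    residual = diff[0] + sum(b * d for b, d in zip(x_base, diff[1:]))
    return covariate_terms, residual


# Hypothetical inputs: X = (constant, union, college).
xbar1 = [1.0, 0.15, 0.30]
gamma1 = [2.60, 0.15, 0.60]
gamma_c = [2.50, 0.20, 0.45]

terms_a, resid_a = wage_structure_detail(xbar1, gamma1, gamma_c, x_base=[0, 0])  # non-college base
terms_b, resid_b = wage_structure_detail(xbar1, gamma1, gamma_c, x_base=[0, 1])  # college base
print(terms_a, resid_a)   # college term is positive
print(terms_b, resid_b)   # college term turns negative; the residual absorbs the difference
print(sum(terms_a) + resid_a, sum(terms_b) + resid_b)   # same total Delta_S
```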
If the RIF-regression approach provides an accurate approximation of the underlying policy effects, then we should have that
$$\nu_{B1} - \nu_{B0} \approx (\gamma^\nu_{1,1} - \gamma^\nu_{C,1}) + \sum_{k=2}^K x_{kB}\, (\gamma^\nu_{1,k} - \gamma^\nu_{C,k}).$$
This provides another specification test of FFL's approach. If it provides a good approximation, then the predicted change in the base group (the right-hand side of the equation) should be close to the actual change in the distributional statistic observed in the base group, $\nu_{B1} - \nu_{B0}$, which can be estimated separately. 18

Estimation and Inference
In this section, we discuss how to estimate the different elements of the decomposition introduced in the previous section: $\nu_1$, $\nu_0$, $\nu_C$, $\gamma_1$, $\gamma_0$ and $\gamma_C$. For $\nu_1$, $\nu_0$, $\gamma_1$ and $\gamma_0$, the estimation is very standard because the distributions $F_1$ and $F_0$ are directly identified from data on (Y, T, X). The distributional statistics $\nu_1$ and $\nu_0$ can be estimated by their sample analogs, while $\gamma_1$ and $\gamma_0$ can be estimated using standard least squares methods. In contrast, the estimation of $\nu_C$ and $\gamma_C$ requires first estimating the weighting function $\omega_C(T, X)$. We present two common methods, one parametric and one non-parametric, to estimate $\omega_C(T, X)$. We discuss separately the estimation of the first and second stages of the decomposition. The first stage relies on a reweighting procedure, while the second stage is based on the estimation of RIF-regressions. We only present the general lines of the estimation procedure in this section. Proofs and details about the parametric and non-parametric procedures to estimate $\omega_C(T, X)$ are presented in the appendix, which also discusses the asymptotic behavior of the estimators. Finally, we show how the estimation procedure can be applied to the specific cases of the mean, the median, the variance, and the Gini coefficient.

First Stage Estimation
The first step of the estimation procedure consists of estimating the weighting function $\omega_C(T, X)$. The distributional statistics $\nu_1$, $\nu_0$, and $\nu_C$ are then computed directly from the appropriately reweighted samples.

Estimating the Weights
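We are interested in estimating weights ω that are generally functions of the distribution of (T, X). The three weighting functions under consideration are $\omega_1(T)$, $\omega_0(T)$, and $\omega_C(T, X)$. The first two weights are trivially estimated by
$$\hat\omega_1(T) = \frac{T}{\hat p}, \qquad \hat\omega_0(T) = \frac{1 - T}{1 - \hat p},$$
where $\hat p = N^{-1} \sum_{i=1}^N T_i$ is the sample proportion of observations in group 1. The third weight is estimated by
$$\hat\omega_C(T, X) = \frac{1 - T}{\hat p} \cdot \frac{\hat p(X)}{1 - \hat p(X)},$$
where $\hat p(\cdot)$ is an estimator of the true probability of being in group 1 given X. In the appendix, we describe in detail the two approaches that we consider, a parametric and a non-parametric one. In addition, in order to have weights summing up to one, we use the following normalization procedures:
$$\frac{\hat\omega_t(T_i)}{\sum_{j=1}^N \hat\omega_t(T_j)}, \quad t = 0, 1, \qquad \frac{\hat\omega_C(T_i, X_i)}{\sum_{j=1}^N \hat\omega_C(T_j, X_j)}.$$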
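A minimal Python sketch of the three weights, assuming discrete covariate cells so that $p(X)$ can be estimated non-parametrically by the share of group-1 observations within each cell. This cell-frequency estimator is a simple choice made here for illustration; the parametric and non-parametric estimators described in the appendix may differ. The function name and data are hypothetical:

```python
from collections import Counter


def _normalize(w):
    total = sum(w)
    return [wi / total for wi in w]


def reweighting_factors(t, cells):
    """Normalized weights omega_1, omega_0, omega_C for a 0/1 group indicator t and
    discrete covariate cells; p(x) is the share of group-1 observations in cell x."""
    p = sum(t) / len(t)                          # sample share of group 1
    n_cell = Counter(cells)
    n1_cell = Counter(c for ti, c in zip(t, cells) if ti == 1)
    w1 = [ti / p for ti in t]
    w0 = [(1 - ti) / (1 - p) for ti in t]
    wc = []
    for ti, c in zip(t, cells):
        if ti == 1:
            wc.append(0.0)                       # (1 - T) = 0 for group-1 observations
        else:
            px = n1_cell[c] / n_cell[c]          # < 1, since this cell has a group-0 observation
            wc.append((1.0 / p) * px / (1.0 - px))
    return _normalize(w1), _normalize(w0), _normalize(wc)


# Hypothetical data: T = 1 for the later period; cells are union ("u") / non-union ("n").
t = [1, 1, 1, 0, 0, 0, 0]
cells = ["u", "n", "n", "u", "n", "n", "n"]
w1, w0, wc = reweighting_factors(t, cells)
union_share_c = sum(w for w, c in zip(wc, cells) if c == "u")
print(union_share_c)   # reweighted group 0 matches the group-1 union share of 1/3
```

With a saturated cell-based estimator, the reweighted group-0 distribution of X matches group 1 exactly; with a parametric model such as a logit, the match is only approximate.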

Estimating the Distributional Statistics
We are interested in the estimation of, and inference on, $\nu_1$, $\nu_0$, and $\nu_C$. It can be shown that, under certain regularity conditions, estimators of these objects are asymptotically normally distributed. We show how to estimate these quantities here; their asymptotic distributions are derived in the appendix.
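The estimation follows a plug-in approach. Replacing the CDF by the empirical distribution function produces the estimators of interest:
$$\hat F_t(y) = \sum_{i=1}^N \hat\omega_t(T_i)\, \mathbb{1}\{Y_i \le y\}, \quad t = 0, 1, \qquad \hat F_C(y) = \sum_{i=1}^N \hat\omega_C(T_i, X_i)\, \mathbb{1}\{Y_i \le y\},$$
$$\hat\nu_t = \nu(\hat F_t), \qquad \hat\nu_C = \nu(\hat F_C),$$
where the weights are normalized to sum to one. Note that, in practice, it is not usually necessary to compute these empirical distribution functions to get estimates of a distributional statistic, ν. Standard software programs such as Stata can be used to compute distributional statistics directly from the observations on Y weighted using the appropriate weighting factor.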
The estimated distributional statistics can then be used to estimate the wage structure and composition effects as $\widehat\Delta^\nu_S = \hat\nu_1 - \hat\nu_C$ and $\widehat\Delta^\nu_X = \hat\nu_C - \hat\nu_0$.
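For a quantile, the plug-in step amounts to inverting the weighted empirical CDF. A minimal Python sketch, assuming the weights have already been computed (they need not be normalized here). The sample, the weights, and the function name are hypothetical, and packages such as Stata may use slightly different interpolation rules for empirical quantiles:

```python
def weighted_quantile(y, w, tau):
    """Smallest observed y at which the weighted empirical CDF reaches tau."""
    pairs = sorted(zip(y, w))
    total = sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= tau * total:
            return yi
    return pairs[-1][0]


# Hypothetical pooled sample: first four observations in group 0, last four in group 1.
y = [1.8, 2.0, 2.2, 2.5, 1.9, 2.1, 2.3, 2.9]
w1 = [0, 0, 0, 0, 1, 1, 1, 1]   # omega_1: group-1 observations only
w0 = [1, 1, 1, 1, 0, 0, 0, 0]   # omega_0: group-0 observations only
wc = [1, 2, 3, 4, 0, 0, 0, 0]   # omega_C: group 0, reweighted (hypothetical values)

med1, med0, medc = (weighted_quantile(y, w, 0.5) for w in (w1, w0, wc))
delta_s = med1 - medc           # wage structure effect
delta_x = medc - med0           # composition effect
print(med1, med0, medc, delta_s, delta_x)
```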

Second Stage Estimation
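Now consider estimation of the regression coefficients $\gamma^\nu_1$, $\gamma^\nu_0$, and $\gamma^\nu_C$:
$$\hat\gamma^\nu_t = \Big(\sum_{i=1}^N \hat\omega_t(T_i)\, X_i X_i^\top\Big)^{-1} \sum_{i=1}^N \hat\omega_t(T_i)\, \widehat{RIF}(Y_i; \hat\nu_t)\, X_i, \quad t = 0, 1,$$
$$\hat\gamma^\nu_C = \Big(\sum_{i=1}^N \hat\omega_C(T_i, X_i)\, X_i X_i^\top\Big)^{-1} \sum_{i=1}^N \hat\omega_C(T_i, X_i)\, \widehat{RIF}(Y_i; \hat\nu_C)\, X_i,$$
where, for t = 0, 1, $\widehat{RIF}(y; \hat\nu_t) = \hat\nu_t + \widehat{IF}(y; \hat\nu_t)$ and $\widehat{RIF}(y; \hat\nu_C) = \hat\nu_C + \widehat{IF}(y; \hat\nu_C)$, and $\widehat{IF}(\cdot; \nu)$ is an appropriate estimator of the influence function. We discuss how to estimate the influence function for a number of specific cases in Section 4.3.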
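We can thus decompose the effect of changes from T = 0 to T = 1 on the distributional statistic ν as
$$\widehat\Delta^\nu_S = \bar X_1^\top (\hat\gamma^\nu_1 - \hat\gamma^\nu_C), \qquad \widehat\Delta^\nu_X = \bar X_1^\top \hat\gamma^\nu_C - \bar X_0^\top \hat\gamma^\nu_0,$$
where $\bar X_t$ is the sample mean of X in group t. It is also useful to rewrite the estimate of the composition effect as
$$\widehat\Delta^\nu_X = (\bar X_1 - \bar X_0)^\top \hat\gamma^\nu_0 + \hat R^\nu,$$
where $\hat R^\nu = \bar X_1^\top (\hat\gamma^\nu_C - \hat\gamma^\nu_0)$ is the approximation error discussed in Section 3.1. This generalizes the Oaxaca-Blinder decomposition to any distributional statistic, including the variance or the Gini coefficient.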
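The second stage only needs weighted least squares and a few inner products. A minimal, dependency-free Python sketch under stated assumptions: each row of X starts with 1.0 for the constant, the RIF values and weights are given, and $E[X|T=1]$ is estimated by the group-1 sample mean. The function names are hypothetical; in practice any regression routine that accepts weights would do:

```python
def wls(X, r, w):
    """Weighted least squares of r on the rows of X, solving
    (sum_i w_i x_i x_i') g = sum_i w_i x_i r_i by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(wi * xi[a] * xi[b] for xi, wi in zip(X, w)) for b in range(k)] for a in range(k)]
    c = [sum(wi * xi[a] * ri for xi, ri, wi in zip(X, r, w)) for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))   # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for i in range(col + 1, k):
            factor = A[i][col] / A[col][col]
            for j in range(col, k):
                A[i][j] -= factor * A[col][j]
            c[i] -= factor * c[col]
    g = [0.0] * k
    for i in reversed(range(k)):                                  # back substitution
        g[i] = (c[i] - sum(A[i][j] * g[j] for j in range(i + 1, k))) / A[i][i]
    return g


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rif_decomposition(xbar1, xbar0, g1, g0, gc):
    """Wage structure effect, composition effect, and approximation error R."""
    delta_s = _dot(xbar1, [a - b for a, b in zip(g1, gc)])
    delta_x = _dot(xbar1, gc) - _dot(xbar0, g0)
    remainder = _dot(xbar1, [a - b for a, b in zip(gc, g0)])
    return delta_s, delta_x, remainder


# Usage on exactly linear data r = 1 + 2x: the fit recovers (1, 2) for any positive weights.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(wls(X, [1.0, 3.0, 5.0, 7.0], [1.0, 2.0, 1.0, 0.5]))
```

To obtain $\hat\gamma^\nu_1$, $\hat\gamma^\nu_0$ and $\hat\gamma^\nu_C$, the same `wls` call would be run with $\hat\omega_1$, $\hat\omega_0$ and $\hat\omega_C$ as weights and the corresponding RIF values as `r`.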

Examples
We now turn to the specific cases of the mean, the median, the variance, and the Gini coefficient to illustrate how the different elements of the decomposition can be computed.

The Mean
The standard Oaxaca-Blinder decomposition presented in Section 2 is only valid under the assumption that the underlying "structural" model is linear and under the zero conditional mean assumption $E(\varepsilon_{ti} \mid X_i, T) = 0$. In contrast, our two-stage decomposition requires neither linearity nor the zero conditional mean assumption (ignorability is sufficient). In a first stage, we compute the means by reweighting to estimate the wage gaps
$$\widehat\Delta^\mu_S = \hat\mu_1 - \hat\mu_C, \qquad \widehat\Delta^\mu_X = \hat\mu_C - \hat\mu_0.$$
Note, in particular, that we can compute the counterfactual $\mu_C$ without any assumptions on the functional form of $g_t(\cdot)$.
In the second stage, we further decompose these expressions into components attributable to each covariate by estimating OLS regressions of the RIF on X for the T = 0, 1 samples, and the T = 0 sample reweighted to have the same distribution of X as in T = 1.
As is well known, the influence function of the mean at point y is its deviation from the mean, $IF(y; \mu) = y - \mu$, and, therefore, the recentered influence function of the mean is simply the observation itself: $RIF(y; \mu) = \mu + IF(y; \mu) = y$.
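As a result, the RIF-regression coefficients in the case of the mean are identical to the standard regression coefficients of Y on X used in the Oaxaca-Blinder decomposition ($\beta_t$ above), and we have
$$\widehat\Delta^\mu_S = \bar X_1^\top (\hat\gamma^\mu_1 - \hat\gamma^\mu_C), \qquad \widehat\Delta^\mu_X = (\bar X_1 - \bar X_0)^\top \hat\gamma^\mu_0 + \hat R^\mu,$$
where $\bar X_t$ is the sample mean of X in group t and $\hat R^\mu = \bar X_1^\top (\hat\gamma^\mu_C - \hat\gamma^\mu_0)$. When the linearity and zero conditional mean assumptions of the Oaxaca-Blinder decomposition are satisfied, it follows that $\gamma^\mu_C = \gamma^\mu_0$ and $R^\mu = 0$, so our decomposition is identical to the Oaxaca-Blinder decomposition. When these conditions are not satisfied, the two decompositions differ.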
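A small numerical illustration of why $\gamma^\mu_C$ and $\gamma^\mu_0$ can differ when the conditional mean is non-linear. The population below is hypothetical: X takes the values 0, 1 and 2, $E[Y|X] = X^2$ in both groups (so the wage structure is identical and the whole gap is a composition effect), and the two groups differ only in how X is distributed. Each weight is a population share, so the "regressions" are weighted fits on three points, computed with exact fractions:

```python
from fractions import Fraction


def weighted_fit(x, y, w):
    """Weighted least-squares fit of y on (1, x); returns (intercept, slope)."""
    total = sum(w)
    mx = sum(wi * xi for xi, wi in zip(x, w)) / total
    my = sum(wi * yi for yi, wi in zip(y, w)) / total
    sxx = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    sxy = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    slope = sxy / sxx
    return my - slope * mx, slope


x = [Fraction(v) for v in (0, 1, 2)]
y = [xi ** 2 for xi in x]                                  # E[Y|X] = X^2: non-linear
share0 = [Fraction(1, 3)] * 3                              # distribution of X in group 0
share1 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]  # distribution of X in group 1

gamma0 = weighted_fit(x, y, share0)    # group-0 OLS coefficients
gamma_c = weighted_fit(x, y, share1)   # group 0 reweighted to the group-1 distribution of X
xbar0 = (1, sum(s * v for s, v in zip(share0, x)))
xbar1 = (1, sum(s * v for s, v in zip(share1, x)))

mu0 = sum(s * v for s, v in zip(share0, y))
mu_c = sum(s * v for s, v in zip(share1, y))
ob_term = sum((m1 - m0) * g for m1, m0, g in zip(xbar1, xbar0, gamma0))
remainder = sum(m1 * (gc - g0) for m1, gc, g0 in zip(xbar1, gamma_c, gamma0))
print(gamma0, gamma_c)                  # slopes 2 and 23/11
print(mu_c - mu0, ob_term, remainder)   # 7/12 = 1/2 + 1/12
```

Here the reweighting composition effect (7/12) exceeds the Oaxaca-Blinder-type term (1/2), and the gap $R^\mu = 1/12$ arises because $E[Y|X]$ is not linear in X.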

The Median
Quantiles are another set of distributional measures that have been used for the decomposition of wage distributions. In decompositions of the gender wage gap, they are used to address issues such as glass ceilings and sticky floors. In the example below, they will be used, for example, to differentiate the impact of unions in the middle of the distribution from their impact in the tails (Chamberlain, 1994).
A leading example in this class is the median. The influence function of the median, $\psi_{me}$, is
$$\psi_{me}(y) = IF(y; me) = \frac{1/2 - \mathbb{1}\{y \le me\}}{f(me)},$$
and the recentered influence function is $RIF(y; me) = me + (1/2 - \mathbb{1}\{y \le me\})/f(me)$.
The decomposition of the median proceeds along the same steps as in the case of the mean. In the first stage, the estimates of $me_t$, t = 0, 1, and $me_C$ are obtained by reweighting:
$$\widehat{me}_t = \hat F_t^{-1}(1/2), \quad t = 0, 1, \qquad \widehat{me}_C = \hat F_C^{-1}(1/2).$$
Note that these estimates can simply be computed using standard software packages with the appropriate weighting factor.
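The estimators for the gaps are computed as
$$\widehat\Delta^{me}_S = \widehat{me}_1 - \widehat{me}_C, \qquad \widehat\Delta^{me}_X = \widehat{me}_C - \widehat{me}_0.$$
In the second stage, we estimate the linear RIF-regressions. First, the recentered influence function is computed for each observation by plugging in the sample estimate of the quantile, $\hat q_\tau$, and an estimate of the density at the sample quantile, $\hat f(\hat q_\tau)$. For example, for the median of $Y_1 \mid T = 1$, we would use
$$\widehat{RIF}(y; \widehat{me}_1) = \widehat{me}_1 + \frac{1/2 - \mathbb{1}\{y \le \widehat{me}_1\}}{\hat f_1(\widehat{me}_1)},$$
where $\hat f_1(\cdot)$ is a consistent estimator for the density of $Y_1 \mid T = 1$, $f_1(\cdot)$. For example, kernel methods can be used to estimate the density (see FFL for more detail).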
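The RIF-regressions are then estimated by replacing the usual dependent variable, Y, by the estimated value of $RIF(y; \widehat{me}_1)$. Standard software packages can be used to do so. The resulting regression coefficients are
$$\hat\gamma^{me}_t = \Big(\sum_{i=1}^N \hat\omega_t(T_i)\, X_i X_i^\top\Big)^{-1} \sum_{i=1}^N \hat\omega_t(T_i)\, \widehat{RIF}(Y_i; \widehat{me}_t)\, X_i, \quad t = 0, 1,$$
$$\hat\gamma^{me}_C = \Big(\sum_{i=1}^N \hat\omega_C(T_i, X_i)\, X_i X_i^\top\Big)^{-1} \sum_{i=1}^N \hat\omega_C(T_i, X_i)\, \widehat{RIF}(Y_i; \widehat{me}_C)\, X_i.$$
Similarly to the case of the mean, we get
$$\widehat\Delta^{me}_S = \bar X_1^\top (\hat\gamma^{me}_1 - \hat\gamma^{me}_C), \qquad \widehat\Delta^{me}_X = (\bar X_1 - \bar X_0)^\top \hat\gamma^{me}_0 + \hat R^{me},$$
where $\bar X_t$ is the sample mean of X in group t and $\hat R^{me} = \bar X_1^\top (\hat\gamma^{me}_C - \hat\gamma^{me}_0)$.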
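A minimal Python sketch of the RIF computation for a quantile, assuming an Epanechnikov kernel for $\hat f$ (the kernel reported for the empirical application below) and a bandwidth chosen arbitrarily for this tiny hypothetical sample; the function names are hypothetical. The resulting RIF values would then replace Y as the dependent variable in an ordinary (weighted) least-squares regression:

```python
def weighted_quantile(y, w, tau):
    """Smallest observed y at which the weighted empirical CDF reaches tau."""
    pairs = sorted(zip(y, w))
    total, cum = sum(w), 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= tau * total:
            return yi
    return pairs[-1][0]


def epanechnikov_density(y, w, at, bandwidth):
    """Weighted kernel density estimate at `at`, kernel K(u) = 0.75 (1 - u^2) on |u| <= 1."""
    dens = 0.0
    for yi, wi in zip(y, w):
        u = (yi - at) / bandwidth
        if abs(u) <= 1.0:
            dens += wi * 0.75 * (1.0 - u * u)
    return dens / (sum(w) * bandwidth)


def rif_quantile(y, w, tau, bandwidth):
    """RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f(q_tau) for every observation."""
    q = weighted_quantile(y, w, tau)
    f = epanechnikov_density(y, w, q, bandwidth)
    return [q + (tau - (1.0 if yi <= q else 0.0)) / f for yi in y], q


y = [1.0 + 0.1 * i for i in range(10)]   # hypothetical log wages
w = [1.0] * 10
rif, q = rif_quantile(y, w, 0.5, bandwidth=0.25)
print(q, sum(rif) / len(rif))            # mean RIF equals the median in this example
```

The sample mean of the RIF equals $\hat q_\tau$ exactly whenever a fraction τ of the total weight lies at or below $\hat q_\tau$, as it does here; otherwise the two differ by a small discreteness term.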

The Variance
There are other applications where it is useful to decompose the impact of covariates on the variance of the distributions of wages. Examples include the compression effect of unions and of public sector wage setting.
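As before, the estimators of the gaps can be computed as
$$\widehat\Delta^{\sigma^2}_S = \hat\sigma^2_1 - \hat\sigma^2_C, \qquad \widehat\Delta^{\sigma^2}_X = \hat\sigma^2_C - \hat\sigma^2_0,$$
using the reweighting scheme
$$\hat\sigma^2_t = \sum_{i=1}^N \hat\omega_t(T_i)\, (Y_i - \hat\mu_t)^2, \quad t = 0, 1, \qquad \hat\sigma^2_C = \sum_{i=1}^N \hat\omega_C(T_i, X_i)\, (Y_i - \hat\mu_C)^2,$$
with weights normalized to sum to one. The influence function of the variance is well known to be
$$IF(y; \sigma^2) = (y - \mu)^2 - \sigma^2,$$
and the recentered influence function is the first term of this expression, $RIF(y; \sigma^2) = (y - \mu)^2$. The decomposition in terms of individual covariates, such as union coverage, follows by replacing RIF(·; me) by RIF(·; σ²) in equations (20)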
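The variance case is simple enough to sketch directly. A minimal Python version, assuming weights that select (and, for the counterfactual, reweight) the relevant sample; the data, the weights, and the function name are hypothetical:

```python
def rif_variance(y, w):
    """Return RIF(y_i; sigma^2) = (y_i - mu)^2 and the weighted (plug-in) variance."""
    total = sum(w)
    mu = sum(wi * yi for yi, wi in zip(y, w)) / total
    rif = [(yi - mu) ** 2 for yi in y]
    variance = sum(wi * ri for ri, wi in zip(rif, w)) / total   # weighted mean of the RIF
    return rif, variance


# Hypothetical pooled sample: first four observations in group 0, last four in group 1.
y = [1.8, 2.0, 2.2, 2.5, 1.9, 2.1, 2.3, 2.9]
w1 = [0, 0, 0, 0, 1, 1, 1, 1]
w0 = [1, 1, 1, 1, 0, 0, 0, 0]
wc = [1, 2, 3, 4, 0, 0, 0, 0]   # hypothetical reweighting factors for group 0

(_, var1), (_, var0), (_, varc) = (rif_variance(y, w) for w in (w1, w0, wc))
delta_s, delta_x = var1 - varc, varc - var0
print(var1, var0, varc, delta_s, delta_x)
```

The plug-in variance divides by the sum of the weights (not N - 1), which is what makes it equal to the weighted mean of the RIF.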

The Gini
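Finally, another popular measure of wage inequality is the Gini coefficient. Recall that the Gini coefficient is defined as
$$\nu^{GC}(F_Y) = 1 - \frac{2}{\mu} R(F_Y), \qquad R(F_Y) = \int_0^1 GL(p; F_Y)\, dp, \qquad GL(p; F_Y) = \int_{-\infty}^{F_Y^{-1}(p)} z\, dF_Y(z), \tag{26}$$
where µ is the mean of Y and $p(y) = F_Y(y)$. The generalized Lorenz curve tracks the cumulative total of y divided by total population size against the cumulative distribution function, and the Lorenz ordinate $GL(p; F_Y)/\mu$ can be interpreted as the proportion of earnings going to the 100p% lowest earners.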
As shown in Monti (1991), the influence function of the Gini coefficient is
$$IF(y; \nu^{GC}) = \frac{2R(F_Y)}{\mu} + \frac{2R(F_Y)}{\mu^2}\, y - \frac{2}{\mu}\Big[y\, \big(1 - p(y)\big) + GL(p(y); F_Y)\Big],$$
with $R(F_Y)$ and $GL(p(y); F_Y)$ as defined in equation (26). Thus the recentered influence function of the Gini is simply
$$RIF(y; \nu^{GC}) = 1 + \frac{2R(F_Y)}{\mu^2}\, y - \frac{2}{\mu}\Big[y\, \big(1 - p(y)\big) + GL(p(y); F_Y)\Big].$$
In estimation, the GL coordinates are computed using a series of discrete data points $y_1, \ldots, y_N$, where observations have been ordered so that $y_1 \le y_2 \le \ldots \le y_N$, so that
$$\widehat{GL}(p(y_i)) = \frac{\sum_{j=1}^{i} y_j}{N}, \qquad \hat p(y_i) = \frac{i}{N},$$
where the numerator sums the first i ordered values of Y. The $R(F_t)$, t = 0, 1, and $R(F_C)$ are obtained by numerical integration of $GL_t(p(y_i))$ over $p_t(y_i)$, and of $GL_C(p(y_i))$ over $p_C(y_i)$. 19 The estimates of $\nu^{GC}(F_t)$, t = 0, 1, and $\nu^{GC}(F_C)$ are obtained by substituting $R(F_t)$ and $R(F_C)$, as well as $\mu_t$ and $\mu_C$, into equation (26). We can then compute the gaps for the changes in the Gini coefficient as in equation (24).
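A minimal Python sketch of the Gini and its RIF. It assumes the trapezoid rule for the numerical integration of the generalized Lorenz curve (the text uses Stata's integ command, whose integration rule may differ) and allows for weights so that the same function can serve the reweighted sample; the function name is hypothetical. With all weights equal to one, the cumulative shares reduce to $p(y_i) = i/N$ and $GL(p(y_i)) = \sum_{j \le i} y_j / N$:

```python
def gini_and_rif(y, w):
    """Weighted Gini coefficient, computed from the generalized Lorenz (GL) curve,
    and the recentered influence function of each observation."""
    n = len(y)
    total_w = sum(w)
    mu = sum(wi * yi for yi, wi in zip(y, w)) / total_w
    p_at = [0.0] * n      # p(y_i): cumulative weight share up to observation i
    gl_at = [0.0] * n     # GL(p(y_i)): cumulative weighted total / total weight
    r = 0.0               # R(F): area under the GL curve
    cum_w = cum_wy = p_prev = gl_prev = 0.0
    for i in sorted(range(n), key=lambda j: y[j]):
        cum_w += w[i]
        cum_wy += w[i] * y[i]
        p, gl = cum_w / total_w, cum_wy / total_w
        r += (p - p_prev) * (gl + gl_prev) / 2.0     # trapezoid rule
        p_at[i], gl_at[i] = p, gl
        p_prev, gl_prev = p, gl
    gini = 1.0 - 2.0 * r / mu
    b = 2.0 * r / mu ** 2
    rif = [1.0 + b * y[i] - (2.0 / mu) * (y[i] * (1.0 - p_at[i]) + gl_at[i]) for i in range(n)]
    return gini, rif


gini, rif = gini_and_rif([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print(gini, sum(rif) / len(rif))   # both 0.25, up to rounding
```

With the trapezoid rule, the weighted mean of the RIF reproduces the estimated Gini exactly, mirroring the population identity that the expectation of the RIF equals ν.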

Empirical Application: Changes in Male Wage Inequality between 1988 and 2005
It is well known that wage inequality increased sharply in the United States over the last 30 years. Using various distributional methods, Juhn, Murphy and Pierce (1993) and DiNardo, Fortin and Lemieux (1996) show that inequality expanded all through the wage distribution during the 1980s. In particular, both the "90-50 gap" (the difference between the 90th and the 50th quantile of log wages) and the "50-10 gap" increased during this period.
19 In practice, we simply use the Stata integ command.
Since the late 1980s, however, changes in inequality have increasingly been concentrated in the top end of the wage distribution. In fact, Autor, Katz and Kearney (2006) show that while the 90-50 gap kept expanding over the last 15 years, the 50-10 gap declined during the same period. They refer to these recent changes as an increased polarization of the labor market. An obvious question is why wage dispersion has changed so differently at different points of the distribution. Autor, Katz and Kearney (2006) suggest that technological change is a possible answer, provided that computerization resulted in a decline in the demand for skilled but "routine" tasks that used to be performed by workers around the middle of the wage distribution. 20 Lemieux (2007) He suggests that if this explanation is an important one, then changes in relative wages by occupation, i.e. the contribution of occupations to the wage structure effect, should play an important role in changes in the wage distribution. Furthermore, since it is well known that education wage differentials kept expanding after the late 1980s (e.g. Deschênes 2004), the contribution of education to the wage structure effect is another leading explanation for inequality changes over this period.
Existing studies also show that composition effects played an important role over the 1988-2005 period. Lemieux (2006b) shows that all the growth in residual inequality over this period is due to composition effects linked to the fact that the workforce became older and more educated, two factors associated with more wage dispersion. Furthermore, Lemieux (2007) argues that de-unionization, another composition effect as defined in this paper, still contributed to the changes in the wage distribution over this period.
These various explanations can all be categorized in terms of the respective contributions of various sets of factors (occupations, unions, education, experience, etc.) to either wage structure or composition effects. This makes the decomposition method proposed in this paper ideally suited for estimating the contribution of each of these possible explanations to changes in the wage distribution. Applying our method to this issue fills an important gap in the literature, since no existing study has systematically attempted to estimate the contribution of each of the aforementioned factors to recent changes in the U.S. wage distribution. 21
Our empirical analysis is based on data for men from the 1988-90 and 2003-05 Outgoing Rotation Group (ORG) Supplements of the Current Population Survey. The data files were processed as in Lemieux (2006b), who provides detailed information on the relevant data issues. For workers not paid by the hour, the hourly wage is computed by dividing earnings by hours of work; for workers paid by the hour, we use a direct measure of the hourly wage rate. In light of the above discussion, the key covariates on which we focus are education (six education groups), potential experience (nine groups), union coverage, and occupation (17 categories). We also include controls for industry (14 categories), marital status, and race in all the estimated models. The sample means for all these variables are provided in Table A1. 22 To capture the rich pattern of change in the wage distribution between 1988-90 and 2003-05, we decompose the changes in 19 different wage quantiles (from the 5th to the 95th quantile), equally spread over the whole wage distribution. This enables us to see whether different factors have different impacts at different points of the wage distribution.
Using this flexible approach, as opposed to summary measures of inequality like the Gini coefficient or the variance of log wages, is important since wage dispersion changed very differently at different points of the distribution during this period.

RIF-Regressions
Before showing the decomposition results, we first present some estimates from the RIF-regressions for the different wage quantiles, and for the variance of log wages and the Gini coefficient. From equation (18), we compute $IF(y_i; q_\tau)$ for each observation using the sample estimate of $q_\tau$ and the kernel density estimate of $f(q_\tau)$, obtained with the Epanechnikov kernel and a bandwidth of 0.06. In addition to the reweighting factors discussed in Sections 3 and 4, we also use CPS sample weights throughout the empirical analysis. In practice, this means that we multiply the relevant reweighting factor by the CPS sample weight.
21 Autor, Katz and Kearney (2005) use the Machado and Mata (2005) method to decompose changes at each quantile into a "price" (wage structure) and "quantity" (composition) effect. They do not further consider, however, the contribution of each individual covariate to the wage structure effect, except for separating the contribution of (all) covariates from the residual change in inequality. See also Lemieux (2002) for a similar decomposition based on a reweighting procedure.
22 Table A2 gives the details of the occupation and industry categories used.
The RIF-regression coefficients for the 10th, 50th, and 90th quantiles in 1988-90 and 2003-05, along with their (robust) standard errors, are reported in Table 1. The RIF-regression coefficients for the variance and the Gini are reported in Table 2. Detailed estimates for each of the 19 quantiles from the 5th to the 95th are also reported in Figure 1. Both Table 1 and the first panel of Figure 1 show that the effect of union status across the different quantiles is highly non-monotonic. In both 1988-90 and 2003-05, the effect first increases up to around the median, and then declines. The union effect even turns negative for the 90th and 95th quantiles. On the whole, unions tend to reduce wage inequality, since the wage effect tends to be larger for lower than higher quantiles of the wage distribution. As shown by the RIF-regressions for the more global measures of inequality (the variance of log wages and the Gini coefficient) displayed in Table 2, the effect of unions on these measures is negative, although the magnitude of that effect has decreased over time. This is consistent with the well-known result (e.g. Freeman, 1980) that unions tend to reduce the variance of log wages for men.
More importantly, the results also indicate that unions increase inequality in the lower end of the distribution, but decrease inequality even more in the higher end of the distribution. For example, the estimates in Table 1 for 1988-90 imply that a 10 percentage point increase in the unionization rate would increase the 50-10 gap by 0.024, but decrease the 90-50 gap by 0.043. 23 As we will see later in the decomposition results, this means that the continuing decline in the rate of unionization can account for some of the "polarization" of the labor market (a decrease in inequality at the low end, but an increase in inequality at the top end).
The results for unions also illustrate an important feature of RIF-regressions for quantiles, namely that they capture the effect of covariates on both the between- and within-group components of wage dispersion. As made clear in the numerical exercise below, the within-group effect of unions on log wages across quantiles is negatively sloped (reduces inequality), while the between-group effect is positively sloped (increases inequality). The different relative strengths of the between- and within-group effects at different quantiles explain the inverse U-shaped effect of unions. This is in sharp contrast with the union effect found in conditional quantile regressions, which captures only within-group effects and is thus negatively sloped throughout.
The RIF-regression estimates in Table 1 for other covariates also capture between- and within-group effects, just as in the case of unions. Consider, for instance, the case of college education. Table 1 and Figure 1 show that the effect of college increases monotonically as a function of percentiles. In other words, increasing the fraction of the workforce with a college degree has a larger impact on higher than lower quantiles. The reason why the effect is monotonic is that education increases both the level and the dispersion of wages (see, e.g. Lemieux, 2006a). As a result, both the within- and the between-group effects go in the same direction of increasing inequality. Similarly, the effect of experience also tends to be monotonic, as experience has a positive impact on both the level and the dispersion of wages.
23 These numbers are obtained by multiplying the change in the unionization rate (0.1) by the difference between the effects at the 50th and 10th quantiles (0.394-0.158=0.236), and at the 90th and 50th quantiles (-0.053-0.394=-0.429).
Another clear pattern that emerges in Figure 1 is that, for most inequality enhancing covariates, i.e. those with a positively sloped curve, the inequality enhancing effect increases over time. In particular, the slopes for high levels of education (college graduates and post-graduates) and high wage occupations (financial sales, doctors and lawyers) become clearly steeper over time. This suggests that these covariates make a positive contribution to the wage structure effect.
There are some changes in the contribution of occupations and industries that are consistent with technological change, but these changes are dwarfed by those associated with other explanations. For example, there are some increases in the returns to engineering and computer occupations, and in high-tech service industries, but these are extremely small in comparison to the increases in the insurance, real estate and financial sales occupations. There are increases in the penalties to routine production occupations in the upper-middle of the wage distribution and at the lower end of the distribution. There are also decreases in the penalties to some low-skilled non-routine occupations and associated industries, such as service occupations, truck driving and the retail industry, but these changes are relatively small. In summary, the changes in the rewards and penalties associated with occupations and industries are likely too modest to account for a significant share of the changes in the wage structure between 1988 and 2005.
To help interpret the results, we now present a simulation exercise to illustrate how the between- and within-group effects work in the case of unions, before returning to the main decomposition results.

Numerical Example of Between- and Within-Group Effects of Unions
It is well known in the literature that unions have an inequality-enhancing effect because they increase the conditional mean of wages, which creates a wedge between otherwise comparable union and non-union workers. This between-group effect is offset, however, by the within-group effect linked to the fact that unions reduce the conditional dispersion of wages. In the case of the variance, it is easy to write down an analytical expression for the between- and within-group effects (see, for example, Card, Lemieux, and Riddell, 2004) and see under which conditions one effect dominates the other. It is much harder to know, however, whether the between- or the within-group effect tends to dominate at different points of the wage distribution.
We illustrate the effect of unions at each percentile of the wage distribution using a simple simulation exercise presented in Figure A1. We assume that union and non-union (log) wages are normally distributed with standard deviations of 0.2 and 0.4, respectively. The union wage gap is set to 0.3 (mean log wages of 2.3 and 2.0 in the union and nonunion sectors). The overall density of wages is obtained by adding the densities from the union and non-union sectors, assuming a 25 percent unionization rate. Since no other covariates are included in the example, the "effect" of unions at each percentile of the overall distribution is simply the difference between the average value of the recentered influence function for union and non-union workers. 24 Panel A of Figure A1 shows the between-group effect at each percentile. The effect is obtained by setting the standard deviation in the union sector at 0.4 (same as non-union) to isolate the impact linked to the fact that the mean log wage is 0.3 larger in the union than non-union sector. Since the curve in Panel A is positively sloped, the between-group effect increases inequality. In contrast, the within-group effect of unions illustrated in Panel B reduces inequality since the curve is negatively sloped instead. This effect is obtained by setting mean log wages in both the union and non-union sector to 2.0, to isolate the impact of the wage compression effect of unions.
The total effect of unions that includes both the between- and within-group components is shown in Panel C of the figure. The effect looks qualitatively similar to the actual union effect estimates reported in Figure 1. The effect of unions first becomes larger in the lower half of the distribution, but turns around and becomes negative by the time we reach the 90th percentile. Roughly speaking, we see that the inequality-enhancing between-group effect dominates in the lower end of the distribution, while the inequality-reducing within-group effect dominates in the upper end of the distribution.
24 The effect is equal to $[F_N(q_\tau) - F_U(q_\tau)]/f(q_\tau)$, where $f(q_\tau) = U \cdot f_U(q_\tau) + (1 - U) \cdot f_N(q_\tau)$, U is the unionization rate (0.25 here), and $f_s(\cdot)$ and $F_s(\cdot)$ are the normal PDF and CDF in the union (s = U) and non-union (s = N) sectors. This result can also be directly obtained by noting that since the overall CDF is $F(q_\tau) = U \cdot F_U(q_\tau) + (1 - U) \cdot F_N(q_\tau)$, the total differential (holding τ constant) gives $dq_\tau/dU = [F_N(q_\tau) - F_U(q_\tau)]/f(q_\tau)$.
Note that, unlike the case of the variance where the between- and within-group effects add up exactly, these two effects do not directly add up in the case of quantiles because of the underlying non-linear structure of the model.
The last panel of Figure A1 provides a different type of intuition for the inverse U-shaped nature of the effect of unions. The panel shows the CDF of wages for union, non-union, and all (25 percent union, 75 percent non-union) workers. The CDF for all workers, F(·), is simply the weighted average of the CDFs for union, $F_U(\cdot)$, and non-union, $F_N(\cdot)$, workers:
$$F(q_\tau) = U \cdot F_U(q_\tau) + (1 - U) \cdot F_N(q_\tau).$$
Since there are very few union workers below a log wage of about 2 in the example, the overall CDF in that part of the distribution is essentially just the non-union CDF, $F_N(q_\tau)$, times the constant $1 - U$. The higher the unionization rate, the lower is $1 - U$, and the flatter is $(1 - U) \cdot F_N(q_\tau)$. Panel D indeed shows that the CDF for all workers below about 2.0 (the dotted line) is flatter than the non-union CDF. The horizontal distance between the CDF with unions (dotted line) and without unions (the non-union CDF) thus increases as a function of percentiles in this part of the distribution. Since this horizontal distance corresponds to a wage impact of unions for a given percentile, this wage effect first increases as a function of percentiles, just like in Panel C. But once we get above 2.0, the horizontal distance between the CDF curves for non-union and all workers starts decreasing as we hit the mass of union workers, who have less dispersed wages (i.e. a steeper CDF). This accounts for the reversal of the union effect shown in Panel C.
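The simulation can be reproduced with a few lines of Python using the standard library's `statistics.NormalDist`. This is a minimal sketch under the parameters stated in the text (U = 0.25; non-union log wages N(2.0, 0.4²); union log wages N(2.3, 0.2²)), computing the union effect at each quantile as the difference in mean RIF between union and non-union workers, $[F_N(q_\tau) - F_U(q_\tau)]/f(q_\tau)$. The function names and the bisection search for the mixture quantile are our own choices:

```python
from statistics import NormalDist


def mixture_quantile(tau, share_u, union, nonunion):
    """tau-quantile of the mixture share_u * union + (1 - share_u) * nonunion (bisection)."""
    spread = 10 * max(union.stdev, nonunion.stdev)
    lo = min(union.mean, nonunion.mean) - spread
    hi = max(union.mean, nonunion.mean) + spread
    for _ in range(200):
        mid = (lo + hi) / 2
        if share_u * union.cdf(mid) + (1 - share_u) * nonunion.cdf(mid) < tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def union_effect(tau, share_u, union, nonunion):
    """Mean RIF(y; q_tau) of union minus non-union workers: [F_N(q) - F_U(q)] / f(q)."""
    q = mixture_quantile(tau, share_u, union, nonunion)
    density = share_u * union.pdf(q) + (1 - share_u) * nonunion.pdf(q)
    return (nonunion.cdf(q) - union.cdf(q)) / density


U = 0.25
NONUNION = NormalDist(mu=2.0, sigma=0.4)
SCENARIOS = {
    "between": NormalDist(mu=2.3, sigma=0.4),   # Panel A: higher mean, same dispersion
    "within": NormalDist(mu=2.0, sigma=0.2),    # Panel B: same mean, compressed wages
    "total": NormalDist(mu=2.3, sigma=0.2),     # Panel C: both effects together
}

for tau in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95):
    row = {name: round(union_effect(tau, U, dist, NONUNION), 3) for name, dist in SCENARIOS.items()}
    print(tau, row)
```

The "total" column traces out the inverse U-shape, changing sign at the log wage where $F_N(q) = F_U(q)$; the "between" column rises with τ and the "within" column falls.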

Decomposition Results
The results of the decomposition are presented in Figure 2. Table 3 also summarizes the results for the standard measures of top-end (90-50 gap) and low-end (50-10 gap) wage inequality, as well as for the variance of log wages and the Gini coefficient. The base group used in the decomposition consists of non-union, white, and married men with some college education and 20 to 24 years of potential experience.[25] The covariates used in the RIF-regression models are those discussed above and listed in Table A1. A richer specification with additional interaction terms is used to estimate the logit models used to compute the reweighting factor $\omega_C(T_i, X_i)$.[26] Figure 2a shows the overall change in (real log) wages at each percentile $\tau$, $\Delta^{\tau}_O$, and decomposes this overall change into a composition effect ($\Delta^{\tau}_X$) and a wage structure effect ($\Delta^{\tau}_S$) using the reweighting procedure. Consistent with Autor, Katz and Kearney (2006), the overall change is U-shaped: wage dispersion increases in the top end of the distribution but declines in the lower end. This stands in sharp contrast with the situation that prevailed in the 1980s, when the corresponding curve was positively sloped as wage dispersion increased at all points of the distribution (Juhn, Murphy, and Pierce, 1993).

[25] We also present an alternative set of results in Figure A2 when high school education is used instead of some college as the base group.
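The first-stage reweighting can be sketched in a few lines. In the sketch below, which uses simulated data with a single discrete covariate (both hypothetical), the logit is replaced by cell frequencies; with a discrete covariate these coincide with the fitted probabilities of a fully saturated logit. Period-0 observations are reweighted by $\omega_C = [p(x)/(1-p(x))] \cdot [(1-p)/p]$, with $p(x) = \Pr(T=1 \mid X=x)$ and $p = \Pr(T=1)$, so that in this sketch the counterfactual $\nu_C$ combines period-1 characteristics with the period-0 wage structure, giving $\Delta^{\tau}_X = \nu_C - \nu_0$ and $\Delta^{\tau}_S = \nu_1 - \nu_C$.

```python
import random
from collections import Counter


def simulate(n: int = 4000, seed: int = 0) -> list[tuple[int, int, float]]:
    """Hypothetical (period, education cell, log wage) records for two periods."""
    rng = random.Random(seed)
    design = {
        0: ((0.4, 0.4, 0.2), (0.0, 0.2, 0.4)),   # cell shares, cell wage premia
        1: ((0.25, 0.4, 0.35), (0.0, 0.2, 0.6)),
    }
    data = []
    for t, (shares, premia) in design.items():
        for _ in range(n):
            x = rng.choices((0, 1, 2), weights=shares)[0]
            data.append((t, x, 2.5 + premia[x] + rng.gauss(0.0, 0.4)))
    return data


def reweighting_factors(data):
    """omega_C for each period-0 record, in data order.

    p(x) = Pr(T=1 | X=x) comes from cell frequencies (a saturated model);
    every cell is assumed to appear in both periods (common support).
    """
    n_t = Counter(t for t, _, _ in data)
    n_tx = Counter((t, x) for t, x, _ in data)
    scale = n_t[0] / n_t[1]                      # (1 - p) / p
    return [n_tx[(1, x)] / n_tx[(0, x)] * scale  # p(x) / (1 - p(x)) times scale
            for t, x, _ in data if t == 0]


def weighted_quantile(values, weights, tau):
    """Smallest value at which the cumulative weight share reaches tau."""
    pairs = sorted(zip(values, weights))
    target = tau * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


def decompose(data, tau):
    """Overall, composition and wage structure effects at percentile tau."""
    y0 = [y for t, _, y in data if t == 0]
    y1 = [y for t, _, y in data if t == 1]
    q0 = weighted_quantile(y0, [1.0] * len(y0), tau)
    q1 = weighted_quantile(y1, [1.0] * len(y1), tau)
    qc = weighted_quantile(y0, reweighting_factors(data), tau)  # counterfactual nu_C
    return {"overall": q1 - q0, "composition": qc - q0, "wage_structure": q1 - qc}


if __name__ == "__main__":
    sample = simulate()
    for tau in (0.1, 0.5, 0.9):
        print(tau, {k: round(v, 3) for k, v in decompose(sample, tau).items()})
```

A useful sanity check is that the reweighted period-0 sample reproduces the period-1 covariate shares exactly, while $\Delta^{\tau}_O = \Delta^{\tau}_X + \Delta^{\tau}_S$ holds by construction.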
Most summary measures of inequality, such as the variance or the 90-10 gap, nonetheless increase over the 1988-2005 period as wage gains in the top end of the distribution exceed those at the low end. In other words, though the curve for overall wage changes is U-shaped, its slope is positive on average, indicating that inequality generally goes up. Figure 2a also shows that, consistent with Lemieux (2006b), composition effects have contributed to a substantial increase in inequality. In fact, once composition effects are accounted for, the remaining wage structure effects follow a "purer" U-shape than overall changes in wages. The lowest wage changes are now right in the middle of the distribution (30th to 70th percentile), while wage gains at the top and low end are quantitatively similar. Accordingly, Table 2 shows that all of the 0.059 change in the 90-10 gap is explained by composition effects. By the same token, however, composition effects cannot account at all for the U-shaped nature of wage changes.
Figure 3 moves to the next step of the decomposition, using RIF-regressions to attribute the composition effect to each set of covariates. Figure 4 does the same for the wage structure effect. Figure 3a compares the "total" composition effect obtained by reweighting that was reported in Figure 2a, $\Delta^{\tau}_X$, to the composition effect explained using the RIF-regressions, $(E[X \mid T=1] - E[X \mid T=0]) \cdot \gamma^{\tau}_0$. The difference between the two curves is the specification (approximation) error $R^{\tau}$. The error term is generally quite small and does not exhibit much of a systematic pattern. This means that the RIF-regression model does a very good job of tracking the composition effect estimated consistently using the reweighting procedure. Figure 3b then divides the composition effect (explained by the RIF-regressions) into the contribution of five main sets of factors.[27] To simplify the discussion, let's focus on the impact of each factor in the lower and upper parts of the distribution, summarized in terms of the 50-10 and 90-50 gaps in Table 3. With the notable exception of unions, all factors have a larger impact on the 50-10 than on the 90-50 gap. In fact, the total contribution of industries, occupations, education and "other" factors (race and marital status) to the 50-10 gap is 0.049, which largely exceeds the total composition effect (0.025), while their contribution to the 90-50 gap is 0.000, well below the total composition effect (0.048).

Composition effects linked to factors other than unions thus go the "wrong way", in the sense that they account for rising inequality at the bottom end while inequality is actually rising at the top end, a point noted earlier by Autor, Katz, and Kearney (2005).

In contrast, composition effects linked to unions (the impact of de-unionization) reduce inequality at the low end (effect of -0.017 on the 50-10 gap) but increase inequality at the top end (effect of 0.031 on the 90-50 gap). Note that, just as in an Oaxaca-Blinder decomposition, these effects on the 50-10 and the 90-50 gaps can be obtained directly by multiplying the 7.1 percent decline in the unionization rate (Table A1) by the relevant union coefficients reported in Table 1. The effect of de-unionization accounts for about 25 percent of the total change in either the 50-10 or the 90-50 gap, which is remarkably similar to the relative contribution of de-unionization to the growth in inequality in the 1980s (see Freeman, 1993; Card, 1992; and DiNardo, Fortin and Lemieux, 1996).

… $E[X_k \mid T=1] \cdot (\gamma^{\nu}_{1,k} - \gamma^{\nu}_{C,k})$, and the residual change $\gamma^{\nu}_{1,1} - \gamma^{\nu}_{C,1}$ (the change in the intercepts).[28] The contribution of each set of factors is then shown in Figure 4b. As in the case of the composition effects, it is easier to discuss the results by focusing on the 90-50 and 50-10 gaps shown in Table 3. The results first show that -0.067 of the -0.091 change (decline) in the 50-10 gap due to wage structure effects remains unexplained. Covariates do a better job explaining changes in the 90-50 gap, where only 0.030 of the 0.077 change remains unexplained. The main reason why the model better explains the 90-50 gap is that wage structure effects linked to education have contributed to a 0.067 increase in the 90-50 gap, which represents most of the total 0.077 change linked to wage structure effects. In contrast, education has a very modest effect on the 50-10 gap.

These findings confirm the conjecture of Lemieux (2006a) that the large increase in the return to post-secondary education has contributed to a convexification of the wage distribution.

[26] The logit specification also includes a full set of interactions between experience and education, union status and education, union status and experience, and education and occupations.

[27] The effect of each set of factors is obtained by summing up the contribution of the relevant covariates. For example, the effect for "education" is the sum of the effect of each of the five education categories shown in Table 1. Showing the effect of each individual dummy separately would be cumbersome and harder to interpret.

[28] We show in Figure A3 that the residual change captured by the difference in intercepts $\gamma^{\nu}_{1,1} - \gamma^{\nu}_{C,1}$ is very similar to the actual wage changes in the base group. As discussed in Section 4.3, this further specification test suggests, once again, that the RIF-regression method provides a good approximation of the effect of large changes in the distribution of $X$ on quantiles.
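The second stage rests on the recentered influence function of a quantile, $\mathrm{RIF}(y; q_\tau) = q_\tau + (\tau - \mathbb{1}\{y \le q_\tau\})/f_Y(q_\tau)$, which is then regressed on the covariates by OLS. The sketch below is a minimal illustration with a single union dummy, so that OLS reduces to a difference in group means of the RIF. The simulated sector distributions, the sample size, the Gaussian kernel and the Silverman bandwidth are all assumptions, not the paper's choices. In the Oaxaca-Blinder spirit, the union contribution to the change in a gap is the change in the unionization rate times the difference between the union coefficients at the two percentiles; only the -0.071 change is taken from the text.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_period(n: int = 20_000, union_rate: float = 0.25, seed: int = 1):
    """Hypothetical sample: union log wages ~ N(2.6, 0.3), non-union ~ N(2.3, 0.6)."""
    rng = random.Random(seed)
    wages, union = [], []
    for _ in range(n):
        is_union = rng.random() < union_rate
        wages.append(rng.gauss(2.6, 0.3) if is_union else rng.gauss(2.3, 0.6))
        union.append(is_union)
    return wages, union


def sample_quantile(sample, tau):
    """Order statistic at position ceil(N * tau)."""
    ordered = sorted(sample)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


def kernel_density(sample, point):
    """Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth."""
    n = len(sample)
    mean = fmean(sample)
    sd = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))
    bandwidth = 1.06 * sd * n ** (-1 / 5)
    kernel = NormalDist().pdf
    return sum(kernel((point - y) / bandwidth) for y in sample) / (n * bandwidth)


def rif_quantile(sample, tau):
    """RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f(q_tau) for each observation."""
    q = sample_quantile(sample, tau)
    f = kernel_density(sample, q)
    return [q + (tau - (1.0 if y <= q else 0.0)) / f for y in sample]


def rif_union_coefficient(wages, union, tau):
    """OLS slope of the RIF on a union dummy (equal to a difference in group means)."""
    rif = rif_quantile(wages, tau)
    union_mean = fmean([r for r, d in zip(rif, union) if d])
    nonunion_mean = fmean([r for r, d in zip(rif, union) if not d])
    return union_mean - nonunion_mean


if __name__ == "__main__":
    wages, union = simulate_period()
    gamma = {tau: rif_union_coefficient(wages, union, tau) for tau in (0.1, 0.5, 0.9)}
    delta_union = -0.071  # decline in the unionization rate quoted in the text
    print("union effect on 50-10:", round(delta_union * (gamma[0.5] - gamma[0.1]), 4))
    print("union effect on 90-50:", round(delta_union * (gamma[0.9] - gamma[0.5]), 4))
```

Because the simulated wages are hypothetical, the printed effects illustrate the mechanics only and are not meant to reproduce the -0.017 and 0.031 reported in the text.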
Compared to education, the impact of most other factors is relatively modest. For example, wage structure effects linked to occupations account for some of the decline in inequality at the low end, but for little of the growth at the top end. This suggests that technological changes that reduce wages in routine occupations but increase wages in non-routine occupations have had a modest impact on the wage structure between 1988 and 2005, once education is controlled for.
Finally, the total effect of each covariate (wage structure plus composition effect) is reported in Figure 2b and in the bottom panel of Table 3. Unions and education are clearly the two dominant explanations for recent changes in the wage distribution. In both cases, the total effect of these factors on the 90-50 gap is about 0.04-0.05 larger than the effect on the 50-10 gap. This goes a substantial way towards explaining the polarization of the labor market, i.e. why the 90-50 gap increased by 0.19 more than the 50-10 gap.
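The summary figures quoted in this section fit together arithmetically. The short check below uses only numbers reported in the text (Table 3): it adds the composition and wage structure effects to obtain the overall change in each gap, sums the 90-50 and 50-10 gaps to obtain the 90-10 gap, and computes the share of each overall change attributable to de-unionization.

```python
# Changes in log-wage gaps, 1988-2005, as quoted in the text (Table 3).
composition = {"50-10": 0.025, "90-50": 0.048}
wage_structure = {"50-10": -0.091, "90-50": 0.077}
deunionization = {"50-10": -0.017, "90-50": 0.031}  # union part of the composition effect

# Overall change = composition effect + wage structure effect.
overall = {gap: composition[gap] + wage_structure[gap] for gap in composition}
# The 90-10 gap is the sum of the 90-50 and 50-10 gaps.
overall_90_10 = overall["90-50"] + overall["50-10"]
composition_90_10 = composition["90-50"] + composition["50-10"]
# Polarization: how much more the 90-50 gap grew than the 50-10 gap.
polarization = overall["90-50"] - overall["50-10"]
# Share of each overall change due to de-unionization.
union_share = {gap: deunionization[gap] / overall[gap] for gap in overall}

print({gap: round(v, 3) for gap, v in overall.items()})
print(round(overall_90_10, 3), round(composition_90_10, 3), round(polarization, 3))
print({gap: round(share, 2) for gap, share in union_share.items()})
```

This recovers the 0.059 change in the 90-10 gap (against a composition effect of 0.073), the 0.19 difference between the 90-50 and 50-10 changes, and de-unionization shares of roughly 26 and 25 percent.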

Conclusion
We propose a two-stage method to decompose changes in the distribution of wages (or other outcome variables). In stage 1, distributional changes are divided into a wage structure effect and a composition effect using a reweighting method. In stage 2, these two components are further divided into the contribution of each individual covariate using the novel recentered influence function (RIF) regression technique introduced by Firpo, Fortin, and Lemieux (2006). This two-stage procedure generalizes the popular Oaxaca-Blinder decomposition method by extending the decomposition to any distributional measure (besides the mean) and by allowing for a much more flexible wage setting model. Other procedures have been suggested for performing part of this decomposition for distributional parameters other than the mean. One important advantage of our procedure is that it is easy to use in practice, as it simply involves estimating a logit model (first stage) and running least-squares regressions (second stage). Another advantage is that it can be used to divide the composition effect into the contribution of each covariate, something that other existing methods cannot do.
We illustrate the workings of our method by looking at changes in male wage inequality in the United States between 1988 and 2005. This is an interesting case to study, as the wage distribution changed very differently at different points of the distribution, a phenomenon that cannot be captured by summary measures of inequality such as the variance of log wages. Our method is particularly well suited for looking in detail at the sources of wage changes at each percentile of the wage distribution. Our findings indicate that unions and education are the two most important factors accounting for the observed changes in the wage distribution over this period.

Nonparametric propensity score estimation
Suppose that $p(X)$ is completely unknown to the researcher. In that case, following Hirano, Imbens and Ridder (2003), we approximate the log odds ratio by a polynomial series. In practice, this is done by finding the vector $\hat{\pi}$ that solves the following problem:

$$\hat{\pi} = \arg\max_{\pi} \sum_{i=1}^{N} \left\{ T_i \log L\left(H_J(X_i)'\pi\right) + (1 - T_i) \log\left[1 - L\left(H_J(X_i)'\pi\right)\right] \right\},$$

where $L(z) = \exp(z)/(1 + \exp(z))$ is the logistic CDF and $H_J(x) = [H_{J,j}(x)]$ ($j = 1, \ldots, J$) is a vector of length $J$ of polynomial functions of $x \in \mathcal{X}$ satisfying the following properties: (i) $H_J: \mathcal{X} \to \mathbb{R}^J$; and (ii) $H_{J,1}(x) = 1$. More details on this estimation procedure can be found in Hirano, Imbens and Ridder (2003) or in Firpo (2007). The non-parametric nature of this estimation procedure comes from the fact that the approximation is refined as the sample size increases, that is, $J$ is a function of the sample size $N$, with $J = J(N) \to +\infty$ as $N \to +\infty$.
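The series-logit step can be sketched with a Newton-Raphson loop on the log likelihood above. The sketch below is a minimal, dependency-free illustration under stated assumptions: a scalar covariate, a raw power basis $H_J(x) = (1, x, \ldots, x^{J-1})$ (in practice orthogonalized polynomials are numerically safer), and hypothetical simulated data whose true log odds are $0.5 + x - x^2$. Because $H_{J,1}(x) = 1$, the first-order conditions imply that the fitted propensities sum to the number of treated observations, which makes a convenient check.

```python
import math
import random


def basis(x: float, degree: int) -> list[float]:
    """Power basis H_J(x) = (1, x, ..., x**degree); its first element is 1."""
    return [x ** j for j in range(degree + 1)]


def logistic(v: float) -> float:
    """Numerically stable logistic CDF L(v) = exp(v) / (1 + exp(v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def series_logit(x: list[float], t: list[int], degree: int, max_iter: int = 50) -> list[float]:
    """Newton-Raphson maximization of the series-logit log likelihood."""
    h = [basis(xi, degree) for xi in x]
    k = degree + 1
    coef = [0.0] * k
    for _ in range(max_iter):
        p = [logistic(sum(c * hj for c, hj in zip(coef, hi))) for hi in h]
        # Score: sum_i (T_i - p_i) H_J(X_i)
        score = [sum((ti - pi) * hi[j] for ti, pi, hi in zip(t, p, h)) for j in range(k)]
        # Information: sum_i p_i (1 - p_i) H_J(X_i) H_J(X_i)'
        info = [[sum(pi * (1 - pi) * hi[j] * hi[l] for pi, hi in zip(p, h)) for l in range(k)]
                for j in range(k)]
        step = solve(info, score)
        coef = [c + s for c, s in zip(coef, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return coef


def propensity(coef: list[float], x: list[float], degree: int) -> list[float]:
    """Fitted p(x) = L(H_J(x)' pi_hat)."""
    return [logistic(sum(c * hj for c, hj in zip(coef, basis(xi, degree)))) for xi in x]


def simulate(n: int = 500, seed: int = 0) -> tuple[list[float], list[int]]:
    """Hypothetical data with true log odds 0.5 + x - x**2."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    t = [1 if rng.random() < logistic(0.5 + xi - xi ** 2) else 0 for xi in x]
    return x, t


if __name__ == "__main__":
    x, t = simulate()
    coef = series_logit(x, t, degree=3)
    print([round(c, 3) for c in coef])
```

Letting the degree grow slowly with the sample size, as the text requires, is what makes the estimator non-parametric; the sketch fixes the degree at 3 for illustration only.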

Asymptotic Distribution
We first show that the plug-in estimators $\hat{\nu}$ are asymptotically normal and compute their asymptotic variances. We then do the same for the density estimators.

The Asymptotic Distribution of Plug-in Estimators
We start by assuming that the estimators $\hat{\nu}$ are asymptotically linear in the following sense:

Assumption 4 (Asymptotic Linearity) $\hat{\nu}_t$ and $\hat{\nu}_C$ are asymptotically linear, that is, each estimator equals the true parameter plus the sample average of its influence function, up to a remainder term that is $o_p(N^{-1/2})$.

Assumption 4 establishes that the estimators are either exactly linear, as those based on sample moments are, or can be linearized with a remainder term that approaches zero as the sample size increases. An additional technical assumption is that the influence functions are square integrable and that their conditional expectations given $X$ are differentiable. To simplify notation, let us write
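A familiar example of an asymptotically linear estimator is the sample quantile, whose influence function is $(\tau - \mathbb{1}\{y \le q_\tau\})/f(q_\tau)$, i.e. its RIF minus $q_\tau$. The sketch below draws a large standard normal sample (a hypothetical choice that makes $q_\tau$ and $f(q_\tau)$ known exactly) and compares $\hat{q}_\tau - q_\tau$ with the sample average of the influence function; the difference between the two is the remainder term that asymptotic linearity requires to vanish faster than $N^{-1/2}$. The implied asymptotic variance is $\tau(1-\tau)/f(q_\tau)^2$.

```python
import math
import random
from statistics import NormalDist


def sample_quantile(sample: list[float], tau: float) -> float:
    """Order statistic at position ceil(N * tau)."""
    ordered = sorted(sample)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


def influence_average(sample: list[float], q: float, tau: float, density: float) -> float:
    """(1/N) * sum_i (tau - 1{Y_i <= q}) / f(q): the linear part of q_hat - q."""
    return sum((tau - (1.0 if y <= q else 0.0)) / density for y in sample) / len(sample)


def linearization_check(n: int = 100_000, tau: float = 0.5, seed: int = 3):
    """Return (q_hat - q, linear term, remainder) for a hypothetical N(0, 1) sample."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    std_normal = NormalDist()
    q = std_normal.inv_cdf(tau)
    f = std_normal.pdf(q)
    error = sample_quantile(sample, tau) - q
    linear = influence_average(sample, q, tau, f)
    return error, linear, error - linear


if __name__ == "__main__":
    error, linear, remainder = linearization_check()
    print(f"q_hat - q = {error:+.5f}, linear term = {linear:+.5f}, remainder = {remainder:+.6f}")
    tau, f = 0.5, NormalDist().pdf(0.0)
    print("asymptotic variance tau(1 - tau)/f^2 =", round(tau * (1 - tau) / f ** 2, 4))
```

The remainder is typically an order of magnitude smaller than either $\hat{q}_\tau - q_\tau$ or the linear term, which is the behavior Assumption 4 formalizes.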

Assumption 5 [Influence Function]
For all weighting functions ω considered, and are continuously differentiable for all x in X .
Under ignorability, both types of estimators (with a parametric or a non-parametric first step) for $\nu_1$, $\nu_0$, and $\nu_C$ proposed above remain asymptotically linear. The theorem below considers both the parametric and the non-parametric cases.

Proofs
Proof of Theorem 1: A proof can be found in Firpo (2007b).