Article

Decomposing Wage Distributions Using Recentered Influence Function Regressions

1
Insper Institute of Education and Research, R. Quatá, 300, São Paulo–SP 04546-042, Brazil
2
Vancouver School of Economics, University of British Columbia, 6000 Iona Drive, Vancouver, BC V6T 1L4, Canada
*
Author to whom correspondence should be addressed.
Econometrics 2018, 6(2), 28; https://doi.org/10.3390/econometrics6020028
Submission received: 31 December 2017 / Revised: 27 April 2018 / Accepted: 9 May 2018 / Published: 25 May 2018
(This article belongs to the Special Issue Econometrics and Income Inequality)

Abstract

This paper provides a detailed exposition of an extension of the Oaxaca-Blinder decomposition method that can be applied to various distributional measures. The two-stage procedure first divides distributional changes into a wage structure effect and a composition effect using a reweighting method. Second, the two components are further divided into the contribution of each explanatory variable using recentered influence function (RIF) regressions. We illustrate the practical aspects of the procedure by analyzing how the polarization of U.S. male wages between the late 1980s and the mid 2010s was affected by factors such as de-unionization, education, occupations, and industry changes.
JEL Classification:
C18; J31

1. Introduction

The ongoing growth in wage inequality in the United States and several other countries over the past thirty-five years has generated a resurgence of interest in distributional issues and in methods to analyze them. There is also a sizeable literature looking at wage differentials between subgroups that goes beyond simple mean comparisons. More generally, there is increasing interest in the distributional impacts of various programs or interventions. In all these cases, the key question of economic interest is which factors account for changes (or differences) in distributions. For example, did wage inequality increase because education or other wage setting factors became more unequally distributed, or because the return to these factors changed over time?
In response to these important questions, several decomposition procedures have been suggested to untangle the sources of changes or differences in wage distributions. In Fortin et al. (2011), we reviewed the traditional Oaxaca-Blinder (OB) decomposition method and several of its extensions in the context of the treatment effect literature to highlight the advantages and disadvantages of different methodologies. The goal of the current paper is to provide a detailed and updated exposition of an extension to the OB decomposition that relies on recentered influence function (RIF) regressions (Firpo et al. 2009) [FFL, hereafter] to estimate the effect of covariates on inequality measures, such as percentile differences and ratios, the variance of log wages, or the Gini coefficient.1 Relative to several procedures proposed recently (Machado and Mata 2005; Melly 2005; Chernozhukov et al. 2013) [CFM, hereafter], this method has the advantage of allowing general distributional measures to be decomposed non-sequentially in the same way means can be decomposed using the conventional OB method. The methodology has been applied in a number of different settings where the object of interest is the unconditional distribution of outcomes.2
As is well known, the OB procedure provides a way of: (1) decomposing changes or differences in mean wages into a wage structure effect and a composition effect; and (2) further dividing these two components into the contribution of each covariate. The main problem with sequential decomposition methods is that they cannot be used to divide the composition effect into the role of each covariate in a way that is independent of the order of the decomposition. Thus, while it is natural to ask to what extent changes in the distribution of education have contributed to the growth in wage inequality, this particular question has not been answered in the literature for lack of available decomposition methods. In contrast, this question is straightforward to answer in the case of the mean using an OB decomposition.
In this paper, we focus on a two-stage procedure that can be used to perform OB type decompositions on any distributional measure, and not only the mean. The first stage consists of decomposing the distributional statistic of interest into a wage structure and a composition component using a reweighting approach, where the weights are either parametrically or non-parametrically estimated. As in the related program evaluation literature, we show that ignorability and common support are key assumptions required to identify separately the wage structure and composition effects. Provided that these assumptions are satisfied, the underlying wage setting model can be as general as possible. The idea of the first stage is thus very similar to DiNardo et al. (1996). Here, we clarify the assumptions required for the identification of distributional statistics besides the mean by drawing a parallel with the program evaluation (treatment effect) literature.
In the second stage, we further divide the wage structure and composition effects into the contribution of each covariate, just as in the usual OB decomposition. This is done using the regression-based method proposed by FFL to estimate the effect of changes in covariates on any distributional statistic, such as inter-quartile ranges, the variance, or the Gini coefficient.
The method developed in FFL replaces the dependent variable of a regression by the corresponding recentered influence function (RIF) for the distributional statistic of interest. The influence function, also known as the Gâteaux (1913) derivative, is a widely used concept in robust statistics and is easy to compute. Using the fact that the expected value of the influence function is equal to zero and the law of iterated expectations, we can express the distributional statistic of interest as the average of the conditional expectation of the RIF given the covariates. As in FFL, we call these conditional expectations RIF-regressions.
Average derivatives computed using the RIF-regressions yield the partial effect of a small location shift in the distribution of covariates on the distributional statistic of interest. FFL call this parameter the Unconditional Partial Effect (UPE), which for the special case of quantiles becomes the Unconditional Quantile Partial Effect (UQPE). When the conditional expectations are approximated by linear functions, the coefficients of these RIF-regressions indicate by how much the functional (e.g., the quantile) of the marginal outcome distribution is affected by an infinitesimal shift to the right in the distribution of the regressors.
Because the UPE parameter corresponds to the effect of an infinitesimal shift in the distribution of regressors, it approximates well small changes in that distribution, but not necessarily large changes. For known changes in the distribution of covariates (e.g., between two time periods), one can easily compute the associated total change in the functional of the outcome distribution of interest. Rothe (2012) proposes statistical inference for that case.3 Both Rothe (2012) and CFM compute the conditional CDF (cumulative distribution function) of the outcome given covariates in the first step. This adds a computationally intensive layer of estimation, since one needs to calculate the entire conditional CDF, even if only interested in one single quantile of the marginal outcome distribution. By contrast, our approach requires only one OLS regression, which is very attractive from a computational standpoint. Finally, even though we end up performing bootstrap-based inference in our empirical application, we show in Appendix B that the analytical formulas for the standard errors of the reweighting estimates can be derived.
The main advantage of using the RIF-regression method in an Oaxaca-Blinder type decomposition is that it provides a linear approximation of highly non-linear functionals, such as the quantiles or the Gini coefficient. Nevertheless, its simplicity comes at a cost. As pointed out by Rothe (2015), the impact of changes in the distribution of covariates on some non-linear functionals may be poorly approximated by RIF-regressions. Thus, approximation errors are a by-product of the method and they should always be reported in the decomposition results, as we do in our empirical analysis below.
We illustrate how our procedure works in practice by looking at changes in the distribution of male wages in the United States between the late 1980s and the mid 2010s. This period is quite interesting from a distributional point of view as inequality increased in the top end of the wage distribution, but decreased in the low end of the distribution, a phenomenon that Autor et al. (2006) referred to as the polarization of the U.S. labor market. We use our method to investigate the source of change in the wage distribution by decomposing the changes at various wage quantiles. The results indicate that no single factor appears to be able to fully explain the polarization of the wage distribution. De-unionization accounts for some of the decreasing wage inequality at the low end and increasing inequality at the top end. The continuing growth in returns to education, especially at a level above high school, is the most important source of growth in top-end inequality. Changes in the occupational structure of the workforce help account for the polarization of wages, but these wage changes are mostly offset by changes in the effect of industry at the upper end of the distribution. This explains why, despite convincing evidence that the “routinization of jobs” had a substantial impact on the polarization of employment, its effect on wage polarization has been more difficult to identify directly (e.g., Autor and Dorn 2013). Our results suggest that the wage decline in “routine occupations” (Autor et al. 2003), such as production jobs in the manufacturing sector, has been compensated by increases in the primary sector (e.g., mining, oil and gas, etc.), the distribution sector (transportation and wholesale) and in the services sector. Potentially offsetting effects underline the need for the proposed approach that can “run horse races” between different sets of factors.
However, increases at the lower end appear to be attributable to changes in minimum wages, which we do not model here.4
The remainder of the paper is organized as follows. Section 2 discusses the decomposition problem and reviews the strengths and weaknesses of existing procedures. The identification of the proposed decomposition procedure is presented in Section 3. Section 4 discusses estimation and inference, and illustrates how the decomposition methodology works in the case of quantiles, the variance, and the Gini coefficient. Section 5 provides an empirical application of the methodology to the changes in the distribution of male wages in the United States between the late 1980s and the mid 2010s.

2. The Decomposition Problem and Shortcomings of Existing Methods

Before presenting our method in detail, it is useful to first review the case of the mean for which the standard OB method is very well known. To simplify the exposition, we will work with the case where the outcome variable, $Y$, is the wage, although our approach can be used for any other outcome variable. The OB method can be used to divide a difference in mean wages between two groups, or overall mean wage gap, into a composition effect linked to differences in covariates between the two groups, and a wage structure effect linked to differences in the return to these covariates between the two groups. The two groups are labeled as $t = 0, 1$. In the original papers by Oaxaca (1973) and Blinder (1973), the two groups used were either men and women, or blacks and whites. More generally, the two groups can be a control and a treatment group, or similar groups of individuals at two points in time, as in the wage inequality literature.
We first review how the OB decomposition provides a straightforward way of dividing up the contribution of each covariate in a composition and a wage structure effect. Focusing on differences in the wage distributions of two groups, 1 and 0, for a worker $i$, let $Y_{1i}$ be the wage that would be paid in Group 1, and $Y_{0i}$ the wage that would be paid in Group 0. Since a given individual $i$ is only observed in one of the two groups, we either observe $Y_{1i}$ or $Y_{0i}$, but never both. Therefore, for each $i$, we can define the observed wage, $Y_i$, as $Y_i = Y_{1i} \cdot T_i + Y_{0i} \cdot (1 - T_i)$, where $T_i = 1$ if individual $i$ is observed in Group 1, and $T_i = 0$ if individual $i$ is observed in Group 0. There is also a vector of covariates $X \in \mathcal{X} \subset \mathbb{R}^K$ that we can observe in both groups.
In the standard OB decomposition, one assumes a linear functional form. In other words, one writes
$$Y_{ti} = X_i^{\top} \beta_t + \varepsilon_{ti}, \quad \text{for } t = 0, 1,$$
where $E[\varepsilon_{ti} \mid X_i, T = t] = 0$.
Define the overall mean wage gap as $\Delta_O^{\mu} = E[Y \mid T = 1] - E[Y \mid T = 0]$, and consider dividing the overall mean gap into a wage structure effect and a composition effect. Averaging over $X$, the mean wage gap $\Delta_O^{\mu}$ can be written as
$$\begin{aligned} \Delta_O^{\mu} &= E[Y \mid T = 1] - E[Y \mid T = 0] \\ &= E[E(Y \mid X, T = 1) \mid T = 1] - E[E(Y \mid X, T = 0) \mid T = 0] \\ &= \left( E[X \mid T = 1]^{\top} \beta_1 + E[\varepsilon_1 \mid T = 1] \right) - \left( E[X \mid T = 0]^{\top} \beta_0 + E[\varepsilon_0 \mid T = 0] \right), \end{aligned}$$
where $E[\varepsilon_t \mid T = t] = 0$ because $E[\varepsilon_t \mid X, T = t] = 0$, so the expression reduces to $\Delta_O^{\mu} = E[X \mid T = 1]^{\top} \beta_1 - E[X \mid T = 0]^{\top} \beta_0$. Thus, by adding and subtracting $E[X \mid T = 1]^{\top} \beta_0$ we get
$$\Delta_O^{\mu} = \underbrace{E[X \mid T = 1]^{\top} (\beta_1 - \beta_0)}_{\Delta_{S,OB}^{\mu}} + \underbrace{\left( E[X \mid T = 1] - E[X \mid T = 0] \right)^{\top} \beta_0}_{\Delta_{X,OB}^{\mu}}.$$
The first term in the equation is the wage structure effect, $\Delta_{S,OB}^{\mu}$, while the second term is the composition effect, $\Delta_{X,OB}^{\mu}$. Note that the reference group used to compute the wage structure effect here is the Group 0, though the decomposition could also be performed using Group 1 instead as the reference group. The wage structure and composition effects can also be written in terms of sums over the explanatory variables
$$\Delta_{S,OB}^{\mu} = \sum_{k=1}^{K} E[X_k \mid T = 1] \, (\beta_{1,k} - \beta_{0,k}), \qquad \Delta_{X,OB}^{\mu} = \sum_{k=1}^{K} \left( E[X_k \mid T = 1] - E[X_k \mid T = 0] \right) \beta_{0,k},$$
where $X_k$ and $\beta_{t,k}$ represent the $k$th element of $X$ and $\beta_t$, respectively. This provides a simple way of dividing $\Delta_{S,OB}^{\mu}$ and $\Delta_{X,OB}^{\mu}$ into the contribution of a single covariate or a group of covariates as needed.
Because of the linearity assumption, the OB decomposition is very easy to compute in practice. It can be estimated by replacing the parameter vectors $\beta_t$ by their OLS estimates, and replacing the expected value of the covariates $E[X \mid T = t]$ by the sample averages.
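The estimation steps just described can be sketched in a few lines of Python. This is only an illustration on simulated data: the covariates (a constant, years of education, union status), the coefficient values, and the helper names `ols` and `ob_decomposition` are assumptions made for the example and do not come from the paper's empirical application.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X (X must already contain a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ob_decomposition(X1, y1, X0, y0):
    """Covariate-by-covariate OB terms, using Group 0 coefficients as reference."""
    beta1, beta0 = ols(X1, y1), ols(X0, y0)
    xbar1, xbar0 = X1.mean(axis=0), X0.mean(axis=0)
    structure_k = xbar1 * (beta1 - beta0)      # E[X_k|T=1] (beta_1k - beta_0k)
    composition_k = (xbar1 - xbar0) * beta0    # (E[X_k|T=1] - E[X_k|T=0]) beta_0k
    return structure_k, composition_k

rng = np.random.default_rng(0)

def simulate(n, educ_mean, union_rate, beta):
    """Hypothetical linear wage model: constant, education, union status."""
    educ = rng.normal(educ_mean, 2.0, n)
    union = (rng.random(n) < union_rate).astype(float)
    X = np.column_stack([np.ones(n), educ, union])
    y = X @ beta + rng.normal(0.0, 0.3, n)
    return X, y

X0, y0 = simulate(5000, 12.0, 0.30, np.array([1.0, 0.06, 0.20]))
X1, y1 = simulate(5000, 13.0, 0.15, np.array([0.9, 0.08, 0.15]))

structure_k, composition_k = ob_decomposition(X1, y1, X0, y0)
gap = y1.mean() - y0.mean()

print("overall mean gap      :", gap)
print("wage structure effect :", structure_k.sum(), structure_k)
print("composition effect    :", composition_k.sum(), composition_k)
```

Because each regression includes a constant, the two aggregate effects add up to the raw mean gap exactly, which is a convenient check on any implementation.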
There are nonetheless some important limitations to the standard OB decomposition. A well-known difficulty discussed by Oaxaca and Ransom (1999) and Gardeazabal and Ugidos (2004) is that the contribution of each covariate to the wage structure effect, $E[X_k \mid T = 1] \, (\beta_{1,k} - \beta_{0,k})$, is sensitive to the choice of the base group.5
A second limitation discussed by Barsky et al. (2002) is that the OB decomposition provides consistent estimates of the wage structure and composition effect only under the assumption that the conditional expectation is linear.6 One possible solution to the problem is to estimate the conditional expectation using non-parametric methods. Another solution proposed by Barsky et al. (2002) is to use a (non-parametric) reweighting approach as in DiNardo et al. (1996) to perform the decomposition.7 The advantage of this solution is that it can be applied to more general distributional statistics. The disadvantage of both solutions, however, is that they do not provide direct ways, in general, of further dividing the contribution of each covariate to the wage structure and composition effects.8
Currently available methods, such as DiNardo et al. (1996), can be used to compute the overall wage structure and composition effects for various distributional statistics. We build on this in the current paper by suggesting that these two overall effects be estimated using a reweighting procedure. Available methods are much more limited, however, when it comes to further dividing the wage structure and, especially, the composition effect into the contribution of each covariate. The main contribution of the paper is to explain how a simple regression-based procedure, building on recent work by FFL, can remedy this shortcoming.

3. Identification of General Composition and Structure Effects

3.1. Wage Structure and Composition Effects

Following the treatment effect literature (Rosenbaum and Rubin 1983, Heckman 1990, Heckman and Robb 1985, 1986), we focus on differences in the wage distributions between two groups, 1 and 0. Suppose we could observe a random sample of $N = N_1 + N_0$ individuals, where $N_1$ and $N_0$ are the number of individuals in each group and we index individuals by $i = 1, \ldots, N$. We define the probability that an individual $i$ is in Group 1 as $p$, whereas the conditional probability that an individual $i$ is in Group 1 given $X = x$, is $p(x) = \Pr[T = 1 \mid X = x]$, sometimes simply called the propensity score.
Wage determination depends on some observed components $X_i$ and on some unobserved components $\varepsilon_i \in \mathbb{R}^m$ through the wage structure functions
$$Y_{ti} = g_t(X_i, \varepsilon_i), \quad \text{for } t = 0, 1$$
where $g_t(\cdot, \cdot)$ are unknown real-valued mappings: $g_t : \mathcal{X} \times \mathbb{R}^m \to \mathbb{R}_+ \cup \{0\}$. As we are not imposing any distribution assumption or specific functional form, writing $Y_1$ and $Y_0$ in this way does not restrict the analysis in any sense. We will however assume that $(T, X, \varepsilon)$, or equivalently $(Y, T, X)$, have an unknown joint distribution but that is far from being restrictive.
From observed data on $(Y, T, X)$, we can non-parametrically identify the distributions of $Y_1 \mid T = 1 \overset{d}{\sim} F_1$ and $Y_0 \mid T = 0 \overset{d}{\sim} F_0$. Without further assumptions, however, we cannot identify the counterfactual distribution of $Y_0 \mid T = 1 \overset{d}{\sim} F_C$. The counterfactual distribution $F_C$ is the one that would have prevailed under the wage structure of Group 0, but with the distribution of observed and unobserved characteristics of Group 1. For the sake of completeness, we consider also the conditional distributions $Y_1 \mid X, T = 1 \overset{d}{\sim} F_{1|X}$, $Y_0 \mid X, T = 0 \overset{d}{\sim} F_{0|X}$ and $Y_0 \mid X, T = 1 \overset{d}{\sim} F_{C|X}$.
We typically analyze the difference in wage distributions between Groups 1 and 0 by looking at some functionals of these distributions. Let $\nu$ be a functional of the conditional joint distribution of $(Y_1, Y_0) \mid T$, that is $\nu : \mathcal{F} \to \mathbb{R}$, and $\mathcal{F}$ is a class of distribution functions such that $F \in \mathcal{F}$ if $|\nu(F)| < +\infty$. The difference in the $\nu$'s between the two groups is called here the $\nu$-overall wage gap, which is basically the difference in wages measured in terms of the distributional statistic $\nu$:9
$$\Delta_O^{\nu} = \nu(F_1) - \nu(F_0) = \nu_1 - \nu_0. \tag{2}$$
We can use the fact that the distribution of $X$ is not the same across groups to decompose Equation (2) into two parts:
$$\Delta_O^{\nu} = (\nu_1 - \nu_C) + (\nu_C - \nu_0) = \Delta_S^{\nu} + \Delta_X^{\nu}$$
where the second term $\Delta_X^{\nu}$ reflects the effect of differences in the distribution of $X$.
The first term of the sum, $\Delta_S^{\nu}$, will reflect changes in the $g_t(\cdot, \cdot)$ functions only if we are able to fix the distribution of observables and unobservables as the one prevailing for Group 1, that is, the distribution of $(X, \varepsilon) \mid T = 1$. For that to be true, $\nu_C$ will be a functional evaluated at that distribution. This holds under the following assumptions: Ignorability and Overlapping Support.
The Ignorability Assumption has become popular in empirical research following a series of papers by Rubin and coauthors and by Heckman and coauthors.10 In the program evaluation literature, this assumption is sometimes called unconfoundedness and allows identification of the treatment effect on the treated sub-population.
Assumption 1.
Ignorability: Let $(T, X, \varepsilon)$ have a joint distribution. For all $x$ in $\mathcal{X}$: $\varepsilon$ is independent of $T$ given $X = x$.
The Ignorability assumption should be assessed on a case-by-case basis, as it is more plausible in some settings than in others. In our case, it states that the distribution of the unobserved explanatory factors in the wage determination is the same across Groups 1 and 0, once we condition on a vector of observed components.11 Now, consider the following assumption about the support of the covariates distribution:
Assumption 2.
Overlapping Support: For all $x$ in $\mathcal{X}$, $p(x) = \Pr[T = 1 \mid X = x] < 1$. Furthermore, $\Pr[T = 1] > 0$.
The Overlapping Support assumption requires that there be an overlap in observable characteristics across groups, in the sense that there is no value of $x$ in $\mathcal{X}$ such that it is only observed among individuals in Group 1.12 Under these two assumptions, we are able to identify the parameters of the counterfactual distribution of $Y_0 \mid T = 1 \overset{d}{\sim} F_C$. To see how the identification result works, let us define first three relevant weighting functions:
$$\omega_1(T) \equiv \frac{T}{p}, \qquad \omega_0(T) \equiv \frac{1 - T}{1 - p}, \qquad \omega_C(T, X) \equiv \frac{p(X)}{1 - p(X)} \cdot \frac{1 - T}{p}.$$
The first two reweighting functions transform features of the marginal distribution of $Y$ into features of the conditional distribution of $Y_1$ given $T = 1$, and of $Y_0$ given $T = 0$. The third reweighting function transforms features of the marginal distribution of $Y$ into features of the counterfactual distribution of $Y_0$ given $T = 1$. We are now able to state our first identification result:13
Result 1.
Inverse Probability Weighting :
Under Assumptions 1 and 2:
(i) $F_t(y) = E\left[ \omega_t(T) \cdot \mathbb{1}\{Y \le y\} \right], \quad t = 0, 1$
(ii) $F_C(y) = E\left[ \omega_C(T, X) \cdot \mathbb{1}\{Y \le y\} \right]$
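As an illustration of Result 1, the sketch below builds the three weighting functions on simulated data and uses them to evaluate weighted distribution functions. It assumes a single discrete covariate, so that $p(x)$ can be estimated non-parametrically by cell frequencies; the wage structure functions, the sample size, and the names `w1`, `w0`, `wC`, and `weighted_cdf` are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
T = (rng.random(n) < 0.5).astype(int)                 # group indicator

# One discrete covariate (values 0, 1, 2) distributed differently across groups
X = np.where(T == 1,
             rng.choice(3, size=n, p=[0.2, 0.3, 0.5]),
             rng.choice(3, size=n, p=[0.5, 0.3, 0.2]))
eps = rng.normal(0.0, 0.4, n)                         # independent of T given X
Y = np.where(T == 1,
             1.0 + 0.30 * X + 1.2 * eps,              # hypothetical g_1(X, eps)
             1.0 + 0.20 * X + eps)                    # hypothetical g_0(X, eps)

p = T.mean()                                          # estimate of Pr[T = 1]
cell_share = np.array([T[X == x].mean() for x in range(3)])
p_x = cell_share[X]                                   # estimate of p(x) by cell

w1 = T / p
w0 = (1 - T) / (1 - p)
wC = p_x / (1 - p_x) * (1 - T) / p

def weighted_cdf(y, w):
    """Sample analogue of E[w * 1{Y <= y}]."""
    return np.mean(w * (Y <= y))

for y in (1.0, 1.5, 2.0):
    print(y, weighted_cdf(y, w1), weighted_cdf(y, wC), weighted_cdf(y, w0))

# Counterfactual mean of Y_0 given T = 1, obtained with the same weights
print("counterfactual mean:", np.mean(wC * Y))
```

With cell-frequency propensity scores, each set of weights averages to one in the sample, and the counterfactual weights reproduce the Group 1 distribution of the covariate among Group 0 observations. A parametric model for $p(x)$, such as a logit, would typically satisfy these properties only approximately.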
Identification of $F_C$ implies identification of $\nu(F_C)$ and therefore of $\Delta_S^{\nu}$ and $\Delta_X^{\nu}$. Furthermore, because of the ignorability assumption, we know that differences between the conditional distributions of $(X, \varepsilon) \mid T = 1$ and $(X, \varepsilon) \mid T = 0$ correspond only to differences in the conditional distributions $F_{X|T=1}$ and $F_{X|T=0}$. Thus, $\Delta_X^{\nu}$ will only reflect changes in distribution of $X$. We state these results more precisely below.
Result 2.
Identification of Wage Structure and Composition Effects :
Under Assumptions 1 and 2:
(i) $\Delta_S^{\nu}$, $\Delta_X^{\nu}$ are identifiable from data on $(Y, T, X)$;
(ii) if $g_1(\cdot, \cdot) = g_0(\cdot, \cdot)$ then $\Delta_S^{\nu} = 0$;14
(iii) if $F_{X|T=1} = F_{X|T=0}$, then $\Delta_X^{\nu} = 0$.
In Result 2, the identification of $\Delta_S^{\nu}$ and $\Delta_X^{\nu}$ follows from the fact that these quantities can be expressed as functionals of the distributions obtained by weighting the observations with the inverse probabilities of belonging to Group 0 or 1 given T, as stated in Result 1. Note that the non-parametric identification of either the wage determination functions $g_1(\cdot, \cdot)$ and $g_0(\cdot, \cdot)$, or the distribution function of $\varepsilon$, is not necessary for the effects $\Delta_S^{\nu}$ and $\Delta_X^{\nu}$ to be identified. Therefore, methods based on conditional mean restrictions (the OB decomposition approach) and methods based on conditional quantile restrictions (the Machado and Mata (2005) approach) rely on identification conditions that are stronger than needed and that can be easily relaxed if we are simply interested in the terms $\Delta_S^{\nu}$ and $\Delta_X^{\nu}$.
Part (ii) of Result 2 also states that, when there are no group differences in the wage determination functions, then we should find no wage structure effects. Part (iii) states that, if there are no group differences in the distribution of the covariates, there will be no composition effects.
Finally, it is interesting to relate these general results to the OB decomposition. Given the functional form assumptions of OB, the conditional mean zero expectation of $\varepsilon$ and ignorability assumption, it follows that $E[X \mid T = 1]^{\top} \beta_0$ equals $\mu_C$, the counterfactual mean or the expectation of $Y_0$ given $T = 1$:
$$\begin{aligned} \mu_C &= E[Y_0 \mid T = 1] = E[g_0(X, \varepsilon) \mid T = 1] \\ &= E[E(g_0(X, \varepsilon) \mid X, T = 1) \mid T = 1] \\ &= E[E(g_0(X, \varepsilon) \mid X, T = 0) \mid T = 1] \\ &= E[X \mid T = 1]^{\top} \beta_0 + E[E(\varepsilon_0 \mid X, T = 0) \mid T = 1] \\ &= E[X \mid T = 1]^{\top} \beta_0 \end{aligned}$$
In the following subsection, we show how one can generalize other features of the OB decomposition using a regression based approach, the RIF Regression.

3.2. The RIF Regressions

One important goal of the desired approach, as discussed in Section 2, is to apportion the wage structure and composition effects into the contribution of each individual covariate. To do so, we use the method proposed by FFL to compute partial effects of changes in distribution of covariates on a given functional of the distribution of $Y_t \mid T$. The method works by providing a linear approximation to a non-linear functional of the distribution. Thus, through collecting the leading term of a von Mises (1947) expansion, FFL approximate those non-linear functionals by expectations, which are linear functionals or statistics of the distribution. Finally, that linearization method allows one to apply the law of iterated expectations to the distributional statistics of interest and thus to compute approximate partial effects of changes in the distribution of each single covariate on the functional of interest.
The details of the method are summarized as follows. Consider again a general functional $\nu = \nu(F)$. Recall the definition of the influence function (Hampel 1974), $\mathrm{IF}$, introduced as a measure of robustness of $\nu$ to outlier data when $F$ is replaced by the empirical distribution: $\mathrm{IF}(y; \nu, F) = \lim_{\epsilon \to 0} \left( \nu(F_{\epsilon}) - \nu(F) \right) / \epsilon$, where $F_{\epsilon}(y) = (1 - \epsilon) F + \epsilon \delta_y$, $0 \le \epsilon \le 1$ and where $\delta_y$ is a distribution that only puts mass at the value $y$. It can be shown that, by definition, $\int \mathrm{IF}(y; \nu, F) \, dF(y) = 0$.
We use a recentered version of the influence function $\mathrm{RIF}(y; \nu, F) = \nu(F) + \mathrm{IF}(y; \nu, F)$ that has an expectation equal to the original $\nu$:
$$\int \mathrm{RIF}(y; \nu, F) \cdot dF(y) = \int \left( \nu(F) + \mathrm{IF}(y; \nu, F) \right) \cdot dF(y) = \nu(F).$$
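The definition of the influence function can be checked numerically by contaminating an empirical distribution with a small point mass and taking a finite difference. The sketch below does this for the mean and the variance on simulated log wages. The closed form used for the variance, $\mathrm{IF}(y; \sigma^2, F) = (y - \mu)^2 - \sigma^2$, is the standard result from robust statistics and is an assumption here, since it is not derived in this section; the function names are hypothetical.

```python
import numpy as np

def mean_functional(values, weights):
    """Mean of a discrete distribution with the given probability weights."""
    return np.sum(weights * values)

def variance_functional(values, weights):
    """Variance of a discrete distribution with the given probability weights."""
    mu = np.sum(weights * values)
    return np.sum(weights * (values - mu) ** 2)

def numerical_if(nu, sample, y, eps=1e-6):
    """Finite-difference version of [nu(F_eps) - nu(F)] / eps, where F is the
    empirical distribution of `sample` and F_eps = (1 - eps) F + eps delta_y."""
    n = len(sample)
    base_w = np.full(n, 1.0 / n)
    mixed_values = np.append(sample, y)
    mixed_w = np.append((1.0 - eps) * base_w, eps)
    return (nu(mixed_values, mixed_w) - nu(sample, base_w)) / eps

rng = np.random.default_rng(0)
log_wage = rng.normal(2.5, 0.5, 1000)      # simulated data, for illustration only
mu, sigma2 = log_wage.mean(), log_wage.var()
y = 4.0                                    # point at which the IF is evaluated

if_mean_num = numerical_if(mean_functional, log_wage, y)
if_var_num = numerical_if(variance_functional, log_wage, y)
if_mean_exact = y - mu
if_var_exact = (y - mu) ** 2 - sigma2

print("IF of mean     :", if_mean_num, "vs", if_mean_exact)
print("IF of variance :", if_var_num, "vs", if_var_exact)

# Recentering: the RIF averages back to the statistic itself
rif_var = sigma2 + ((log_wage - mu) ** 2 - sigma2)
print("mean of RIF    :", rif_var.mean(), "vs variance", sigma2)
```

The finite difference matches the closed forms up to an error of order `eps`, and the sample average of the RIF returns the statistic, which is the property the RIF-regressions rely on.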
Letting $\nu_t = \nu(F_t)$ and $\nu_C = \nu(F_C)$, we can therefore write the distributional statistics $\nu_1$, $\nu_0$, and $\nu_C$ as the expectations: $\nu_t = E[\mathrm{RIF}(Y_t; \nu, F_t) \mid T = t]$, $t = 0, 1$ and $\nu_C = E[\mathrm{RIF}(Y_0; \nu, F_C) \mid T = 1]$. Using the law of iterated expectations, the distributional statistics can also be expressed in terms of expectations of the conditional recentered influence functions
$$\nu(F) = \int E[\mathrm{RIF}(Y; \nu, F) \mid X = x] \cdot dF_X(x).$$
Letting the so-called RIF-regressions be written as $m_t^{\nu}(x) \equiv E[\mathrm{RIF}(Y_t; \nu_t, F_t) \mid X, T = t]$, for $t = 0, 1$, and $m_C^{\nu}(x) \equiv E[\mathrm{RIF}(Y_0; \nu_C, F_C) \mid X, T = 1]$, we have
$$\nu_t = E[m_t^{\nu}(X) \mid T = t], \quad t = 0, 1 \quad \text{and} \quad \nu_C = E[m_C^{\nu}(X) \mid T = 1].$$
It follows that $\Delta_S^{\nu}$ and $\Delta_X^{\nu}$ can be rewritten as:
$$\Delta_S^{\nu} = E[m_1^{\nu}(X) \mid T = 1] - E[m_C^{\nu}(X) \mid T = 1], \qquad \Delta_X^{\nu} = E[m_C^{\nu}(X) \mid T = 1] - E[m_0^{\nu}(X) \mid T = 0].$$
As is well known, in the case of the mean, the influence function at point y is its deviation from the mean and, therefore, the recentered influence function of the mean is simply the point y itself
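$$\mathrm{IF}(y; \mu_t, F_t) = \lim_{\epsilon \to 0} \frac{\left[ (1 - \epsilon) \cdot \mu_t + \epsilon \cdot y \right] - \mu_t}{\epsilon} = y - \mu_t,$$
$$\mathrm{RIF}(y; \mu_t, F_t) = \mathrm{IF}(y; \mu_t, F_t) + \mu_t = y.$$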
As a result, the RIF-regression coefficients in the case of the mean are identical to standard regression coefficients of $Y$ on $X$ used in the OB decomposition ($\beta_t$ above), and we have
$$\gamma_t^{\mu} = \left( E[\omega_t(T) X X^{\top}] \right)^{-1} \cdot E[\omega_t(T) X Y], \quad t = 0, 1, \qquad \gamma_C^{\mu} = \left( E[\omega_C(T, X) X X^{\top}] \right)^{-1} \cdot E[\omega_C(T, X) X Y],$$
where $\gamma_t^{\mu} = \beta_t$, and
$$\Delta_S^{\mu} = E[X \mid T = 1]^{\top} \left( \gamma_1^{\mu} - \gamma_C^{\mu} \right),$$
$$\Delta_X^{\mu} = \left( E[X \mid T = 1] - E[X \mid T = 0] \right)^{\top} \gamma_0^{\mu} + R^{\mu},$$
where $R^{\mu}$ is an approximation error. When the linearity and zero conditional mean assumptions of the OB decomposition are satisfied, it follows that $\gamma_C^{\mu} = \gamma_0^{\mu}$ and $R^{\mu} = 0$, as seen at the end of the previous subsection. Our decomposition is then identical to the OB decomposition. However, when these conditions are not satisfied the two decompositions are different.
In general, there is no particular reason to expect the conditional expectations $m_t^{\nu}(X)$ and $m_C^{\nu}(X)$ to be linear in $X$. As a matter of convenience and comparability with OB decompositions, it is nonetheless useful to consider the case of the linear specification. To be more precise, consider the linear projections (indexed by $L$) $m_L^{\nu}(x)$
$$m_{t,L}^{\nu}(x) = x^{\top} \gamma_t^{\nu} \quad \text{and} \quad m_{C,L}^{\nu}(x) = x^{\top} \gamma_C^{\nu},$$
where
$$\gamma_t^{\nu} = \left( E[X X^{\top} \mid T = t] \right)^{-1} \cdot E[\mathrm{RIF}(Y_t; \nu_t, F_t) \, X \mid T = t], \quad t = 0, 1, \qquad \gamma_C^{\nu} = \left( E[X X^{\top} \mid T = 1] \right)^{-1} \cdot E[\mathrm{RIF}(Y_0; \nu_C, F_C) \, X \mid T = 1].$$
As is well known, even though linear projections are only an approximation for the true conditional expectation, the expected approximation error is zero, so that:
$$E[m_{t,L}^{\nu}(X) \mid T = t] = E[m_t^{\nu}(X) \mid T = t], \quad t = 0, 1 \quad \text{and} \quad E[m_{C,L}^{\nu}(X) \mid T = 1] = E[m_C^{\nu}(X) \mid T = 1].$$
We can thus rewrite $\Delta_S^{\nu}$ and $\Delta_X^{\nu}$ as:
$$\Delta_S^{\nu} = E[X \mid T = 1]^{\top} \left( \gamma_1^{\nu} - \gamma_C^{\nu} \right),$$
$$\Delta_X^{\nu} = E[X \mid T = 1]^{\top} \gamma_C^{\nu} - E[X \mid T = 0]^{\top} \gamma_0^{\nu},$$
which generalizes the OB decomposition to any distributional statistic through the projection of its recentered influence function onto the covariates. Note that, under an additional assumption that $m_{t,L}^{\nu}(\cdot) = m_t^{\nu}(\cdot)$ and $m_{C,L}^{\nu}(\cdot) = m_C^{\nu}(\cdot)$, that is, if the conditional expectation is indeed linear in $x$, then $\gamma_0^{\nu} = \gamma_C^{\nu}$. In the case of the mean ($\nu = \mu$), it then follows that the equations above reproduce exactly the OB decomposition.
It is important to note that the case of the mean is quite unique because the recentered influence function does not depend on the distribution $F$, i.e., $\mathrm{RIF}(y; \mu, F) = \mathrm{IF}(y; \mu, F) + \mu = y$. The lack of dependence on $F$ is due to the fact that the influence function is a linear approximation that is exact in the case of the mean. For other distributional statistics, the approximation (or specification) error $R$ is due to two separate factors. First, as in the case of the mean, the conditional expectation of $\mathrm{RIF}(y; \nu, F)$ given $X$ may not be linear in $X$. Second, both the RIF and the projection coefficients $\gamma$ depend on the distribution $F$. Thus, for more general distributional statistics, $\gamma_0^{\nu} = \gamma_C^{\nu}$ will not generally hold regardless of whether the conditional expectation is linear or not. As a result, we should expect to have a non-zero approximation error (see Equation (12)) for distributional statistics besides the mean, although how large the error is remains an empirical question.
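The two-stage logic of this subsection can be sketched for one statistic other than the mean. The example below takes the variance, whose recentered influence function is $(y - \mu)^2$ (a standard result that is assumed here rather than derived in this section), and simulated data with a single discrete covariate so that the reweighting factor can be computed from cell frequencies. The data-generating process and all names (`wls`, `rif_variance`, `gamma_c`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
T = (rng.random(n) < 0.5).astype(int)
educ = np.where(T == 1,
                rng.choice(3, size=n, p=[0.2, 0.3, 0.5]),
                rng.choice(3, size=n, p=[0.5, 0.3, 0.2]))
eps = rng.normal(0.0, 0.4, n)
Y = np.where(T == 1, 1.0 + 0.30 * educ + 1.2 * eps, 1.0 + 0.20 * educ + eps)
X = np.column_stack([np.ones(n), educ])             # constant plus one covariate

# Reweighting factor p(x) / (1 - p(x)), with p(x) estimated by cell frequencies
p_x = np.array([T[educ == x].mean() for x in range(3)])[educ]
g1, g0 = T == 1, T == 0
w_c = p_x[g0] / (1 - p_x[g0])                       # applied to Group 0 observations

def wmean(v, w):
    """Weighted average along the first axis."""
    return np.average(v, axis=0, weights=w)

def rif_variance(y, w):
    """RIF of the variance, (y - mu)^2, with mu the weighted mean of y."""
    return (y - wmean(y, w)) ** 2

def wls(Xm, r, w):
    """Weighted least squares projection of r on the columns of Xm."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(Xm * sw[:, None], r * sw, rcond=None)[0]

ones1, ones0 = np.ones(g1.sum()), np.ones(g0.sum())
gamma_1 = wls(X[g1], rif_variance(Y[g1], ones1), ones1)
gamma_0 = wls(X[g0], rif_variance(Y[g0], ones0), ones0)
gamma_c = wls(X[g0], rif_variance(Y[g0], w_c), w_c)  # counterfactual projection

xbar1, xbar0 = X[g1].mean(axis=0), X[g0].mean(axis=0)
delta_s = xbar1 @ (gamma_1 - gamma_c)                # wage structure effect
delta_x = xbar1 @ gamma_c - xbar0 @ gamma_0          # composition effect

nu1, nu0 = Y[g1].var(), Y[g0].var()
nu_c = wmean(rif_variance(Y[g0], w_c), w_c)          # reweighted (counterfactual) variance

print("overall gap :", nu1 - nu0)
print("structure   :", delta_s, " reweighting:", nu1 - nu_c)
print("composition :", delta_x, " reweighting:", nu_c - nu0)
print("gamma_0     :", gamma_0, " gamma_C:", gamma_c)
```

In this simulated design the conditional expectation of the RIF is quadratic in the covariate, so the linear projection is misspecified, yet the projection-based effects still reproduce the reweighting estimates because the expected projection error is zero. The coefficients `gamma_0` and `gamma_c` differ, in line with the discussion above.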

3.3. Interpreting the Decomposition

We have just shown that, under a linearity assumption, the decomposition based on RIF-regressions is similar to a standard OB decomposition. We now go beyond this simple analogy to define more explicitly what we mean by the contribution of each single covariate to the wage structure and composition effects.

3.3.1. Composition Effects

FFL show that RIF-regression estimates can either be used to estimate the effect of a “small change” of the distribution of $X$ on $\nu$, or to provide a first-order approximation of a larger change of the distribution of $X$ on $\nu$. The latter effect, that FFL call a “policy effect”, is what concerns us here. In fact, the composition effect $\Delta_X^{\nu}$ exactly corresponds to FFL’s policy effect, where the “policy” consists of changing the distribution of $X$ from its value at $T = 0$ to its value at $T = 1$ (holding the wage structure constant).
For the sake of simplicity, we continue to work with the linear specification introduced in Section 3.2. As it turns out, FFL show that, in the case of quantiles, using a linear specification for RIF-regressions generally yields very similar estimates to more flexible methods allowing for non-linearities.15 We nonetheless discuss below the consequences of the linearity assumption for the interpretation of the results.
An explicit link with the results of FFL concerning policy effects is obtained by rewriting the composition effects as
$$\Delta_X^\nu = \left(E[X \mid T=1] - E[X \mid T=0]\right)^{\top} \gamma_0^\nu + R^\nu, \tag{12}$$
where $R^\nu = E[X \mid T=1]^{\top}\left(\gamma_C^\nu - \gamma_0^\nu\right)$. The first term in Equation (12) is now similar to the standard OB type composition effect, and can be rewritten in terms of the contribution of each covariate as
$$\sum_{k=1}^{K} \left(E[X_k \mid T=1] - E[X_k \mid T=0]\right) \gamma_{0,k}^\nu .$$
Each component of this equation can be interpreted as the “policy effect” of changing the distribution of one covariate from its T = 0 to T = 1 level, holding the distribution of the other covariates unchanged.
As discussed earlier, the second term in Equation (12), $R^\nu$, is the approximation error linked to the fact that FFL’s regression-based procedure only provides a first-order approximation to the composition effect $\Delta_X^\nu$. In practice, it can be estimated as the difference between the reweighting estimate of the composition effect, $\nu_C - \nu_0$, and the estimate of $\left(E[X \mid T=1] - E[X \mid T=0]\right)^{\top}\gamma_0^\nu$ obtained using the RIF-regression approach. When the latter approach provides an accurate (first-order) approximation of the composition effect, the error should be small. Looking at the magnitude of the error thus provides a specification test of FFL’s regression-based procedure.
Note that using a linear specification for the RIF-regression instead of a general function $m^\nu(X) = E[\mathrm{RIF}(Y; \nu_t, F_t) \mid X]$ simply changes the interpretation of the specification error $R^\nu$ by adding an error component linked to the fact that a potentially incorrect specification may be used for the RIF-regression. We nonetheless suggest using the linear specification in practice for three reasons. First, we get an approximation error anyway since FFL’s procedure only gives a first-order approximation to the impact of “large” changes in the distribution of X. Second, the linear specification does not affect the overall estimates of the wage structure and composition effects that are obtained using the reweighting procedure. Third, using a linear specification has the advantage of providing a much simpler interpretation of the decomposition, as in the OB decomposition. Our suggestion is thus to use the linear specification but also look at the size of the specification error to make sure that the FFL approach provides an accurate enough approximation for the problem at hand.16

3.3.2. Wage Structure Effect

The wage structure effect in Equation (10), $\Delta_S^\nu = E[X \mid T=1]^{\top}\left(\gamma_1^\nu - \gamma_C^\nu\right)$, already looks very much like the usual wage structure effect in a standard OB decomposition. One important difference relative to the OB decomposition is that the coefficient $\gamma_C^\nu$ (the regression coefficient when the Group 0 data are reweighted to have the same distribution of X as Group 1) is used instead of $\gamma_0^\nu$ (the unadjusted regression coefficient for Group 0). The reason for using $\gamma_C^\nu$ instead of $\gamma_0^\nu$ is that the difference $\gamma_1^\nu - \gamma_C^\nu$ solely reflects differences between the wage structures $g_1(\cdot)$ and $g_0(\cdot)$, while the difference $\gamma_1^\nu - \gamma_0^\nu$ may be contaminated by differences in the distribution of X between the two groups.
In conventional regression analysis, the main reason why OLS estimates may depend on the distribution of X is that, when the conditional expectation of Y given X is non-linear, OLS minimizes a specification error that itself depends on the distribution of X (White 1980). An additional issue in our context is that, for distributional statistics besides the mean, the recentered influence function $\mathrm{RIF}(Y; \nu, F)$ depends on the distribution of Y (F). Changing the distribution of X changes the distribution of Y and, thus, the value of $\mathrm{RIF}(Y; \nu, F)$ for a given value of Y. This also affects the coefficients in a regression of $\mathrm{RIF}(Y; \nu, F)$ on X since we are no longer using the same RIF on the left-hand side of the regression. As just discussed, this important problem can be addressed by estimating $\gamma_C^\nu$ in the reweighted sample, which ensures that the difference $\gamma_1^\nu - \gamma_C^\nu$ only reflects differences between the wage structures $g_1(\cdot)$ and $g_0(\cdot)$.
Another limitation of OB decompositions that also applies here is that the contribution of each covariate to the wage structure effect is sensitive to the choice of a base group. There is, unfortunately, no simple solution to this problem.17 To see this, rewrite the wage structure effect
$$\Delta_S^\nu = \nu_1 - \nu_C = \left(\nu_1 - \nu_{B1}\right) - \left(\nu_C - \nu_{BC}\right) + \left(\nu_{B1} - \nu_{BC}\right),$$
where $\nu_{B1}$ is the distributional statistic in an arbitrary “base group” under the wage structure $g_1(\cdot,\cdot)$, while $\nu_{BC}$ is the distributional statistic for the same base group under the wage structure $g_0(\cdot,\cdot)$. The term $\nu_1 - \nu_{B1}$ represents the “policy effect” of changing the distribution of X from its value in the base group to its T = 1 value under the wage structure $g_1(\cdot,\cdot)$, while $\nu_C - \nu_{BC}$ represents the corresponding policy effect under the wage structure $g_0(\cdot,\cdot)$. Since there is no dispersion in X in a base group of workers with similar characteristics, switching to the actual distribution of X will typically result in more wage dispersion. The overall wage structure effect is, thus, equal to the difference in the dispersion enhancing effect under $g_1(\cdot,\cdot)$ and $g_0(\cdot,\cdot)$, respectively, plus a “residual” difference in the distributional statistic in the base group, $\nu_{B1} - \nu_{BC}$. Unless this residual change is invariant to the choice of the base group, the contribution of each covariate to the wage structure will be sensitive to the choice of base group.
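The base-group problem can be illustrated with a minimal sketch for the simplest case, the mean with a single categorical covariate. The cell means and shares below are made-up numbers, `structure_split` is a hypothetical helper, and both wage structures are evaluated at the same (T = 1) cell shares, as they would be after reweighting. The "base part" plays the role of the residual term for the base group, and the "education part" is the difference between the two policy effects.

```python
# Illustrative numbers only: mean log wage by education cell under each wage
# structure, and the T = 1 cell shares (also the shares after reweighting).
mean_g1 = {"hs": 2.9, "college": 3.6, "postgrad": 3.9}   # wage structure g1
mean_g0 = {"hs": 2.8, "college": 3.2, "postgrad": 3.4}   # wage structure g0
share_1 = {"hs": 0.5, "college": 0.3, "postgrad": 0.2}


def structure_split(base):
    """Split the wage structure effect on the mean into the change for the
    base group and the change in cell premia measured relative to that base."""
    base_part = mean_g1[base] - mean_g0[base]
    educ_part = sum(
        share_1[k] * ((mean_g1[k] - mean_g1[base]) - (mean_g0[k] - mean_g0[base]))
        for k in share_1
    )
    return base_part, educ_part


# Total wage structure effect on the mean (does not depend on any base group).
total = sum(share_1[k] * (mean_g1[k] - mean_g0[k]) for k in share_1)

split_hs = structure_split("hs")
split_college = structure_split("college")
print(total, split_hs, split_college)
```

With these numbers the total is the same under either base group, but the part attributed to education changes sign when the base switches from high school to college, which is the sensitivity described above.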

4. Estimation and Inference

In this section, we discuss how to estimate the different elements of the decomposition introduced in the previous section: $\nu_1$, $\nu_0$, $\nu_C$, $\gamma_1$, $\gamma_0$ and $\gamma_C$. For $\nu_1$, $\nu_0$, $\gamma_1$ and $\gamma_0$, the estimation is very standard because the distributions $F_1$ and $F_0$ are directly identified from data on (Y, T, X). The distributional statistics $\nu_1$ and $\nu_0$ can be estimated as their sample analogs in the data, while $\gamma_1$ and $\gamma_0$ can be estimated using standard least squares methods. In contrast, the estimation of $\nu_C$ and $\gamma_C$ requires first estimating the weighting function $\omega_C(T, X)$. We present two common methods, parametric and non-parametric, to estimate $\omega_C(T, X)$.
We discuss separately the estimation of the first and second stages of the decomposition. The first stage relies on a reweighting procedure, while the second stage is based on the estimation of RIF-regressions. We only present the general lines of the estimation procedure in this section. Proofs and details about the parametric and non-parametric procedures to estimate $\omega_C(T, X)$, and the asymptotic behavior of these estimators, are discussed in Appendix B and in Firpo and Pinto (2016). Finally, we show how the estimation procedure can be applied to the specific cases of the quantiles, interquantile ranges, variance and the Gini coefficient.

4.1. First Stage Estimation

The first step of the estimation procedure consists of estimating the weighting functions $\omega_1(T)$, $\omega_0(T)$ and $\omega_C(T, X)$. Then, the distributional statistics $\nu_1$, $\nu_0$, $\nu_C$ are computed directly from the appropriately reweighted samples. Details of the estimation procedure are presented in Appendix B and in Firpo and Pinto (2016).
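As a rough illustration of the non-parametric route, the sketch below estimates the three sets of weights when X is a single discrete cell label. It assumes the usual reweighting factor, proportional to p(x)/(1 - p(x)) for Group 0 observations with p(x) = Pr(T = 1 | X = x), and normalizes each set of weights to sum to one; the exact expressions used in the paper are those of Appendix B, which is not reproduced here. The function name and the toy data are hypothetical.

```python
from collections import Counter


def reweighting_factors(t, x):
    """Return normalised weights (w1, w0, wc) for pooled observations.

    t[i] is 1 for Group 1 and 0 for Group 0; x[i] is a discrete cell label.
    wc reweights Group 0 so that its distribution of x matches Group 1.
    Every cell observed in Group 0 is assumed to be non-empty by construction;
    cells that appear only in Group 1 violate common support and get no weight.
    """
    n1 = sum(t)
    n0 = len(t) - n1
    cell_1 = Counter(xi for ti, xi in zip(t, x) if ti == 1)
    cell_0 = Counter(xi for ti, xi in zip(t, x) if ti == 0)
    w1 = [ti / n1 for ti in t]
    w0 = [(1 - ti) / n0 for ti in t]
    # The odds p(x) / (1 - p(x)) are estimated by the ratio of cell counts.
    raw = [cell_1[xi] / cell_0[xi] if ti == 0 else 0.0 for ti, xi in zip(t, x)]
    scale = sum(raw)
    wc = [r / scale for r in raw]
    return w1, w0, wc


# Toy pooled sample: four Group 1 and four Group 0 observations.
t = [1, 1, 1, 1, 0, 0, 0, 0]
x = ["lo", "hi", "hi", "hi", "lo", "lo", "lo", "hi"]
w1, w0, wc = reweighting_factors(t, x)

share_hi_group1 = sum(w for w, xi in zip(w1, x) if xi == "hi")
share_hi_reweighted = sum(w for w, xi in zip(wc, x) if xi == "hi")
print(share_hi_group1, share_hi_reweighted)
```

With continuous or high-dimensional covariates the cell counts would typically be replaced by a fitted probability model such as a logit, which is the parametric alternative mentioned in the text.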

4.2. Second Stage Estimation

Now, consider estimation of the regression coefficients γ 1 ν , γ 0 ν , and γ C ν :
$$\hat\gamma_t^\nu = \left(\sum_{i=1}^{N} \hat\omega_t(T_i)\, X_i X_i^{\top}\right)^{-1} \sum_{i=1}^{N} \hat\omega_t(T_i)\, \widehat{\mathrm{RIF}}(Y_i; \nu_t, F_t)\, X_i, \quad t = 0, 1,$$
$$\hat\gamma_C^\nu = \left(\sum_{i=1}^{N} \hat\omega_C(T_i, X_i)\, X_i X_i^{\top}\right)^{-1} \sum_{i=1}^{N} \hat\omega_C(T_i, X_i)\, \widehat{\mathrm{RIF}}(Y_i; \nu_C, F_C)\, X_i,$$
where for t = 0, 1
$$\widehat{\mathrm{RIF}}(y; \nu_t, F_t) = \hat\nu_t + \widehat{\mathrm{IF}}(y; \nu_t, F_t) \quad\text{and}\quad \widehat{\mathrm{RIF}}(y; \nu_C, F_C) = \hat\nu_C + \widehat{\mathrm{IF}}(y; \nu_C, F_C),$$
and $\widehat{\mathrm{IF}}(\cdot; \nu, F)$ is a proper estimator of the influence function. We discuss how to estimate the influence function for a number of specific cases in Section 4.3.
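We can thus decompose the effect of changes from T = 0 to T = 1 on the distributional statistic $\nu$ as:
$$\hat\Delta_S^\nu = \left(\sum_{i=1}^{N} \hat\omega_1(T_i)\, X_i\right)^{\top} \left(\hat\gamma_1^\nu - \hat\gamma_C^\nu\right),$$
$$\hat\Delta_X^\nu = \left(\sum_{i=1}^{N} \hat\omega_1(T_i)\, X_i\right)^{\top} \hat\gamma_C^\nu - \left(\sum_{i=1}^{N} \hat\omega_0(T_i)\, X_i\right)^{\top} \hat\gamma_0^\nu .$$
It is also useful to rewrite the estimate of the composition effect as
$$\hat\Delta_X^\nu = \left(\sum_{i=1}^{N} \left(\hat\omega_1(T_i) - \hat\omega_0(T_i)\right) X_i\right)^{\top} \hat\gamma_0^\nu + \hat R^\nu,$$
where $\hat R^\nu = \left(\sum_{i=1}^{N} \hat\omega_1(T_i)\, X_i\right)^{\top} \left(\hat\gamma_C^\nu - \hat\gamma_0^\nu\right)$ is an estimate of the approximation error previously discussed. This generalizes the OB decomposition to any distributional statistic, including quantiles, the variance or the Gini coefficient.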
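The second-stage formulas can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the weights are taken as given and already normalized to sum to one, the RIF values are supplied by the caller, and the helper names (`solve`, `wls`, `decompose`) and the eight-observation toy sample are hypothetical. The example uses the mean, for which the RIF is simply y, so the approximation error should be zero.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wls(X, rif, w):
    """Weighted least squares coefficients of rif on the rows of X."""
    k = len(X[0])
    xtx = [[sum(wi * xi[a] * xi[b] for xi, wi in zip(X, w)) for b in range(k)]
           for a in range(k)]
    xty = [sum(wi * xi[a] * ri for xi, ri, wi in zip(X, rif, w))
           for a in range(k)]
    return solve(xtx, xty)


def wmean(X, w):
    """Weighted mean of each column of X (weights assumed to sum to one)."""
    return [sum(wi * xi[a] for xi, wi in zip(X, w)) for a in range(len(X[0]))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def decompose(X, rif1, rif0, rifc, w1, w0, wc):
    """Wage structure effect, composition effect, and the split of the
    composition effect into an explained part and an approximation error."""
    g1, g0, gc = wls(X, rif1, w1), wls(X, rif0, w0), wls(X, rifc, wc)
    x1, x0 = wmean(X, w1), wmean(X, w0)
    return {
        "structure": dot(x1, [a - b for a, b in zip(g1, gc)]),
        "composition": dot(x1, gc) - dot(x0, g0),
        "explained": dot([a - b for a, b in zip(x1, x0)], g0),
        "error": dot(x1, [a - b for a, b in zip(gc, g0)]),
    }


# Toy pooled sample: first four rows are Group 1, last four are Group 0.
X = [[1, 0], [1, 1], [1, 1], [1, 1], [1, 0], [1, 0], [1, 0], [1, 1]]
y = [2.0, 3.5, 3.0, 4.0, 1.0, 1.5, 2.0, 2.5]
w1 = [0.25] * 4 + [0.0] * 4
w0 = [0.0] * 4 + [0.25] * 4
wc = [0.0] * 4 + [1 / 12, 1 / 12, 1 / 12, 0.75]   # Group 0 reweighted to Group 1

result = decompose(X, y, y, y, w1, w0, wc)   # for the mean, RIF(y) = y
print(result)
```

For other statistics the three RIF vectors would differ, since each is computed from its own (reweighted) distribution, and the error term would generally be non-zero.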

4.3. Examples

We now turn to three popular statistics, namely (unconditional) quantiles, the variance, and the Gini coefficient, to illustrate how the different elements of the decomposition can be computed in these specific cases.

4.3.1. Quantiles and Interquantile Ranges

Quantiles are a set of distributional measures that have been used extensively for the decomposition of wage distributions. Several methodologies (Machado and Mata 2005; Melly 2005) use conditional quantile regressions as primary tools to infer entire distributions and counterfactual distributions even when the objects of interest are the unconditional quantiles. For instance, in decompositions of the gender wage gap, they are used to address issues such as glass ceilings and sticky floors.
The $\tau$-th quantile of the distribution F is defined as the functional $Q(F, \tau) = \inf\{y \mid F(y) \ge \tau\}$, or as $q_\tau$ for short, and its influence function is:
$$\mathrm{IF}(y; q_\tau, F) = \frac{\tau - \mathbb{1}\{y \le q_\tau\}}{f_Y(q_\tau)}. \tag{14}$$
As shown in FFL, the recentered influence function of the $\tau$-th quantile is
$$\mathrm{RIF}(y; q_\tau, F) = q_\tau + \mathrm{IF}(y; q_\tau, F) = q_\tau + \frac{\tau - \mathbb{1}\{y \le q_\tau\}}{f_Y(q_\tau)} = c_{1,\tau} \cdot \mathbb{1}\{y > q_\tau\} + c_{2,\tau},$$
where $c_{1,\tau} = 1 / f_Y(q_\tau)$, $c_{2,\tau} = q_\tau - c_{1,\tau} \cdot (1 - \tau)$, and $f_Y(q_\tau)$ is the density of Y evaluated at $q_\tau$. Thus,
$$E\left[\mathrm{RIF}(Y; q_\tau, F) \mid X = x\right] = c_{1,\tau} \cdot \Pr\left[Y > q_\tau \mid X = x\right] + c_{2,\tau},$$
and the estimation of the conditional mean of $\mathrm{RIF}(Y; q_\tau, F)$ can be seen more intuitively as the estimation of a conditional probability model of being below or above the quantile of interest $q_\tau$, rescaled by a factor $c_{1,\tau}$ to reflect the relative importance of the quantile to the distribution, and recentered by a constant $c_{2,\tau}$.
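The "rescaled indicator" reading of the quantile RIF can be checked with a few lines of Python. In this sketch the quantile and the density at the quantile are treated as known plug-in values (the numbers are made up), and `rif_quantile` is a hypothetical helper name.

```python
def rif_quantile(y, q, f_q, tau):
    """RIF of the tau-th quantile written as c1 * 1{y > q} + c2."""
    c1 = 1.0 / f_q
    c2 = q - c1 * (1.0 - tau)
    return c1 * (1.0 if y > q else 0.0) + c2


# Made-up plug-in values: 90th percentile of log wages and density there.
tau, q, f_q = 0.9, 3.0, 0.25

rif_below = rif_quantile(2.0, q, f_q, tau)   # any y at or below the quantile
rif_above = rif_quantile(4.0, q, f_q, tau)   # any y above the quantile

# A share tau of observations lies at or below q, so the RIF averages to q.
mean_rif = tau * rif_below + (1.0 - tau) * rif_above
print(rif_below, rif_above, mean_rif)
```

The RIF takes only two values, so a regression of it on X is a linear probability model for being above the quantile, scaled by `c1` and shifted by `c2`.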
The decomposition of (unconditional) quantiles proceeds along the same steps as in the case of the mean. In the first stage, the estimates of $q_{\tau,t}$, t = 0, 1, and $q_{\tau,C}$ are obtained by reweighting as $\hat q_{\tau,t} = \arg\min_q \sum_{i=1}^{N} \hat\omega_t(T_i) \cdot \rho_\tau(Y_i - q)$, t = 0, 1, and $\hat q_{\tau,C} = \arg\min_q \sum_{i=1}^{N} \hat\omega_C(T_i, X_i) \cdot \rho_\tau(Y_i - q)$. The function $\rho_\tau(\cdot)$ is the well-known check function, proposed by Koenker and Bassett (1978), where, for any u in $\mathbb{R}$, $\rho_\tau(u) = u \cdot (\tau - \mathbb{1}\{u \le 0\})$. Note that $\hat q_{\tau,t}$ and $\hat q_{\tau,C}$ can simply be computed using standard software packages with the appropriate weighting factor.
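A small sketch, in plain Python, of how a weighted quantile relates to the weighted check-function problem. The helper names and the five-point sample are hypothetical; the weighted quantile is computed as the smallest sample value whose cumulative weight share reaches τ, and the code then evaluates the weighted check loss at every sample point so that the two can be compared.

```python
def check(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u <= 0})."""
    return u * (tau - (1.0 if u <= 0 else 0.0))


def weighted_check_loss(q, y, w, tau):
    return sum(wi * check(yi - q, tau) for yi, wi in zip(y, w))


def weighted_quantile(y, w, tau):
    """Smallest sample value whose cumulative weight share reaches tau."""
    total = sum(w)
    cum = 0.0
    for yi, wi in sorted(zip(y, w)):
        cum += wi / total
        if cum >= tau:
            return yi
    return max(y)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
w_equal = [0.2] * 5                      # unweighted sample
w_tilted = [0.1, 0.1, 0.1, 0.1, 0.6]     # reweighting that favours high wages

q_equal = weighted_quantile(y, w_equal, 0.5)
q_tilted = weighted_quantile(y, w_tilted, 0.5)

# Loss at each candidate sample point under the tilted weights.
losses_tilted = {c: weighted_check_loss(c, y, w_tilted, 0.5) for c in y}
print(q_equal, q_tilted, losses_tilted)
```

Changing the weights moves the median from 3 to 5 here, which is the sense in which the counterfactual quantile is obtained "by reweighting".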
The estimators for the gaps are computed as:
$$\hat\Delta_O^{q_\tau} = \hat q_{\tau,1} - \hat q_{\tau,0}; \quad \hat\Delta_S^{q_\tau} = \hat q_{\tau,1} - \hat q_{\tau,C} \quad\text{and}\quad \hat\Delta_X^{q_\tau} = \hat q_{\tau,C} - \hat q_{\tau,0}.$$
In the second stage, we estimate the linear RIF-regressions. First, the recentered influence function is computed for each observation by plugging in the sample estimate of the quantile, $\hat q_\tau$, and estimating the density at the sample quantile, $\hat f(\hat q_\tau)$.
For the $\tau$-th quantile of $Y_1 \mid T = 1$, we would use $\widehat{\mathrm{RIF}}(y; q_{\tau,1}, F) = \hat q_{\tau,1} + \left(\hat f_1(\hat q_{\tau,1})\right)^{-1} \cdot \left(\tau - \mathbb{1}\{y \le \hat q_{\tau,1}\}\right)$, where $\hat f_1(\cdot)$ is a consistent estimator for the density of $Y_1 \mid T = 1$, $f_1(\cdot)$. For example, kernel methods can be used to estimate the density, but other simpler alternative methods are also available. One may, for instance, dispense with kernel estimation of the density by noticing that $c_{1,\tau} = dq_\tau / d\tau$. By estimating sufficiently close quantiles, say $q_\tau$ and $q_{\tau+\lambda}$, where $\lambda$ is a small positive real number, an estimate of $c_{1,\tau}$ is $\hat c_{1,\tau} = (\hat q_{\tau+\lambda} - \hat q_\tau) / \lambda$, which is the inverse of the sparsity density estimator (Koenker 2005, p. 139). Another interesting alternative method is the recent one suggested by Cattaneo et al. (2017), which uses local polynomial regressions.
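The quantile-differencing idea can be tried out on a distribution whose density is known. The sketch below uses a standard normal as a stand-in for the wage distribution (an assumption made purely for illustration) and population rather than sample quantiles, so it only shows how close the difference quotient is to 1/f(q) for a given λ; the function name is hypothetical.

```python
from statistics import NormalDist


def scale_by_differencing(quantile_fn, tau, lam):
    """Approximate c1 = dq/dtau by (q(tau + lam) - q(tau)) / lam."""
    return (quantile_fn(tau + lam) - quantile_fn(tau)) / lam


dist = NormalDist(mu=0.0, sigma=1.0)   # stand-in for the wage distribution
tau, lam = 0.9, 0.001

c1_diff = scale_by_differencing(dist.inv_cdf, tau, lam)
c1_exact = 1.0 / dist.pdf(dist.inv_cdf(tau))
print(c1_diff, c1_exact)
```

With sample quantiles, λ would need to be large enough relative to the sample size for the two estimated quantiles to differ, so the choice of λ involves a bias-variance trade-off much like a bandwidth.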
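In the example of $Y_1 \mid T = 1$, the RIF-regressions are estimated by replacing the usual dependent variable, Y, by the estimated value of $\widehat{\mathrm{RIF}}(y; q_{\tau,1}, F)$. Standard software packages can be used to do so. The resulting regression coefficients are therefore
$$\hat\gamma_t^{q_\tau} = \left(\sum_{i=1}^{N} \hat\omega_t(T_i)\, X_i X_i^{\top}\right)^{-1} \sum_{i=1}^{N} \hat\omega_t(T_i)\, X_i\, \widehat{\mathrm{RIF}}(Y_i; q_{\tau,t}, F_t), \quad t = 0, 1, \tag{16}$$
$$\hat\gamma_C^{q_\tau} = \left(\sum_{i=1}^{N} \hat\omega_C(T_i, X_i)\, X_i X_i^{\top}\right)^{-1} \sum_{i=1}^{N} \hat\omega_C(T_i, X_i)\, X_i\, \widehat{\mathrm{RIF}}(Y_i; q_{\tau,C}, F_C). \tag{17}$$
Similar to the case of the mean, we get:
$$\hat\Delta_S^{q_\tau} = E[X \mid T=1]^{\top} \left(\hat\gamma_1^{q_\tau} - \hat\gamma_C^{q_\tau}\right), \tag{18}$$
$$\hat\Delta_X^{q_\tau} = \left(E[X \mid T=1] - E[X \mid T=0]\right)^{\top} \hat\gamma_0^{q_\tau} + \hat R^{q_\tau}, \tag{19}$$
where $\hat R^{q_\tau} = E[X \mid T=1]^{\top} \left(\hat\gamma_C^{q_\tau} - \hat\gamma_0^{q_\tau}\right)$.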
Interquantile ranges, such as the difference between the 75th and the 25th percentiles, and the 90–10 gap (difference between 90th and the 10th percentiles) are also popular inequality measures that only depend on quantiles. Because they are simple differences between quantiles, their γ coefficients are the differences in the γ coefficients of their respective quantiles. For that reason, we omit the theoretical discussion about interquantile ranges, but present their estimates in the empirical section.
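The claim about interquantile ranges follows from the linearity of least squares in the dependent variable, and it can be verified numerically. The sketch below uses a single dummy regressor plus a constant, made-up log wages, and made-up plug-in values for the quantiles and densities; `wls_slope` and `rif_quantile` are hypothetical helper names.

```python
def wls_slope(x, dep, w):
    """Slope from a weighted regression of dep on a constant and x."""
    total = sum(w)
    xbar = sum(wi * xi for xi, wi in zip(x, w)) / total
    dbar = sum(wi * di for di, wi in zip(dep, w)) / total
    cov = sum(wi * (xi - xbar) * (di - dbar) for xi, di, wi in zip(x, dep, w))
    var = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    return cov / var


def rif_quantile(y, q, f_q, tau):
    return q + (tau - (1.0 if y <= q else 0.0)) / f_q


# Made-up data: log wages, a union dummy, and equal weights.
y = [1.8, 2.1, 2.4, 2.6, 2.9, 3.1, 3.4, 3.9]
union = [0, 1, 0, 1, 1, 0, 0, 1]
w = [1.0] * len(y)

# Made-up plug-in values for the 10th and 90th percentiles and densities.
rif_10 = [rif_quantile(yi, 2.0, 0.3, 0.1) for yi in y]
rif_90 = [rif_quantile(yi, 3.5, 0.2, 0.9) for yi in y]
rif_90_10 = [a - b for a, b in zip(rif_90, rif_10)]

slope_10 = wls_slope(union, rif_10, w)
slope_90 = wls_slope(union, rif_90, w)
slope_range = wls_slope(union, rif_90_10, w)
print(slope_90 - slope_10, slope_range)
```

The same argument applies coefficient by coefficient with several regressors, provided both RIF-regressions use the same sample, weights and covariates.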

4.3.2. Variance

There are other applications where it is useful to decompose the impact of covariates on the variance of the distributions of log wages. Examples include the compression effect of unions and of public sector wage setting.
The estimators of these gaps can be computed as:
$$\hat\Delta_O^{\sigma^2} = \hat\sigma_1^2 - \hat\sigma_0^2; \quad \hat\Delta_S^{\sigma^2} = \hat\sigma_1^2 - \hat\sigma_C^2 \quad\text{and}\quad \hat\Delta_X^{\sigma^2} = \hat\sigma_C^2 - \hat\sigma_0^2, \tag{20}$$
using the reweighting scheme $\hat\sigma_t^2 = \sum_{i=1}^{N} \hat\omega_t(T_i) \left(Y_i - \hat\mu_t\right)^2$, t = 0, 1, and $\hat\sigma_C^2 = \sum_{i=1}^{N} \hat\omega_C(T_i, X_i) \cdot \left(Y_i - \hat\mu_C\right)^2$. The influence function of the variance is well known to be
$$\mathrm{IF}(y; \sigma^2, F_Y) = \left(y - \int z \cdot dF_Y(z)\right)^2 - \sigma^2, \tag{21}$$
and the recentered influence function is the first term of this expression, $\mathrm{RIF}(y; \sigma^2, F_Y) = \left(y - \int z \cdot dF_Y(z)\right)^2 = (y - \mu)^2$.
The decomposition in terms of individual covariates, such as union coverage, follows by replacing $\mathrm{RIF}(\cdot; q_\tau)$ by $\mathrm{RIF}(\cdot; \sigma^2, F)$ in Equations (16)–(19).
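For the variance the RIF needs no density estimate, only the (weighted) mean. A minimal sketch with made-up log wages and equal weights, where `rif_variance` is a hypothetical helper name:

```python
def rif_variance(y, w):
    """Weighted variance and the RIF values (y - mu)^2 for each observation."""
    total = sum(w)
    mu = sum(wi * yi for yi, wi in zip(y, w)) / total
    rif = [(yi - mu) ** 2 for yi in y]
    var = sum(wi * ri for ri, wi in zip(rif, w)) / total
    return var, rif


y = [2.0, 2.5, 3.0, 3.5, 4.0]   # made-up log wages
w = [0.2] * 5                   # equal weights; any reweighting factor works
var_hat, rif = rif_variance(y, w)
print(var_hat, rif)
```

The weighted average of the RIF values reproduces the variance itself, which is the recentering property that makes the RIF-regression coefficients add up to the statistic.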

4.3.3. The Gini coefficient

Finally, another popular measure of wage inequality is the Gini coefficient. There are a few papers (Choe and Van Kerm 2014; Gradín 2016) that have begun to use RIF-Gini regressions to investigate changes in income inequality. Recall that the Gini coefficient is defined as
$$\nu^G(F_Y) = 1 - 2 \mu^{-1} R(F_Y), \tag{22}$$
where $R(F_Y) = \int_0^1 GL(p; F_Y)\, dp$ with $p(y) = F_Y(y)$ and where $GL(p; F_Y)$ is the generalized Lorenz ordinate of $F_Y$ given by $GL(p; F_Y) = \int_{-\infty}^{F^{-1}(p)} z\, dF_Y(z)$. The generalized Lorenz curve tracks the cumulative total of y divided by total population size against the cumulative distribution function. The generalized Lorenz ordinate can be interpreted as the proportion of earnings going to the 100p% lowest earners.
Monti (1991) derives the influence function of the Gini coefficient as
$$\mathrm{IF}(y; \nu^G, F_Y) = A_2(F_Y) + B_2(F_Y)\, y + C_2(y; F_Y), \tag{23}$$
where $A_2(F_Y) = 2 \mu^{-1} R(F_Y)$, $B_2(F_Y) = 2 \mu^{-2} R(F_Y)$, and $C_2(y; F_Y) = -2 \mu^{-1} \left[ y \left(1 - p(y)\right) + GL(p(y); F_Y) \right]$, with $R(F_Y)$ and $GL(p(y); F_Y)$ as defined underneath Equation (22). Recentering yields
$$\mathrm{RIF}(y; \nu^G, F_Y) = 1 + B_2(F_Y)\, y + C_2(y; F_Y). \tag{24}$$
The recentered influence function of the Gini coefficient can also be written as
RIF ( y ; ν G , F Y ) = 2 y μ ν G + ( 1 y ) μ + 2 μ z F Y ( z ) d z ,
which gives a more intuitive expression after integrating by parts
$$\mathrm{RIF}(y; \nu^G, F_Y) = 2 \frac{y}{\mu} \left[ F_Y(y) - \frac{1 + \nu^G}{2} \right] + 2 \left[ \frac{1 - \nu^G}{2} - GL(p; F_Y) \right] + \nu^G,$$
where $(1 + \nu^G)/2$ and $(1 - \nu^G)/2$ correspond, respectively, to the areas above and below the Lorenz curve. As pointed out by Monti (1991), the first term is unbounded because it increases by the factor $y / \mu$, while the second is bounded between $\nu^G - 1$ and $1 + \nu^G$. Thus, $\mathrm{RIF}(y; \nu^G, F_Y)$ is continuous and convex in y; its first derivative is equal to $(2/\mu) \left[ F_Y(y) - (1 + \nu^G)/2 \right]$, and it reaches its minimum when $F_Y(y) = (1 + \nu^G)/2$. The function is theoretically unbounded from above, but in practice it reaches its maximum at the upper bound of the empirical support of the distribution. This implies that the Gini coefficient is not robust to measurement error in high earnings, as pointed out by Cowell and Victoria-Feser (1996).
The GL coordinates are estimated using a series of discrete data points $y_1, \ldots, y_N$, where observations have been ordered so that $y_1 \le y_2 \le \cdots \le y_N$. Consider
$$\hat p_t(y_i) = \frac{\sum_{j=1}^{i} \hat\omega_t(T_j)}{\sum_{j=1}^{N} \hat\omega_t(T_j)}, \quad \widehat{GL}_t(p(y_i)) = \frac{\sum_{j=1}^{i} \hat\omega_t(T_j) \cdot Y_j}{\sum_{j=1}^{N} \hat\omega_t(T_j)}, \quad t = 0, 1,$$
$$\hat p_C(y_i) = \frac{\sum_{j=1}^{i} \hat\omega_C(T_j, X_j)}{\sum_{j=1}^{N} \hat\omega_C(T_j, X_j)}, \quad \widehat{GL}_C(p(y_i)) = \frac{\sum_{j=1}^{i} \hat\omega_C(T_j, X_j) \cdot Y_j}{\sum_{j=1}^{N} \hat\omega_C(T_j, X_j)},$$
where the numerators of the GL ordinates are the (weighted) sums of the first i ordered values of Y. The $\hat R(F_t)$, t = 0, 1, and $\hat R(F_C)$ are obtained by numerical integration of $\widehat{GL}_t(p(y_i))$ over $\hat p_t(y_i)$, and of $\widehat{GL}_C(p(y_i))$ over $\hat p_C(y_i)$.18 The estimates of $\hat\nu^G(F_t)$, t = 0, 1, and $\hat\nu^G(F_C)$ are obtained by substituting $\hat R(F_t)$ and $\hat R(F_C)$, as well as $\hat\mu_t$ and $\hat\mu_C$, into Equation (22). We can then compute the gaps for the changes in the Gini coefficient as in Equation (20).
Similar substitutions into Equation (24) allow the estimation of $\widehat{\mathrm{RIF}}(y; \nu_t^G, F_t)$, t = 0, 1, and $\widehat{\mathrm{RIF}}(y; \nu_C^G, F_C)$. As before, the decomposition in terms of individual covariates follows by replacing $\widehat{\mathrm{RIF}}(\cdot; q_\tau, F)$ by $\widehat{\mathrm{RIF}}(\cdot; \nu^G, F)$ in Equations (16)–(19).
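The steps for the Gini coefficient can be sketched as follows. This is an illustration under assumptions that the text does not pin down: the numerical integration uses the trapezoid rule starting from the origin, ties in y are ordered arbitrarily, and the function name `gini_rif` and the four-point sample are hypothetical. The RIF is computed from the expression 1 + B2·y + C2 given above.

```python
def gini_rif(y, w):
    """Weighted Gini coefficient and RIF values, returned in input order."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    total = sum(w)
    ys = [y[i] for i in order]
    ws = [w[i] / total for i in order]          # normalised weights
    mu = sum(wi * yi for wi, yi in zip(ws, ys))

    # Cumulative population shares p(y_i) and generalized Lorenz ordinates.
    p, gl = [], []
    cum_w = cum_wy = 0.0
    for wi, yi in zip(ws, ys):
        cum_w += wi
        cum_wy += wi * yi
        p.append(cum_w)
        gl.append(cum_wy)

    # R = integral of GL over p, by the trapezoid rule from the point (0, 0).
    r = 0.0
    prev_p = prev_gl = 0.0
    for pi, gi in zip(p, gl):
        r += (pi - prev_p) * (gi + prev_gl) / 2.0
        prev_p, prev_gl = pi, gi

    gini = 1.0 - 2.0 * r / mu
    b2 = 2.0 * r / mu ** 2
    rif_sorted = [1.0 + b2 * yi - (2.0 / mu) * (yi * (1.0 - pi) + gi)
                  for yi, pi, gi in zip(ys, p, gl)]

    rif = [0.0] * len(y)
    for pos, i in enumerate(order):
        rif[i] = rif_sorted[pos]
    return gini, rif


y = [3.0, 1.0, 4.0, 2.0]    # made-up wages, deliberately unsorted
w = [1.0, 1.0, 1.0, 1.0]
gini_hat, rif = gini_rif(y, w)
mean_rif = sum(wi * ri for wi, ri in zip(w, rif)) / sum(w)
print(gini_hat, rif, mean_rif)
```

In this toy sample the RIF is lowest for the third-ranked observation and higher at both ends, in line with the convex shape described above, and its weighted mean equals the estimated Gini coefficient.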

5. Empirical Application: Changes in Male Wage Inequality between 1988 and 2016

Our empirical application focuses on changes in wage inequality over the past 30 years. It is well known that wage inequality increased sharply in the United States since the beginning of the 1980s. Using various distributional methods, Juhn et al. (1993) and DiNardo et al. (1996) showed that inequality expanded all through the wage distribution during the 1980s. In particular, both the “90–50 gap” (the difference between the 90th and the 50th quantile of log wages) and the “50–10 gap” increased during this period.
Since the late 1980s, however, changes in inequality have increasingly been concentrated at the top end of the wage distribution. In fact, Autor et al. (2006) showed that, while the 90–50 gap kept expanding after the late 1980s, the 50–10 gap declined during the same period. They refer to these changes as an increased polarization of the labor market. An obvious question is why wage dispersion has changed so differently at different points of the distribution. Autor et al. (2006) suggest that technological change is a possible answer, provided that computerization resulted in a decline in the demand for skilled but “routine” tasks that used to be performed by workers around the middle of the wage distribution.19
Lemieux (2008) reviewed possible explanations for the increased polarization in the labor market, including the technological-based explanation of Autor, Katz, and Kearney. He suggested that, if this explanation is an important one, then changes in relative wages by occupation, i.e., the contribution of occupations to the wage structure effect, should play an important role in changes in the wage distribution. Furthermore, since it is well known that education wage differentials kept expanding after the late 1980s (e.g., Acemoglu and Autor 2011), the contribution of education to the wage structure effect is another leading explanation for inequality changes over this period. More recent studies have also implicated the role of offshorability and trade (Firpo et al. 2011; Autor et al. 2014) which may be more salient at the industry level, given that some “local” industries such as the construction, distribution (wholesale trade, transportation), and personal service sectors are likely less affected by these economic forces.
Previous studies also show that composition effects played an important role in increasing wage inequality. Lemieux (2006b) showed that all the growth in residual inequality over this period is due to composition effects linked to the fact that the workforce became older and more educated, two factors associated with more wage dispersion. Furthermore, Lemieux (2008) argued that de-unionization, defined as a composition effect in this paper, still contributed to the changes in the wage distribution over this period.
These various explanations can all be understood in terms of the respective contributions of a few broad sets of factors (unions, education, experience, occupations, industries, etc.) to either wage structure or composition effects. This makes the decomposition method proposed in this paper ideally suited for estimating the contribution of each of these possible explanations to changes in the wage distribution. Unlike other procedures, our method allows us to estimate the relative contribution of each of the factors mentioned above to recent changes in the U.S. wage distribution.20
Our empirical analysis is based on data for men from the 1988–1990 and 2014–2016 Outgoing Rotation Group (ORG) Supplements of the Current Population Survey, yielding about a quarter million observations for each time period. As in Fortin and Lemieux (2016), for conciseness, we focus exclusively on men. The extent of occupational gender segregation is such that we would have to perform the analysis and choose the base group separately by gender. Increasing inequality appears to have worked through different channels and time period for men and women. Autor et al. (2015) showed that men’s employment was impacted by the automation of production activities in the manufacturing sector at the beginning of the period, while women suffered employment losses associated with the impact of computerization of information-processing tasks in non-manufacturing later in the period.
The data files were processed as in Lemieux (2006b) who provided detailed information on the relevant data issues. The wage measure used is an hourly wage measure computed by dividing earnings by hours of work for workers not paid by the hour. For workers paid by the hour, we use a direct measure of the hourly wage rate. In light of the above discussion, the key set of covariates on which we focus are education (six education groups), potential experience (nine groups), union coverage, occupation (17 categories), and industry (14 categories). We also include controls for marital status and race in all the estimated models. The sample means for all these variables are provided in Table A1.21
Before proceeding to the estimation of RIF-regressions, it is important to inspect the density of wages for unusual features that would challenge the estimation of the RIF at the quantiles of interest or the wage model that we use. Figure 1 presents kernel density estimates of male wages for 1988–1990 and 2014–2016 estimated using the Epanechnikov kernel and bandwidths of 0.06 and 0.08, respectively.22 The figure also shows the 1988–1990 density reweighted to have the same distribution of characteristics as in 2014–2016. The typical issues to look for include cliffs associated with minimum wage effects at the bottom of the distribution, peaks associated with heaping (the fact that hourly wage workers, in particular, are more likely to round their wages to the next dollar amount) in the middle of the distribution, and top-coding at the top of the distribution. The impact of minimum wages is clearly seen in Figure 1 when vertical lines corresponding to the minimum and maximum of federal and state minimum wages are displayed. Because we do not model minimum wages in the current paper, the 1988–1990 density and the reweighted density are superimposed in those wage ranges, showing the wage setting variables that we include are inadequate for modeling the distribution of wages when minimum wages matter.23 Thus, we remain cautious with regard to the interpretation of any effect at the bottom of the distribution.
Heaping and top-coding can be problematic if they imply an unusually high value of the density at a particular quantile of interest that potentially biases the estimation of the denominator f Y ^ ( q ^ τ ) of the influence function (14). While only 0.7% of workers are top-coded in 1988–1990, this proportion increases to 3.6% in 2014–2016.24 A standard adjustment for top-coding consists of multiplying top-coded wages by a fixed adjustment factor. In Figure 1, we use the adjustment factor of 1.4 suggested by Lemieux (2006b). While there is no visual evidence of an impact of top-coding in 1988–1990, there is a clear spike in the 2014–2016 distribution around the point (log wage of about 4.5) where most top-coded observations lie.25 We deal with this issue using a more sophisticated stochastic imputation procedure (shown as the solid line) based on a Pareto distribution estimated using tax data from Alvaredo et al. (2013).
Given our large sample of hourly paid and salaried workers, heaping does not appear to be a serious issue in Figure 1.26 However, heaping is more visible in Figure 2, which plots the 1988–1990 and 2014–2016 densities of wages for our base group. This group of about 400 workers in each period consists of non-unionized, white, married, high school educated men with 20 to 25 years of experience, working as construction workers in the construction industry, but not in the public sector.27 The figure shows that the densities have changed very little over time, aside from different positioning of some local peaks associated with heaping.28 This group was chosen because the economic forces that impact the overall wage distribution are less likely at play among this non-unionized group of low-educated workers in non-routine manual jobs with little exposure to international trade.29

5.1. RIF-Regressions

Before showing the decomposition results, we first present some estimates from the RIF-regressions for different wage quantiles, the variance of log wages, and the Gini coefficient. From Equation (14), we compute $\mathrm{IF}(y; q_\tau, F)$ for each observation using the sample estimate of $q_\tau$, and the kernel density estimate of $f(q_\tau)$.
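A rough sketch of this plug-in step, using a weighted Epanechnikov kernel density estimate evaluated at a single point. The helper names are hypothetical, the bandwidth and data in the example are chosen only to make the arithmetic easy to follow, and no boundary or top-coding corrections are attempted.

```python
def epanechnikov_density(x, y, w, h):
    """Weighted kernel density estimate at x with bandwidth h,
    using the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on |u| <= 1."""
    total = sum(w)
    dens = 0.0
    for yi, wi in zip(y, w):
        u = (x - yi) / h
        if abs(u) <= 1.0:
            dens += (wi / total) * 0.75 * (1.0 - u * u) / h
    return dens


def influence_quantile(yi, q, f_q, tau):
    """Influence function of the tau-th quantile for one observation."""
    return (tau - (1.0 if yi <= q else 0.0)) / f_q


# One-point check of the kernel: a single observation at zero, h = 0.5.
f_at_zero = epanechnikov_density(0.0, [0.0], [1.0], 0.5)
f_at_quarter = epanechnikov_density(0.25, [0.0], [1.0], 0.5)
f_outside = epanechnikov_density(1.0, [0.0], [1.0], 0.5)

# Plug-in influence function at the median of a made-up sample.
y = [2.0, 2.4, 2.7, 3.0, 3.6]
w = [1.0] * 5
q_med = 2.7
f_med = epanechnikov_density(q_med, y, w, 0.5)
if_values = [influence_quantile(yi, q_med, f_med, 0.5) for yi in y]
print(f_at_zero, f_at_quarter, f_outside, f_med, if_values)
```

Because the density enters as a denominator, a spike in the estimated density at the quantile (from heaping or top-coding, for example) shrinks every influence function value at that quantile, which is why the shape of the density is inspected first.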
The RIF-regression coefficients for the 10th, 50th, and 90th quantiles in 1988–1990 and 2014–2016, along with bootstrapped standard errors, are reported in Table 1. The RIF-regression coefficients for the variance and the Gini are reported in Table 2. Detailed estimates for each of the 19 quantiles from the 5th to the 95th are also reported in Figure 3, Figure 4 and Figure 5. For several covariates (for example, union status, non-white, married, clerical, production, and service occupations, transportation and utility, public administration sectors), the effects are highly non-monotonic across the different quantiles. Figure 3 illustrates this for some demographics. For instance, in Panel 1, the effect of union status first increases up to around the 40th quantile in 1988–1990, and up to the 50th quantile in 2014–2016, and then declines, even turning negative for the 90th and 95th quantiles.
As shown by the RIF-regressions for the more global measures of inequality (the variance of log wages and the Gini coefficient of the wage distribution) displayed in Table 2, the effect of unions on these measures is negative, although the magnitude of that effect has decreased over time. This is consistent with the well-known result (e.g., Freeman 1980) that unions tend to reduce the variance of log wages for men. More importantly, as shown in Table 1, the results also indicate that unions increase inequality in the lower end of the distribution, but decrease inequality even more in the higher end of the distribution. As we will see later in the decomposition results, this means that the continuing decline in the rate of unionization can account for some of the “polarization” of the labor market (decrease in inequality at the low end, but increase in inequality at the top end). The results for unions also illustrate an important feature of RIF-regressions for quantiles, namely that they capture both the between-group effect (arising from union wage premia) and the within-group effect (arising from union wage compression) of unions on wage dispersion, which go in opposite directions in this case.30
The RIF-regression estimates in Table 1 for other covariates also illustrate this point. Consider, for instance, the case of college education. Table 1 and Figure 3 show that the effect of college increases monotonically as a function of percentiles. In other words, increasing the fraction of the workforce with a college degree has a larger impact on higher than lower quantiles. The reason why the effect is monotonic is that education increases both the level and the dispersion of wages (see, e.g., Lemieux 2006a). As a result, both the within- and the between-group effects go in the same direction of increasing inequality.
Another clear pattern that emerges in Figure 3 and Figure 4 is that for most inequality enhancing covariates, i.e., those with a positively sloped curve, the inequality enhancing effect increases over time. In particular, the slopes for high levels of education (college graduates and post-graduates) and high-wage occupations (upper management, engineers and computer scientists, doctors, and lawyers) become steeper over time. This suggests that these covariates make a positive contribution to the wage structure effect.
There are some changes in the contribution of occupations and industries that are consistent with technological change and the routine-biased polarization of wages. For example, as shown in Figure 4 and Figure 5, there are increases in the returns to high-tech service industries at the upper end of the wage distribution, but decreases in the returns to production and clerical occupations in the middle of the wage distribution. There are also decreases in the penalties to some low skilled non-routine occupations and associated industries, such as service occupations and truck driving and the retail industry, although some increases at the lower end appear to be driven by changes in minimum wages. On the other hand, there are some offsetting effects in industries that could have compensated the decline in manufacturing employment, such as the primary (e.g., mining), wholesale and retail trade, and personal services industries. In summary, the changes in the rewards and penalties associated with occupations and industries provide a descriptive account of factors potentially offsetting the wage effects of the polarization of employment. We turn next to the evaluation of the magnitude of these effects.

5.2. Decomposition Results

The results for the aggregate decomposition are presented in Figure 6. Table 3 and Table 4 summarize the results for the standard measures of top-end (90–50 log wage differential) and low-end (50–10 log wage differential) wage inequality, as well as for the variance of log wages and the Gini coefficient. The covariates used in the RIF-regression models are those discussed above and listed in Table A1. A richer specification with additional interaction terms is used to estimate the logit models used to compute the reweighting factor $\hat\omega_C(T_i, X_i)$.31
Figure 6a shows the overall change in (real log) wages at each percentile τ , Δ O q τ , and decomposes this overall change into a composition ( Δ X q τ ) and wage structure ( Δ S q τ ) effect computed using the reweighting procedure of Result 1. Consistent with the pattern first documented in Autor et al. (2006), the overall change is U-shaped as wage dispersion increases in the top-end of the distribution, but declines in the lower end.32 Most summary measures of inequality such as the 90–10 gap nonetheless increase over the 1988–1990 to 2014–2016 period as wage gains in the top-end of the distribution exceed those at the low-end. In other words, although the curve for overall wage changes is U-shaped, its slope is positive, on average, suggesting that inequality generally goes up. This overall increase shows up as positive total changes in the 90–10 gap, the variance of log wages, and the Gini, reported in Table 3 and Table 4. In all cases, the aggregate decomposition of these overall measures attributes most (from 55% to 66%) of the changes to composition effects.
Figure 6a also shows that, consistent with Lemieux (2006b), composition effects have contributed to a substantial increase in inequality. In fact, once composition effects are accounted for, the remaining wage structure effects (estimated using reweighting) follow a “purer” U-shape than overall changes in wages. The wage declines are now right in the middle of the distribution (20th to 80th percentile), while wage gains at the top and low end are more similar. By the same token, however, composition effects cannot account at all for the U-shaped nature of wage changes.
Figure 7 moves to the next step of the decomposition using linear RIF-regressions to attribute the contribution of each set of covariates to the composition effect.33 Figure 8, which we discuss below, does the same for the wage structure effect. Figure 6b summarizes the total of the composition and wage structure effects by the sets of factors of interest. The combination of composition and wage structure effects shows the strong monotonic effect of education on wage changes, the mild U-shaped effect of union and occupations, and the offsetting hump-shaped effect of industries.
Figure 7a compares the overall composition effect obtained by reweighting and displayed in Figure 6a, $\hat{\Delta}_X^{q_\tau}$, to the composition effect explained using the RIF-regressions, $(\bar{X}_0^C - \bar{X}_0)\,\hat{\gamma}_0^{q_\tau}$. The difference between the two curves is the specification (approximation) error $R^{q_\tau}$. The error term is relatively small and does not exhibit much of a systematic pattern. This means that the RIF-regression model does relatively well at tracking down the composition effect estimated consistently using the reweighting procedure; however, as we discuss below, in some cases, the specification error is significantly different from zero.
Figure 7b then divides the composition effect (explained by the RIF-regressions) into the contribution of five main sets of factors. To simplify the discussion, we focus on the impact of each factor on overall wage inequality summarized by the 90–10 log wage differential in comparison to the 50–10 and 90–50 log wage differentials that capture what happened in the lower and upper parts of the distribution, respectively. The decomposition of the log wage differentials, the log variance, and the Gini are reported in Table 3 and Table 4. Table 3 presents the simple OB type decomposition computed from RIF-regressions of the five inequality measures, without reweighting. Table 4 applies the complete two-step procedure described above.
As discussed in Section 4.3, we compute the RIF of the difference between two (log) quantiles $q_1$ and $q_2$, where $q_2 > q_1$, as $\mathrm{RIF}(y_i; q_2 - q_1) = \mathrm{RIF}(y_i; q_2) - \mathrm{RIF}(y_i; q_1)$, and use these differences as dependent variables in the regressions. For the variance of log wages and the Gini, the RIF are as described above. Using the estimation results from these sets of regressions, we compute the components of the simple OB-type decomposition for the changes over time, $\hat{\nu}_1 - \hat{\nu}_0 = \hat{\Delta}_{OB}^{\nu}$, from 1988–1990 ($T = 0$) to 2014–2016 ($T = 1$) as:
$$\hat{\Delta}_{OB}^{\nu} = \underbrace{\left(\bar{X}_1 - \bar{X}_0\right)\hat{\gamma}_0^{\nu}}_{\hat{\Delta}_{X,OB}^{\nu}} + \underbrace{\bar{X}_1\left(\hat{\gamma}_1^{\nu} - \hat{\gamma}_0^{\nu}\right)}_{\hat{\Delta}_{S,OB}^{\nu}}.$$
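These results are displayed in Table 3 by groups of variables.34 In Table 4, we present the results of the decomposition that also applies the reweighting procedure:
$$\hat{\Delta}_{O}^{\nu} = \underbrace{\left(\bar{X}_0^C - \bar{X}_0\right)\cdot\hat{\gamma}_0^{\nu}}_{\hat{\Delta}_{X,p}^{\nu}} + \underbrace{\bar{X}_0^C\cdot\left(\hat{\gamma}_C^{\nu} - \hat{\gamma}_0^{\nu}\right)}_{\hat{\Delta}_{X,e}^{\nu}} + \underbrace{\bar{X}_1\cdot\left(\hat{\gamma}_1^{\nu} - \hat{\gamma}_C^{\nu}\right)}_{\hat{\Delta}_{S,p}^{\nu}} + \underbrace{\left(\bar{X}_1 - \bar{X}_0^C\right)\cdot\hat{\gamma}_C^{\nu}}_{\hat{\Delta}_{S,e}^{\nu}}.$$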
The four terms in this decomposition are easily obtained by running two OB decompositions using RIF regressions. First, we perform an OB decomposition using the $T = 0$ sample and the counterfactual sample (the $T = 0$ sample reweighted to be as in $T = 1$) to get the pure composition effect, $\hat{\Delta}_{X,p}^{\nu}$, using $T = 0$ as the reference wage structure. The total unexplained effect in this decomposition corresponds to the specification error, $\hat{\Delta}_{X,e}^{\nu}$, and allows one to assess the importance of departures from the linearity assumption. Second, we perform the decomposition using the $T = 1$ sample and the counterfactual sample, using the counterfactual wage structure as reference, and obtain the pure wage structure effect, $\hat{\Delta}_{S,p}^{\nu}$, in the “unexplained” part of the decomposition. The total explained effect in this decomposition, $\hat{\Delta}_{S,e}^{\nu}$, corresponds to the reweighting error, which should go to zero in large samples. It provides an easy way of assessing the quality of the reweighting.35
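The algebra of the four terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, the coefficient vectors are taken as already estimated by the three RIF-regressions, and every vector includes the constant as its first element.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def diff(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def two_step_decomposition(xbar0, xbar1, xbarC, g0, g1, gC):
    """Four terms of the reweighted RIF decomposition.

    xbar0, xbar1: covariate means in the T=0 and T=1 samples.
    xbarC: covariate means of the T=0 sample reweighted to resemble T=1.
    g0, g1, gC: RIF-regression coefficients from the same three samples.
    """
    return {
        "pure_composition": dot(diff(xbarC, xbar0), g0),
        "specification_error": dot(xbarC, diff(gC, g0)),
        "pure_wage_structure": dot(xbar1, diff(g1, gC)),
        "reweighting_error": dot(diff(xbar1, xbarC), gC),
    }


# Hypothetical inputs: a constant and one covariate. The coefficients and the
# counterfactual mean are made up for illustration only.
terms = two_step_decomposition(
    xbar0=[1.0, 0.223], xbar1=[1.0, 0.127], xbarC=[1.0, 0.130],
    g0=[0.50, -0.20], g1=[0.55, -0.10], gC=[0.52, -0.18],
)
overall = dot([1.0, 0.127], [0.55, -0.10]) - dot([1.0, 0.223], [0.50, -0.20])
```

By construction the four terms add up to the overall change $\bar{X}_1\hat{\gamma}_1^{\nu} - \bar{X}_0\hat{\gamma}_0^{\nu}$, so a reweighting error far from zero signals that the reweighted covariate means do not match those of the $T = 1$ sample.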
Consistent with Figure 7a, specification errors reported in Table 4 are generally small. As discussed in Section 3, the specification error reflects departures from linearity of the RIF-regressions and the fact that, except for the mean, the RIF depends on the distribution of $Y$ (and $X$ through its effect on $Y$). In Table 4, we formally test whether the specification error is significantly different from zero. The results are mixed. The specification error is not significantly different from zero for the 90–10 and the 50–10 gaps, but is statistically significant for the 90–50 gap, the variance, and the Gini. The specification error is nonetheless small relative to the overall changes in the distributional statistics, which indicates that RIF-regressions provide highly accurate estimates of the overall composition and wage structure effects in the empirical example being studied here. However, as we discuss below, although the specification error is small, using the two-step decomposition instead of a standard OB decomposition matters much more when looking at the contribution of individual covariates to the wage structure effect.
In both Table 3 and Table 4, the composition effects linked to factors other than unions go the “wrong way” in the sense that they account for rising inequality at the bottom end while inequality is rising at the top end, a point noted earlier by Autor et al. (2005). This applies in particular to education and occupation effects, which are larger for the 50–10 than for the 90–50, while the effects of industry and other factors (race, marital status, and experience) on the 50–10 and 90–50 are similar. In contrast, composition effects linked to unions (the impact of de-unionization) reduce inequality at the low end (effect of −0.019 on the 50–10) but increase inequality at the top end (effect of 0.035 on the 90–50). Note that, just as in an OB decomposition, these effects on the 50–10 and the 90–50 gap can be obtained directly by multiplying the 9.5 percentage point decline in the unionization rate (Table A1) by the relevant union effects in 1988–1990 shown in Table 1. The effect of de-unionization accounts for about 25 percent of the total change in the 50–10 gap, which is remarkably similar to the relative contribution of de-unionization to the growth in inequality in the 1980s (see Freeman 1993; Card 1992; and DiNardo et al. 1996).
Figure 8a divides the wage structure effect, $\hat{\Delta}_S^{q_\tau}$, into the part explained by the RIF-regression models, $\sum_{k=2}^{M}\left(\hat{\gamma}_{1,k}^{\nu} - \hat{\gamma}_{C,k}^{\nu}\right)\bar{X}_{1,k}$, and the residual change $\hat{\gamma}_{1,1}^{\nu} - \hat{\gamma}_{C,1}^{\nu}$ (the change for the base group captured by the intercepts). The contribution of each set of factors is then shown in Figure 8b. As in the case of the composition effects, it is easier to discuss the results by focusing on the 90–50 and 50–10 gaps shown in Table 3 and Table 4.
Here, we note that the contributions of different covariates to the wage structure effect are quite different in Table 3 and Table 4. This indicates that the OB decomposition of Table 3 is inaccurate because of differences between the estimated RIF-regression coefficients $\hat{\gamma}_C^{\nu}$ and $\hat{\gamma}_0^{\nu}$. As discussed in Section 3, the difference between $\hat{\gamma}_1^{\nu}$ and $\hat{\gamma}_C^{\nu}$ used to compute wage structure effects in Table 4 solely reflects changes in the wage structure. By contrast, the difference between $\hat{\gamma}_1^{\nu}$ and $\hat{\gamma}_0^{\nu}$ used in Table 3 is likely contaminated by changes in the distribution of $X$ that are being adjusted for (by reweighting) when estimating $\hat{\gamma}_C^{\nu}$. The difference is particularly striking in the case of education. As expected, Table 4 shows that wage structure effects linked to education play an important role in the growth of the 90–50 gap. By contrast, the effect is small and insignificant when using a conventional OB decomposition in Table 3. The case of education, a central variable in most studies on the sources of growing inequality, dramatically illustrates the importance of using the two-step decomposition with reweighting proposed in this paper.
The wage structure results of Table 4 first show that covariates overexplain the change (decline) in the 50–10 gap, accounting for −0.127 (sum of the five effects) of the −0.105 change, with the constant capturing the difference. Covariates do a less impressive job for the 90–50 gap, explaining only 0.068 (half) of the 0.136 change. Occupations are the set of covariates that best capture the changes in the wage structure. They account for −0.075 of the −0.105 decline (73%) in the 50–10 gap and 0.088 of the 0.135 increase (68%) in the 90–50 gap. These results justify the increased attention given in the literature to the role of occupational tasks (Firpo et al. 2011; Fortin and Lemieux 2016). Changes in the returns to education continue to play an important role at the top of the distribution, accounting for 0.045 of the 0.135 increase (33%) in the 90–50. This supports the conjecture of Lemieux (2006a) that increases in the return to post-secondary education contribute to the convexification of the wage distribution.
Finally, the total effect of each covariate (wage structure plus composition effect) is reported in Figure 6b and the bottom panel of Table 4. Unions and occupations are the two factors that best account for the differential changes at the bottom and top of the distribution, capturing both a negative effect on the 50–10 and a positive effect on the 90–50. The total effect of the two factors on the 50–10 gap corresponds to −0.078 out of −0.105 (74%) of the change, while they account for 0.139 out of the 0.136 change in the 90–50 (102%). This goes a substantial way towards explaining the polarization of the labor market.
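As a quick arithmetic check, the two shares quoted in this paragraph follow directly from the reported effects and total changes (no other inputs are assumed):

```latex
\frac{-0.078}{-0.105} \approx 0.74 \quad (74\%),
\qquad
\frac{0.139}{0.136} \approx 1.02 \quad (102\%).
```

A share above 100% simply means that the two factors together slightly overexplain the change in the 90–50 gap, so the remaining factors must net out to a small negative contribution.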

6. Conclusions

We provide a detailed exposition of a two-stage method to decompose changes in the distribution of wages (or other outcome variables). In Stage 1, distributional changes are divided into a wage structure effect and a composition effect using a reweighting method. In Stage 2, these two components are further divided into the contribution of each individual covariate using the recentered influence function regression technique introduced by FFL. This two-stage procedure generalizes the popular OB decomposition method by extending the decomposition to any distributional measure (besides the mean), and allowing for a more flexible wage setting model. Other procedures (Machado and Mata 2005; Melly 2005; Rothe 2012; CFM) have been suggested for performing part of this decomposition for distributional parameters besides the mean. One important advantage of our procedure is that it is easy to use in practice, as it simply involves estimating a logit model (first stage) and running least-squares regressions (second stage). Another more distinctive advantage is that it can be used to divide the composition effect into the contribution of each covariate, something that most existing methods cannot do.
We illustrate the workings of our method by looking at changes in male wage inequality in the United States between 1988 and 2016. This is an interesting case to study as the wage distribution changed very differently at different points of the distribution, a phenomenon that cannot be captured by summary measures of inequality such as the variance of log wages. Our method is particularly well suited for looking in detail at the source of wage changes at each percentile of the wage distribution. Our findings indicate that unions, occupations, and education are the most important factors accounting for the observed changes in the wage distribution over this period.

Author Contributions

All authors contributed equally to the paper.

Funding

Fortin and Lemieux thank the Social Sciences and Humanities Research Council of Canada (grant# for financial support. Firpo thanks CNPq-Brazil for financial support.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Tables

Table A1. Sample Means.
| Years: | 1988/90 | 2014/16 | Difference |
|---|---|---|---|
| Log wages | 2.860 | 2.901 | 0.041 |
| Std of log wages | 0.579 | 0.622 | 0.043 |
| Union covered | 0.223 | 0.127 | −0.095 |
| Non-white | 0.134 | 0.186 | 0.052 |
| Non-Married | 0.388 | 0.457 | 0.068 |
| Age | 36.204 | 39.882 | 3.677 |
| Education | | | |
| Primary | 0.059 | 0.034 | −0.025 |
| Some HS | 0.118 | 0.054 | −0.064 |
| High School | 0.381 | 0.307 | −0.074 |
| Some College | 0.202 | 0.275 | 0.072 |
| College | 0.139 | 0.218 | 0.078 |
| Post-grad | 0.101 | 0.113 | 0.012 |
| Occupations | | | |
| Upper Management | 0.082 | 0.080 | −0.002 |
| Lower Management | 0.040 | 0.068 | 0.028 |
| Engineers & Computer Occ. | 0.061 | 0.081 | 0.019 |
| Other Scientists | 0.014 | 0.010 | −0.004 |
| Social Support Occ. | 0.052 | 0.061 | 0.009 |
| Lawyers & Doctors | 0.010 | 0.015 | 0.005 |
| Health Treatment Occ. | 0.010 | 0.019 | 0.009 |
| Clerical Occ. | 0.066 | 0.068 | 0.002 |
| Sales Occ. | 0.086 | 0.085 | −0.001 |
| Insur. & Real Estate Sales | 0.007 | 0.006 | −0.001 |
| Financial Sales | 0.003 | 0.002 | −0.001 |
| Service Occ. | 0.107 | 0.149 | 0.042 |
| Primary Occ. | 0.026 | 0.011 | −0.015 |
| Construction & Repair Occ. | 0.164 | 0.155 | −0.009 |
| Production Occ. | 0.141 | 0.086 | −0.055 |
| Transportation Occ. | 0.086 | 0.060 | −0.026 |
| Truckers | 0.045 | 0.041 | −0.004 |
| Industries | | | |
| Agriculture, Mining | 0.033 | 0.026 | −0.007 |
| Construction | 0.097 | 0.101 | 0.005 |
| Hi-Tech Manufac | 0.102 | 0.066 | −0.037 |
| Low-Tech Manufac | 0.137 | 0.087 | −0.050 |
| Wholesale Trade | 0.051 | 0.033 | −0.018 |
| Retail Trade | 0.105 | 0.113 | 0.008 |
| Transportation & Utilities | 0.086 | 0.079 | −0.008 |
| Information except Hi-Tech | 0.018 | 0.012 | −0.006 |
| Financial Activities | 0.047 | 0.058 | 0.011 |
| Hi-Tech Services | 0.035 | 0.064 | 0.029 |
| Business Services | 0.051 | 0.065 | 0.014 |
| Education & Health Services | 0.097 | 0.113 | 0.016 |
| Personal Services | 0.081 | 0.127 | 0.046 |
| Public Admin | 0.058 | 0.054 | −0.005 |
| Public Sector | 0.149 | 0.126 | −0.024 |

Note: Computed using sample weights. All differences over time are statistically significant at the p = 0.001 level.
Table A2. Occupation and Industry Definitions.
| Code Sources: | 2010 Census SOC | 1980 SOC |
|---|---|---|
| Occupations | | |
| Upper Management | 10–200, 430 | 1–13, 19 |
| Lower Management | 200–950 | 14–18, 20–37, 473–476 |
| Engineers & Computer Occ. | 1000–1560 | 43–68, 213–218, 229 |
| Other Scientists | 1600–1960 | 69–83, 166–173, 223–225, 235 |
| Social Support Occ. | 2000–2060, 2140–2960 | 113–165, 174–177, 183–199, 228, 234 |
| Lawyers & Doctors | 2100–2110, 3010, 3060 | 84–85, 178–179 |
| Health Treatment Occ. | 3000, 3030–3050, 3110–3540 | 86–106, 203–208 |
| Clerical Occ. | 5000–5940 | 303–389 |
| Sales Occ. | 4700–4800, 4830–4900, 4930–4965 | 243–252, 256–285 |
| Insur. & Real Estate Sales | 4810, 4920 | 253–254 |
| Financial Sales | 4820 | 255 |
| Service Occ. | 3600–4650 | 430–470 |
| Primary Occ. | 6000–6130 | 477–499 |
| Construction & Repair Occ. | 6200–7620 | 503–617, 863–869 |
| Production Occ. | 7700–8960 | 633–799, 873, 233 |
| Transportation Occ. | 9000–9120, 9140–9750 | 803, 808–859, 876–889, 226–227 |
| Truck Drivers | 9130 | 804–806 |
| Industries | | |
| Agriculture, Mining | 170–490 | 10–50 |
| Construction | 770 | 60 |
| Hi-Tech Manufac | 2170–2390, 3180, 3360–3690, 3960 | 180–192, 210–212, 310, 321–322, 340–372 |
| Low-Tech Manufac | 1070–2090, 2470–3170, 3190–3290, 3770–3890, 3970–3990 | 100–162, 200–201, 220–301, 311–320, 331–332, 380–392 |
| Wholesale Trade | 4070–4590 | 500–571 |
| Retail Trade | 4670–5790 | 580–640, 642–691 |
| Transportation & Utilities | 570–690, 6070–6390 | 400–432, 460–472 |
| Information except Hi-Tech | 6470–6480, 6570–6670, 6770–6780 | 171–172, 852 |
| Financial Activities | 6870–7190 | 700–712 |
| Hi-Tech Services | 6490, 6675–6695, 7290–7460 | 440–442, 732–740, 882 |
| Business Services | 7270–7280, 7470–7790 | 721–731, 741–791, 890, 892 |
| Education & Health Services | 7860–8470 | 812–851, 860–872, 891 |
| Personal Services | 8560–9290 | 641, 750–802, 880–881 |
| Public Admin | 9370–9590 | 900–932 |

Appendix B. Supplemental Material

Appendix B.1. Details of Weighting Functions Estimation

Appendix B.1.1. Estimating the Weights

We are interested in estimating weights $\omega$ that are generally functions of the distribution of $(T, X)$. The three weighting functions under consideration are $\omega_1(T)$, $\omega_0(T)$, and $\omega_C(T, X)$. The first two weights are trivially estimated as:
$$\hat{\omega}_1(T) = \frac{T}{\hat{p}} \quad \text{and} \quad \hat{\omega}_0(T) = \frac{1 - T}{1 - \hat{p}},$$
where $\hat{p} = N^{-1}\sum_{i=1}^{N} T_i$.
The weighting function $\omega_C(T, X)$ can be estimated as
$$\hat{\omega}_C(T, X) = \frac{1 - T}{\hat{p}} \cdot \frac{\hat{p}(X)}{1 - \hat{p}(X)},$$
where $\hat{p}(\cdot)$ is an estimator of the true probability of being in Group 1 given $X$. We describe in detail below the two approaches that we consider, a parametric one and a non-parametric one. In addition, to have weights summing up to one, we use the following normalization procedures, where a star marks the normalized weights:
$$\hat{\omega}_1^{*}(T_i) = \frac{\hat{\omega}_1(T_i)}{\sum_{j=1}^{N}\hat{\omega}_1(T_j)} = \frac{T_i}{N \cdot \hat{p}}, \qquad \hat{\omega}_0^{*}(T_i) = \frac{\hat{\omega}_0(T_i)}{\sum_{j=1}^{N}\hat{\omega}_0(T_j)} = \frac{1 - T_i}{N \cdot (1 - \hat{p})},$$
$$\hat{\omega}_C^{*}(T_i, X_i) = \frac{\hat{\omega}_C(T_i, X_i)}{\sum_{j=1}^{N}\hat{\omega}_C(T_j, X_j)} = \frac{(1 - T_i) \cdot \dfrac{\hat{p}(X_i)}{1 - \hat{p}(X_i)}}{\sum_{j=1}^{N}(1 - T_j) \cdot \dfrac{\hat{p}(X_j)}{1 - \hat{p}(X_j)}}.$$
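The three weighting functions and their normalization can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the propensity scores $\hat{p}(X_i)$ are assumed to have been estimated beforehand and to lie strictly between 0 and 1.

```python
def reweighting_factors(T, pscore):
    """Normalized weights for the T=1, T=0 and counterfactual samples.

    T: list of 0/1 group indicators.
    pscore: list of estimated Pr(T=1 | X_i), each strictly inside (0, 1).
    """
    N = len(T)
    p_hat = sum(T) / N  # share of observations in Group 1

    w1 = [t / p_hat for t in T]
    w0 = [(1 - t) / (1 - p_hat) for t in T]
    # Counterfactual weights: T=0 observations reweighted by the odds ratio.
    wC = [(1 - t) / p_hat * ps / (1 - ps) for t, ps in zip(T, pscore)]

    def normalize(w):
        total = sum(w)
        return [x / total for x in w]

    return {"w1": normalize(w1), "w0": normalize(w0), "wC": normalize(wC)}


# Hypothetical four-observation example.
weights = reweighting_factors(T=[1, 0, 0, 1], pscore=[0.5, 0.5, 0.2, 0.8])
```

Only observations with $T_i = 0$ receive positive counterfactual weight, and those whose covariates look more like Group 1 (a higher propensity score) are weighted up.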

Appendix B.1.2. Estimating the Distributional Statistics

We are interested in the estimation and inference of $\nu_1$, $\nu_0$, and $\nu_C$. It can be shown that, under certain regularity conditions, estimators of these objects will be asymptotically normally distributed. We now show how to estimate those quantities, and derive their asymptotic distributions below.
The estimation follows a plug-in approach. Replacing the CDF by the empirical distribution function yields the estimators of interest:
$$\hat{\nu}_t = \nu(\hat{F}_t), \quad t = 0, 1; \qquad \hat{\nu}_C = \nu(\hat{F}_C),$$
where, writing $\hat{\omega}^{*}$ for the estimated weights normalized to sum to one,
$$\hat{F}_t(y) = \sum_{i=1}^{N} \hat{\omega}_t^{*}(T_i) \cdot \mathbb{1}\{Y_i \le y\}, \quad t = 0, 1,$$
$$\hat{F}_C(y) = \sum_{i=1}^{N} \hat{\omega}_C^{*}(T_i, X_i) \cdot \mathbb{1}\{Y_i \le y\}.$$
Note that, in practice, it is not usually necessary to compute these empirical distribution functions to get estimates of a distributional statistic, $\hat{\nu}$. Standard software programs such as Stata can be used to compute distributional statistics directly from the observations on $Y$ using the appropriate weighting factor.
The estimated distributional statistics can then be used to estimate the wage structure and composition effects as $\hat{\Delta}_S^{\nu} = \hat{\nu}_1 - \hat{\nu}_C$ and $\hat{\Delta}_X^{\nu} = \hat{\nu}_C - \hat{\nu}_0$.
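For a quantile, the plug-in step and the resulting aggregate decomposition can be sketched as below. The function names are hypothetical, the quantile is defined as the smallest value at which the weighted empirical CDF reaches $\tau$ (one of several common conventions), and the counterfactual weights are assumed to be supplied by a first-stage propensity score model.

```python
def weighted_quantile(y, w, tau):
    """Smallest y at which the weighted empirical CDF reaches tau (0 < tau < 1)."""
    total = sum(w)
    cum = 0.0
    for yi, wi in sorted(zip(y, w)):
        cum += wi / total  # weights are normalized on the fly
        if cum >= tau - 1e-12:  # small tolerance for floating-point rounding
            return yi
    return max(y)


def aggregate_decomposition(y, T, w_C, tau):
    """Wage structure and composition effects for the tau-th quantile.

    w_C: counterfactual weights (zero for T=1 observations); they need not
    be normalized because weighted_quantile rescales them.
    """
    w1 = [float(t) for t in T]      # selects the T=1 sample
    w0 = [1.0 - t for t in T]       # selects the T=0 sample
    nu1 = weighted_quantile(y, w1, tau)
    nu0 = weighted_quantile(y, w0, tau)
    nuC = weighted_quantile(y, w_C, tau)
    return {
        "wage_structure": nu1 - nuC,
        "composition": nuC - nu0,
        "overall": nu1 - nu0,
    }


# Hypothetical data: four observations per period.
result = aggregate_decomposition(
    y=[1, 2, 3, 4, 2, 3, 4, 5],
    T=[0, 0, 0, 0, 1, 1, 1, 1],
    w_C=[0, 0, 1, 1, 0, 0, 0, 0],
    tau=0.5,
)
```

The two effects always sum to the overall change, since $\hat{\nu}_C$ cancels out.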

Appendix B.1.3. Parametric Propensity Score Estimation

Suppose that $p(X)$ is correctly specified up to a finite vector of parameters $\delta_0$. That is, $p(X) = p(X; \delta_0)$, or more formally:
Assumption A1.
(Parametric p-score) $\Pr[T = 1 \mid X = x] = p(x; \delta_0)$, where $p(\cdot\,; \delta_0): \mathcal{X} \to [0, 1]$ is a known function up to $\delta_0 \in \mathbb{R}^d$, $d < +\infty$.
Estimation of $\delta_0$ follows by maximum likelihood:
$$\hat{\delta}_{MLE} = \arg\max_{\delta} \sum_{i=1}^{N} T_i \cdot \log\left(p(X_i; \delta)\right) + (1 - T_i) \cdot \log\left(1 - p(X_i; \delta)\right).$$
Define the derivative of $p(X; \delta)$ with respect to $\delta$ as $\dot{p}(X; \delta) = \partial p(X; \delta) / \partial \delta$. The score function $s(T, X; \delta)$ is:
$$s(T, X; \delta) = \dot{p}(X; \delta) \cdot \frac{T - p(X; \delta)}{p(X; \delta) \cdot \left(1 - p(X; \delta)\right)}.$$
To simplify notation, we suppress the argument $\delta$ whenever a function of it is evaluated at the true value $\delta_0$. Therefore,
$$s(T, X; \delta_0) = s(T, X) = \dot{p}(X) \cdot \frac{T - p(X)}{p(X) \cdot \left(1 - p(X)\right)},$$
and finally
$$\hat{\omega}_C(T, X) = \frac{1 - T}{\hat{p}} \cdot \frac{p(X; \hat{\delta}_{MLE})}{1 - p(X; \hat{\delta}_{MLE})}.$$
In particular, in this paper, we assume that $p(x; \delta_0)$ can be modeled as a logit, that is,
$$p(x; \delta_0) = L(x'\delta_0),$$
where $L: \mathbb{R} \to \mathbb{R}$, $L(z) = \left(1 + \exp(-z)\right)^{-1}$.
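The logit first stage can be illustrated with a small Newton-Raphson routine. This is a sketch under simplifying assumptions: a single covariate plus an intercept, hypothetical function names, and data with enough overlap between groups for the maximum likelihood estimate to exist. In applied work a standard logit routine would be used instead.

```python
import math


def logistic(z):
    """L(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, T, iterations=25):
    """Newton-Raphson for Pr(T=1 | x) = L(d0 + d1 * x)."""
    d0, d1 = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(d0 + d1 * xi) for xi in x]
        # Score of the log-likelihood with respect to (d0, d1).
        g0 = sum(t - pi for t, pi in zip(T, p))
        g1 = sum((t - pi) * xi for t, pi, xi in zip(T, p, x))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        v = [pi * (1.0 - pi) for pi in p]
        h00 = sum(v)
        h01 = sum(vi * xi for vi, xi in zip(v, x))
        h11 = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = h00 * h11 - h01 * h01
        # Newton step: delta += H^{-1} g, using the closed-form 2x2 inverse.
        d0 += (h11 * g0 - h01 * g1) / det
        d1 += (h00 * g1 - h01 * g0) / det
    return d0, d1


# Hypothetical data with overlap between the two groups.
x = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
T = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]
d0, d1 = fit_logit(x, T)
pscore = [logistic(d0 + d1 * xi) for xi in x]
```

The fitted probabilities `pscore` are what enters the odds ratio $\hat{p}(X)/(1 - \hat{p}(X))$ in $\hat{\omega}_C$. At the maximum, the score is zero, so the fitted probabilities sum to the number of Group 1 observations.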

Appendix B.1.4. Nonparametric Propensity Score Estimation

Suppose that $p(X)$ is completely unknown to the researcher. In that case, following Hirano et al. (2003), we approximate the log odds ratio by a polynomial series. In practice, this is done by finding a vector $\hat{\pi}$ that is the solution of the following problem:
$$\hat{\pi} = \arg\max_{\pi} \sum_{i=1}^{N} T_i \cdot \log\left(L(H_J(X_i)'\pi)\right) + (1 - T_i) \cdot \log\left(1 - L(H_J(X_i)'\pi)\right),$$
where $H_J(x) = [H_{J,j}(x)]$ $(j = 1, \ldots, J)$ is a vector of length $J$ of polynomial functions of $x \in \mathcal{X}$ satisfying the following properties: (i) $H_J: \mathcal{X} \to \mathbb{R}^J$; and (ii) $H_{J,1}(x) = 1$. More details on this estimation procedure can be found in Hirano et al. (2003) or in Firpo (2007). The non-parametric feature of this estimation procedure comes from the fact that such approximation is refined as the sample size increases, that is, $J$ will be a function of the sample size $N$, $J = J(N) \to +\infty$ as $N \to +\infty$.
In this approach, $p(X)$ is estimated by $\hat{p}(X) = L(H_J(X)'\hat{\pi})$, thus:
$$\hat{\omega}_C(T, X) = \frac{1 - T}{\hat{p}} \cdot \frac{L(H_J(X)'\hat{\pi})}{1 - L(H_J(X)'\hat{\pi})}.$$
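To make the series approach concrete, the sketch below evaluates $\hat{p}(x)$ and $\hat{\omega}_C$ for a scalar covariate. It assumes a simple power-series basis, which is only one possible choice of polynomial functions, and it takes the coefficient vector as given; all names are hypothetical.

```python
import math


def power_basis(x, J):
    """H_J(x) = [1, x, x^2, ..., x^(J-1)]; the first element is the constant."""
    return [x ** j for j in range(J)]


def series_pscore(x, pi):
    """p_hat(x) = L(H_J(x)' pi), with L the logistic function and J = len(pi)."""
    z = sum(h * c for h, c in zip(power_basis(x, len(pi)), pi))
    return 1.0 / (1.0 + math.exp(-z))


def omega_C(t, x, pi, p_hat):
    """Counterfactual weight (1 - t) / p_hat * p(x) / (1 - p(x))."""
    ps = series_pscore(x, pi)
    return (1 - t) / p_hat * ps / (1.0 - ps)
```

Increasing `len(pi)` with the sample size is what makes the approximation non-parametric; the rate at which $J$ should grow is not specified here.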

Appendix B.2. Asymptotic Distribution

We first show that the plug-in estimators ν ^ are asymptotically normal and compute their asymptotic variances. We then do the same for the density estimators.

Appendix B.2.1. The Asymptotic Distribution of Plug-In Estimators

We start by assuming that the estimators $\hat{\nu}$ are asymptotically linear in the following sense:
Assumption A2 (Asymptotic Linearity).
$\hat{\nu}_t$ and $\hat{\nu}_C$ are asymptotically linear, that is,
$$\nu(\hat{F}_t) - \nu(F_t) = \sum_{i=1}^{N} \hat{\omega}_t^{*}(T_i) \cdot \mathrm{IF}(Y_i; F_t, \nu) + o_p(1/\sqrt{N}),$$
$$\nu(\hat{F}_C) - \nu(F_C) = \sum_{i=1}^{N} \hat{\omega}_C^{*}(T_i, X_i) \cdot \mathrm{IF}(Y_i; F_C, \nu) + o_p(1/\sqrt{N}),$$
where $\hat{\omega}^{*}$ denotes the estimated weights normalized to sum to one.
Assumption A2 establishes that the estimators are either exactly linear, as those that are based on sample moments, or they can be linearized and the remainder term will approach zero as the sample size increases.
An additional technical assumption is that the influence functions are square integrable and that their conditional expectations given $X$ are differentiable. To simplify notation, let us write $\mathrm{IF}(Y_t; \nu, F) = \psi_t^{\nu}(Y)$.
Assumption A3 (Influence Function).
For all weighting functions $\omega$ considered,
(i) $E\left[\left(\psi_t^{\nu}(Y; F_t)\right)^2\right] < \infty$, $E\left[\left(\psi_C^{\nu}(Y; F_C)\right)^2\right] < \infty$, and
(ii) $E\left[\psi_t^{\nu}(Y; F_t) \mid X = x\right]$ and $E\left[\psi_C^{\nu}(Y; F_C) \mid X = x\right]$ are continuously differentiable for all $x$ in $\mathcal{X}$.
Under ignorability, both types of estimators (parametric and non-parametric first step) for $\hat{\nu}_1$, $\hat{\nu}_0$, and $\hat{\nu}_C$ proposed before will remain asymptotically linear. The theorem below considers both the parametric and non-parametric cases.
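Theorem A1.
Asymptotic Normality of the $\hat{\nu}$ Estimators:
Under Assumptions 1, 2, A2 and A3:
(i-ii)
$$\sqrt{N} \cdot (\hat{\nu}_t - \nu_t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \omega_t(T_i) \cdot \psi_t^{\nu}(Y_i; F_t) + o_p(1) \xrightarrow{D} N(0, V_t), \quad t = 0, 1$$
(iii) (a) if in addition, Assumption A1 holds, then:
$$\sqrt{N} \cdot (\hat{\nu}_C - \nu_C) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \Big[ \omega_C(T_i, X_i) \cdot \psi_C^{\nu}(Y_i; F_C) + \left(\omega_1(T_i) - \omega_C(T_i, X_i)\right) \cdot \frac{\dot{p}(X_i)'}{p(X_i)} \cdot \left(E\left[s(T, X) \cdot s(T, X)'\right]\right)^{-1} \cdot E\Big[\frac{\dot{p}(X)}{1 - p(X)} \cdot E\left[\psi_C^{\nu}(Y; F_C) \mid X, T = 0\right]\Big] \Big] + o_p(1) \xrightarrow{D} N(0, V_{C,P})$$
(iii) (b) otherwise, if in addition we assume [non-parametric], then:
$$\sqrt{N} \cdot (\hat{\nu}_C - \nu_C) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \Big[ \omega_C(T_i, X_i) \cdot \psi_C^{\nu}(Y_i; F_C) + \left(\omega_1(T_i) - \omega_C(T_i, X_i)\right) \cdot E\left[\psi_C^{\nu}(Y; F_C) \mid X_i, T = 0\right] \Big] + o_p(1) \xrightarrow{D} N(0, V_{C,NP})$$
where
$$V_t = E\left[\left(\omega_t(T) \cdot \psi_t^{\nu}(Y; F_t)\right)^2\right], \quad t = 0, 1$$
$$V_{C,P} = E\Big[\Big(\omega_C(T, X) \cdot \psi_C^{\nu}(Y; F_C) + \left(\omega_1(T) - \omega_C(T, X)\right) \cdot \frac{\dot{p}(X)'}{p(X)} \cdot \left(E\left[s(T, X) \cdot s(T, X)'\right]\right)^{-1} \cdot E\Big[\frac{\dot{p}(X)}{1 - p(X)} \cdot E\left[\psi_C^{\nu}(Y; F_C) \mid X, T = 0\right]\Big]\Big)^2\Big]$$
$$V_{C,NP} = E\Big[\Big(\omega_C(T, X) \cdot \psi_C^{\nu}(Y; F_C) + \left(\omega_1(T) - \omega_C(T, X)\right) \cdot E\left[\psi_C^{\nu}(Y; F_C) \mid X, T = 0\right]\Big)^2\Big]$$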
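As an illustration of how $V_t$ could be estimated by its sample analogue, the sketch below treats the simplest case where $\nu$ is the mean, whose influence function is $y - \mu$. The function name is hypothetical, and the standard error shown ignores any complications from survey design.

```python
import math


def mean_and_variance(y, T, group):
    """Plug-in mean of Y for one group and the sample analogue of
    V_t = E[(omega_t(T) * psi(Y; F_t))^2], with psi(y) = y - mu (nu = mean)."""
    N = len(y)
    p_hat = sum(T) / N
    if group == 1:
        w = [t / p_hat for t in T]
    else:
        w = [(1 - t) / (1 - p_hat) for t in T]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / N
    V = sum((wi * (yi - mu)) ** 2 for wi, yi in zip(w, y)) / N
    return mu, V


# Hypothetical four-observation sample.
y = [1, 3, 2, 4]
T = [1, 1, 0, 0]
mu1, V1 = mean_and_variance(y, T, group=1)
se1 = math.sqrt(V1 / len(y))  # approximate standard error of the group mean
```

For the mean, this expression reduces to the within-group variance of $Y$ divided by the group share, which is the familiar variance of a subsample mean scaled by $N$.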

Appendix B.3. Proofs

Proof of Result 1.
A proof can be found in Firpo and Pinto (2016). □
Proof of Result 2.
Part (i) is straightforward and follows from identification of the functionals $\nu_1$, $\nu_0$ and $\nu_C$, a direct consequence of identification of $F_1$, $F_0$ and $F_C$. Part (ii) follows from the fact that
$$\begin{aligned} F_1(y) &= E\left[E\left[\mathbb{1}\{g_1(X, \varepsilon) \le y\} \mid T = 1, X\right]\right] \\ &= E\Big[E\left[\mathbb{1}\{g_0(X, \varepsilon) \le y\} \mid T = 1, X\right] + E\left[\mathbb{1}\{g_1(X, \varepsilon) \le y\} - \mathbb{1}\{g_0(X, \varepsilon) \le y\} \mid T = 1, X\right]\Big] \\ &= F_C(y) + F_{1-0}(y), \end{aligned}$$
where
$$F_{1-0}(y) = E\left[E\left[\mathbb{1}\{g_1(X, \varepsilon) \le y\} - \mathbb{1}\{g_0(X, \varepsilon) \le y\} \mid T = 1, X\right]\right];$$
thus, if $g_1(\cdot, \cdot) = g_0(\cdot, \cdot)$, then for all $y$, $F_{1-0}(y) = 0$ and
$$\nu_1 = \nu(F_1) = \nu(F_C + F_{1-0}) = \nu(F_C) = \nu_C.$$
Part (iii) follows from a similar argument:
$$\begin{aligned} F_0(y) &= \int \Pr\left[Y_0 \le y \mid T = 0, X = x\right] \cdot dF_{X|T}(x|0) \cdot dx \\ &= \int \Pr\left[Y_0 \le y \mid T = 0, X = x\right] \cdot dF_{X|T}(x|1) \cdot dx + \int \Pr\left[Y_0 \le y \mid T = 0, X = x\right] \cdot \left(dF_{X|T}(x|0) - dF_{X|T}(x|1)\right) \cdot dx \\ &= F_C(y) + F_{\Delta}(y), \end{aligned}$$
where
$$F_{\Delta}(y) = \int \Pr\left[Y_0 \le y \mid T = 0, X = x\right] \cdot d\left(F_{X|T}(x|0) - F_{X|T}(x|1)\right) \cdot dx;$$
thus if $F_{X|T}(\cdot|1) = F_{X|T}(\cdot|0)$, then for all $x$, $F_{X|T}(x|1) - F_{X|T}(x|0) = 0$ and therefore, for all $y$, $F_{\Delta}(y) = 0$ and
$$\nu_0 = \nu(F_0) = \nu(F_C + F_{\Delta}) = \nu(F_C) = \nu_C.$$
 □
Proof of Theorem A1.
A proof of parts (i), (ii) and (iii) (b) can be found in Firpo and Pinto (2016). A proof of part (iii) (a) can be found in Chen et al. (2008). □

References

  1. Acemoglu, Daron, and David H. Autor. 2011. Skills, Tasks, and Technologies: Implications for Employment and Earnings. In Handbook of Labor Economics. Edited by Orley Ashenfelter and David Card. Amsterdam: North-Holland, vol. IV.B, pp. 1043–172. [Google Scholar]
  2. Alvaredo, Facundo, Anthony B. Atkinson, Thomas Piketty, and Emmanuel Saez. 2013. The Top 1 Percent in International and Historical Perspective. Journal of Economic Perspectives 27: 3–20. [Google Scholar] [CrossRef]
  3. Autor, David H., and David Dorn. 2013. The Growth of Low-Skill Service Jobs and the Polarization of the US Labor Market. American Economic Review 103: 1553–97. [Google Scholar] [CrossRef] [Green Version]
  4. Autor, David H., David Dorn, Gordon H. Hanson, and Jae Song. 2014. Trade Adjustment: Worker-level Evidence. Quarterly Journal of Economics 129: 1799–860. [Google Scholar] [CrossRef]
  5. Autor, David H., David Dorn, and Gordon H. Hanson. 2015. Untangling Trade and Technology: Evidence from Local Labour Markets. Economic Journal 125: 621–46. [Google Scholar] [CrossRef]
  6. Autor, David H., Lawrence F. Katz, and Melissa S. Kearney. 2005. Rising Wage Inequality: The Role of Composition and Prices. NBER Working paper No. 11628. Cambridge, MA, USA: National Bureau of Economic Research. [Google Scholar]
  7. Autor, David H., Lawrence F. Katz, and Melissa S. Kearney. 2006. The Polarization of the U.S. Labor Market. American Economic Review 96: 189–94. [Google Scholar] [CrossRef]
  8. Autor, David H., Frank Levy, and Richard J. Murnane. 2003. The Skill Content Of Recent Technological Change: An Empirical Exploration. Quarterly Journal of Economics 118: 1279–333. [Google Scholar] [CrossRef]
  9. Barsky, Robert, John Bound, Kerwin Kofi Charles, and Joseph P. Lupton. 2002. Accounting for the Black-White Wealth Gap: A Nonparametric Approach. Journal of the American Statistical Association 97: 663–73. [Google Scholar] [CrossRef]
  10. Bento, Antonio, Kenneth Gillingham, and Kevin Roth. 2017. The Effect of Fuel Economy Standards on Vehicle Weight Dispersion and Accident Fatalitiesc. NBER Working paper No. w23340. Cambridge, MA, USA: National Bureau of Economic Research. [Google Scholar]
  11. Blinder, Alan. 1973. Wage Discrimination: Reduced Form and Structural Estimates. Journal of Human Resources 8: 436–55. [Google Scholar] [CrossRef]
  12. Brochu, Pierre, David A. Green, Thomas Lemieux, and James Townsend. 2017. The Minimum Wage, Turnover, and the Shape Effects of Wage Distribution. In Mimeo. Vancouver: University of British Columbia. [Google Scholar]
  13. Card, David. 1992. The Effects of Unions on the Distribution of Wages: Redistribution or Relabelling? NBER Working paper No. 4195. Cambridge, MA, USA: National Bureau of Economic Research. [Google Scholar]
  14. Cattaneo, Matias D., Michael Jansson, and Xinwei Ma. 2017. Simple Local Polynomial Density Estimators. In Mimeo. Berkeley: UC Berkeley. [Google Scholar]
  15. Chamberlain, Gary. 1994. Quantile Regression Censoring and the Structure of Wages. In Advances in Econometrics. Edited by Christopher Sims. New York: Elsevier. [Google Scholar]
  16. Chernozhukov, Victor, Ivan Fernandez-Val, and Blaise Melly. 2013. Inference on Counterfactual Distributions. Econometrica 81: 2205–68. [Google Scholar]
  17. Chen, Xiaohong, Han Hong, and Alessandro Tarozzi. 2008. Semiparametric Efficiency in GMM Models with Auxiliary Data. The Annals of Statistics 36: 808–43. [Google Scholar] [CrossRef]
  18. Choe, Chung, and Philippe Van Kerm. 2014. Foreign Workers and the Wage Distribution: Where Do They Fit in? Technical Report 2014-02. Esch-sur-Alzette: Luxembourg Institute of Socio-Economic Research. [Google Scholar]
  19. Cowell, Frank, and Maria-Pia Victoria-Feser. 1996. Robustness Properties of Inequality Measures. Econometrica 64: 77–101. [Google Scholar] [CrossRef]
  20. DiNardo, John, Nicole M. Fortin, and Thomas Lemieux. 1996. Labor Market Institutions and the Distribution of Wages, 1973–1992: A Semiparametric Approach. Econometrica 64: 1001–44. [Google Scholar] [CrossRef]
  21. Eeckhout, Jan, Roberto Pinheiro, and Kurt Schmidheiny. 2014. Spatial sorting. Journal of Political Economy 122: 554–620. [Google Scholar] [CrossRef]
  22. Essama-Nssah, Boniface, and Peter J. Lambert. 2012. Influence functions for policy impact analysis. In Inequality, Mobility and Segregation: Essays in Honor of Jacques Silber. Edited by John A. Bishop and Rafael Salas. Cheltenham: Emerald Group Publishing Limited, chp. 6. pp. 135–59. [Google Scholar]
  23. Firpo, Sergio. 2007. Efficient Semiparametric Estimation of Quantile Treatment Effects. Econometrica 75: 259–76. [Google Scholar] [CrossRef]
  24. Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. 2007. Decomposing Wage Distributions using Recentered Influence Functions Regressions. In Mimeo. Vancouver: University of British Columbia. [Google Scholar]
  25. Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. 2009. Unconditional Quantile Regressions. Econometrica 77: 953–973. [Google Scholar]
  26. Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. 2011. Occupational Tasks and Changes in the Wage Structure. In Mimeo. Vancouver: University of British Columbia. [Google Scholar]
  27. Firpo, Sergio, and Cristine Pinto. 2016. Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures. Journal of Applied Econometrics 31: 457–86. [Google Scholar] [CrossRef]
  28. Fortin, Nicole, Thomas Lemieux, and Sergio Firpo. 2011. Decomposition Methods in Economics. In Handbook of Labor Economics. Edited by Orley Ashenfelter and David Card. Amsterdam: North-Holland, vol. IV.A, pp. 1–102. [Google Scholar]
  29. Fortin, Nicole, and Thomas Lemieux. 2016. Inequality and Changes in Task Prices: Within and between Occupation Effects? In Income Inequality, Causes and Consequences (Research in Labor Economics, Vol. 43). Edited by Lorenzo Cappellari, Solomon W. Polachek and Konstantinos Tatsiramos. Cheltenham: Emerald Group Publishing Limited, pp. 195–226. [Google Scholar]
  30. Freeman, Richard B. 1980. Unionism and the Dispersion of Wages. Industrial and Labor Relations Review 34: 3–23. [Google Scholar] [CrossRef]
  31. Freeman, Richard B. 1993. How Much has Deunionization Contributed to the Rise of Male Earnings Inequality? In Uneven Tides: Rising Income Inequality in America. Edited by Sheldon Danziger and Peter Gottschalk. New York: Russell Sage Foundation, pp. 133–63. [Google Scholar]
  32. Gâteaux, René. 1913. Sur les fonctionnelles continues et les fonctionnelles analytiques. Comptes Rendus de l’Académie des Sciences-Series I—Mathematics 157: 325–27. [Google Scholar]
  33. Gardeazabal, Javier, and Arantza Ugidos. 2004. More on the Identification in Detailed Wage Decompositions. Review of Economics and Statistics 86: 1034–57. [Google Scholar] [CrossRef]
  34. Gradín, Carlos. 2016. Why Is Income inequality so High in Spain? In Income Inequality Around the World (Research in Labor Economics, Vol. 44). Edited by Lorenzo Cappellari, Solomon W. Polachek and Konstantinos Tatsiramos. Cheltenham: Emerald Group Publishing Limited, pp. 109–77. [Google Scholar]
  35. Hampel, Frank R. 1974. The Influence Curve and Its Role in Robust Estimation. Journal of the American Statistical Association 60: 383–93. [Google Scholar] [CrossRef]
  36. Heckman, James J. 1990. Varieties of Selection Bias. American Economic Review 80: 313–18.
  37. Heckman, James J., Hidehiko Ichimura, and Petra Todd. 1997. Matching as an Econometric Evaluation Estimator. Review of Economic Studies 65: 261–94.
  38. Heckman, James J., Hidehiko Ichimura, Jeffrey A. Smith, and Petra Todd. 1998. Characterizing Selection Bias Using Experimental Data. Econometrica 66: 1017–98.
  39. Heckman, James J., and Richard Robb. 1985. Alternative Methods for Evaluating the Impact of Interventions: An Overview. Journal of Econometrics 30: 239–67.
  40. Heckman, James J., and Richard Robb. 1986. Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes. In Drawing Inference from Self-Selected Samples. Edited by Howard Wainer. New York: Springer, pp. 63–107.
  41. Hirano, Keisuke, Guido W. Imbens, and Geert Ridder. 2003. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score. Econometrica 71: 1161–89.
  42. Jann, Ben. 2008. The Oaxaca-Blinder Decomposition for Linear Regression Models. Stata Journal 8: 453–79.
  43. Juhn, Chinhui, Kevin Murphy, and Brooks Pierce. 1993. Wage Inequality and the Rise in Returns to Skill. Journal of Political Economy 101: 410–42.
  44. Kline, Patrick. 2011. Oaxaca-Blinder as a Reweighting Estimator. American Economic Review 101: 532–37.
  45. Koenker, Roger. 2005. Quantile Regression. Cambridge: Cambridge University Press.
  46. Koenker, Roger, and Gilbert Bassett Jr. 1978. Regression Quantiles. Econometrica 46: 33–50.
  47. Lemieux, Thomas. 2002. Decomposing Changes in Wage Distributions: A Unified Approach. Canadian Journal of Economics 35: 646–88.
  48. Lemieux, Thomas. 2006a. Post-secondary Education and Increasing Wage Inequality. American Economic Review 96: 195–99.
  49. Lemieux, Thomas. 2006b. Increasing Residual Wage Inequality: Composition Effects, Noisy Data, or Rising Demand for Skill? American Economic Review 96: 461–98.
  50. Lemieux, Thomas. 2008. The Changing Nature of Wage Inequality. Journal of Population Economics 21: 21–48.
  51. Machado, José A. F., and José Mata. 2005. Counterfactual Decomposition of Changes in Wage Distributions Using Quantile Regression. Journal of Applied Econometrics 20: 445–65.
  52. Melly, Blaise. 2005. Decomposition of Differences in Distribution Using Quantile Regression. Labour Economics 12: 577–90.
  53. Monti, Anna Clara. 1991. The Study of the Gini Concentration Ratio by Means of the Influence Function. Statistica 51: 561–77.
  54. Oaxaca, Ronald. 1973. Male-Female Wage Differentials in Urban Labor Markets. International Economic Review 14: 693–709.
  55. Oaxaca, Ronald, and Michael R. Ransom. 1999. Identification in Detailed Wage Decompositions. Review of Economics and Statistics 81: 154–57.
  56. Rosenbaum, Paul R., and Donald B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
  57. Rosenbaum, Paul R., and Donald B. Rubin. 1984. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. Journal of the American Statistical Association 79: 516–24.
  58. Rothe, Christoph. 2010. Nonparametric Estimation of Distributional Policy Effects. Journal of Econometrics 155: 56–70.
  59. Rothe, Christoph. 2012. Partial Distributional Policy Effects. Econometrica 80: 2269–301.
  60. Rothe, Christoph. 2015. Decomposing the Composition Effect. Journal of Business & Economic Statistics 33: 323–37.
  61. Von Mises, Richard. 1947. On the Asymptotic Distribution of Differentiable Statistical Functions. The Annals of Mathematical Statistics 18: 309–48.
  62. White, Halbert. 1980. Using Least Squares to Approximate Unknown Regression Functions. International Economic Review 21: 149–70.
  63. Yun, Myeong-Su. 2005. A Simple Solution to the Identification Problem in Detailed Wage Decompositions. Economic Inquiry 43: 766–72.
1
Recentered influence functions have since been derived for a host of inequality measures by Essama-Nssah and Lambert (2012).
2
Eeckhout et al. (2014) compare the CFM approach to the RIF-regressions approach to decompose the skill distributions across large and small cities in terms of education, occupations, and industries, focusing on the bottom and top decile. Bento et al. (2017) provide a useful comparison of local kernel regressions, conditional quantile regressions, and RIF regressions in the context of a Monte-Carlo simulation of the effect of fuel economy standards on the distribution of vehicle weight.
3
See also Rothe (2010).
4
The federal minimum wage has declined substantially (in real terms) over time and is now superseded by higher state minimum wages in most states. As a result, the effect of state and federal minimum wages would need to be modeled over a range of wages. This task is beyond the scope of the current paper.
5
Consider, for instance, the contribution of increasing returns to education to changes in mean wages over time in the case where workers are either high school graduates or college graduates. In the case where high school is the base group, $X_{i,k}$ is a dummy variable indicating that the worker is a college graduate, and $\beta_{0,k}$ and $\beta_{1,k}$ are the effect of college on wages in years $t = 0$ and $1$. If returns to college increase over time ($\beta_{1,k} - \beta_{0,k} > 0$), then the contribution of education to the wage structure effect, $\bar{X}_{1,k}(\beta_{1,k} - \beta_{0,k})$, is positive, where $\bar{X}_{1,k}$ is the share of college graduates. If we use instead college as the base group, then $\tilde{X}_{1,k}(\tilde{\beta}_{1,k} - \tilde{\beta}_{0,k})$ is negative, where $\tilde{X}_{1,k}$ represents the share of high school graduates ($\tilde{X}_{1,k} = 1 - \bar{X}_{1,k}$) and $\tilde{\beta}_{t,k}$ represents the effect of high school ($\tilde{\beta}_{t,k} = -\beta_{t,k}$). Thus, whether changes in returns to schooling contribute positively or negatively to the change in mean wages critically depends on the choice of the base group.
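The sign flip described in this footnote can be checked with a small numerical sketch. The share of college graduates and the two college premia below are hypothetical values chosen only for illustration; they are not estimates from the paper.

```python
# Hypothetical inputs (assumptions for illustration only).
share_college = 0.30                  # share of college graduates in year 1
beta_college = {0: 0.40, 1: 0.50}     # college premium relative to high school, years 0 and 1

# High school as base group: the covariate is the college dummy.
effect_hs_base = share_college * (beta_college[1] - beta_college[0])

# College as base group: the covariate is the high school dummy. Its mean is
# one minus the college share and its coefficient is minus the college premium.
share_hs = 1.0 - share_college
beta_hs = {t: -b for t, b in beta_college.items()}
effect_college_base = share_hs * (beta_hs[1] - beta_hs[0])

print(round(effect_hs_base, 3), round(effect_college_base, 3))
```

With these inputs the education term is positive under the high school base group and negative under the college base group, even though the underlying change in returns is identical.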
6
As we show below, our goal is to estimate a counterfactual mean wage that would prevail if workers in Group 1 were paid under the wage structure of Group 0. Under the linearity assumption, this is equal to $E[X \mid T = 1]\,\beta_0$, a term that appears in both the wage structure and composition effects. The problem is that, when linearity does not hold, the counterfactual mean wage is not equal to $E[X \mid T = 1]\,\beta_0$.
7
Kline (2011) notes that, if the reweighting factor is linear in the covariates, the OB decomposition will yield a valid estimate of the counterfactual mean even if the conditional expectation is not linear in the covariates.
8
We discuss the case of reweighting in more detail below. In the case where the conditional expectation $E(Y_i \mid X_i, T = t)$ is estimated non-parametrically, a whole different procedure would have to be used to separate the wage structure into the contribution of each covariate. For instance, average derivative methods could be used to estimate an effect akin to the $\beta$ coefficients used in standard decompositions. Unfortunately, these methods are difficult to use in practice, and would not be helpful in dividing up the composition effect into the contribution of each individual covariate.
9
We sometimes refer to the functional $\nu(F_Z)$ simply as $\nu_Z$. In the Oaxaca–Blinder decomposition discussed earlier, the parameter $\nu$ equals the mean ($\nu = \mu$) and $\Delta_O^{\nu}$ is the total difference in mean wages.
10
11
This rules out selection into Group 1 or 0 based on unobservables.
12
This is not a restrictive assumption when looking at changes in the wage distribution over time. Problems could arise, however, in gender wage gap decompositions where some of the detailed occupations are only held by men or by women.
13
See also Firpo and Pinto (2016).
14
Note that, even if $g_1(\cdot, \varepsilon) = h_1(\varepsilon)$ and $g_0(\cdot, \varepsilon) = h_0(\varepsilon)$, the result from Result 2 is unaffected. The intuition is that, since $(X, \varepsilon)$ have a joint distribution, we can use the available information on that distribution to reweight the effect of the $\varepsilon$’s on $Y$.
15
This finding is closely linked to the well-known fact that estimates of marginal effects estimated using a linear probability model tend to be very similar, in practice, to those obtained using a probit, logit, or another flexible non-linear discrete response model.
16
In the case of the mean, another rationale for using a linear model comes from Kline (2011), who notes that the OB decomposition remains valid even when the regression function is non-linear as long as the reweighting factor $\omega_C$ is well approximated by a linear odds ratio model. Unfortunately, this property does not hold for distributional statistics besides the mean.
17
In the case of the mean, several procedures have been suggested as potential solutions to the base group problem. They typically involve creating an artificial base group with the average observed characteristics in the population (see, e.g., Yun 2005). As this choice is as arbitrary as other choices of base group, and arguably harder to interpret, especially across studies, it does not really solve the base group problem. See Fortin et al. (2011) for a more complete discussion. In Footnote 29, we also discuss some issues with previous attempts (Firpo et al. 2007) using a normalization approach to the base group.
18
In practice, we simply use the Stata integ command.
19
This technological change explanation was first suggested by Autor et al. (2003). It also implies that the wages of both skilled (e.g., doctors) and unskilled (e.g., truck drivers) non-routine jobs, at the top and bottom ends of the wage distribution, increased relative to those of “routine” workers in the middle of the wage distribution.
20
Autor et al. (2005) used the Machado and Mata (2005) method to decompose changes at each quantile into a “price” (wage structure) and “quantity” (composition) effect. They did not further consider, however, the contribution of each individual covariate to the wage structure effect, except for separating the contribution of (all) covariates from the residual change in inequality. See also Lemieux (2002) for a similar decomposition based on a reweighting procedure.
21
Table A2 gives the details of the occupation and industry categories used.
22
Several cross-validation tools suggested tuning parameters in that range, but the graphs were indistinguishable. In addition to the reweighting factors discussed in Section 3 and Section 4, we also use CPS sample weights throughout the empirical analysis. In practice, this means that we multiply the relevant reweighting factor by the CPS sample weight.
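As a rough sketch of the weighting step described in this footnote, the combined weight is simply the element-wise product of the reweighting factor and the survey weight. The function names and the toy numbers below are assumptions made for illustration; they do not come from the paper or from any CPS file.

```python
import numpy as np

def combined_weights(reweight_factor, cps_weight):
    """Element-wise product of the reweighting factor and the sample weight."""
    reweight_factor = np.asarray(reweight_factor, dtype=float)
    cps_weight = np.asarray(cps_weight, dtype=float)
    if reweight_factor.shape != cps_weight.shape:
        raise ValueError("both weight vectors must have the same shape")
    return reweight_factor * cps_weight

def weighted_mean(y, w):
    """Weighted mean of y using weights w."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Toy data (hypothetical): three workers with log wages, sample weights,
# and reweighting factors.
log_wage = np.array([2.5, 3.0, 3.5])
cps_w = np.array([1000.0, 1500.0, 500.0])
omega = np.array([0.8, 1.0, 1.4])

w = combined_weights(omega, cps_w)
counterfactual_mean = weighted_mean(log_wage, w)
```

Any weighted statistic (a quantile, the variance, the Gini) would then be computed with `w` in place of the raw sample weight.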
23
See Brochu et al. (2017) for a more precise modeling of the effect of minimum wages on the distribution of wages.
24
Weekly earnings are top-coded at $1923 in 1988–1990 and $2884 in 2014–2016. The latter is substantially lower in constant dollars. Furthermore, the top-code is even lower in relative terms because of the substantial growth in real wages at the top end of the distribution.
25
A large fraction of workers top-coded at $2884 a week work 40 h a week, which yields an hourly wage rate of $72.1. Applying the 1.4 adjustment factor increases the wage to $100.9, or about $92.5 in dollars of 2010. This precisely matches the spike in Figure 1 since log(92.5) = 4.53.
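The arithmetic in this footnote can be reproduced in a few lines. The inputs are the figures quoted in the text; the implied conversion factor to 2010 dollars is backed out from those figures and is not stated in the paper.

```python
import math

top_code_weekly = 2884.0   # weekly top-code in 2014-2016, from the text
hours_per_week = 40.0
adjustment = 1.4           # top-code adjustment factor, from the text

hourly = top_code_weekly / hours_per_week   # hourly wage at the top-code
adjusted = hourly * adjustment              # after applying the 1.4 factor
log_real = math.log(92.5)                   # natural log of the 2010-dollar value

# Implied ratio of 2010 dollars to nominal dollars (derived, not from the paper).
implied_deflator = 92.5 / adjusted

print(round(hourly, 1), round(adjusted, 1), round(log_real, 2))
```

The logarithm here is the natural log, which is the scale used for the log wage densities.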
26
Deflating wages with monthly CPI while combining several years of data helps mitigate the issue of heaping.
27
There are only 5–6 women in this category, which highlights the need to use different base groups for men and women.
28
In nominal terms, the mode of the distributions is around $10.00/h in 1988–1990 and around $19.00/h in 2014–2016. In 1988–1990, there is a second local peak around $12.00/h, while, in 2014–2016, the second lower local peak is around $10.00/h.
29
In Firpo et al. (2007), we used a mixed approach for the base group normalizing the coefficients of the occupation and industry dummies. That approach, although superficially attractive, has the important disadvantage of limiting the explanatory power of the variables whose coefficients are constrained. As a result, in this earlier version of the paper, very little of the changes in inequality were attributable to occupations and industries.
30
As argued in FFL, the different relative strength of between and within effects at different quantiles explains the inverse U-shaped effect of unions. This is in sharp contrast with the effect of unions estimated using conditional quantile regressions, which captures only within-group effects and declines monotonically over the wage distribution (Chamberlain 1994).
31
The logit specification also includes a full set of interactions between experience and education, union status and education, union status and experience, education and occupations, and experience and industries.
32
This stands in sharp contrast with the situation that prevailed in the 1980s when the corresponding curve was positively sloped as wage dispersion increased at all points of the distribution (Juhn et al. 1993).
33
The effect of each set of factors is obtained by summing up the contribution of the relevant covariates. For example, the effect for “education” is the sum of the effect of each of the five education categories shown in Table 1. Showing the effect of each individual dummy separately would be cumbersome and harder to interpret.
34
In practice, we use the popular Jann (2008) “oaxaca” Stata ado file and obtain bootstrapped standard errors over the entire procedure, given that the statistics and the RIF are estimated values. We opted for bootstrapped instead of analytical standard errors for simplicity. Computing analytical standard errors would involve estimating different functionals, which increases the complexity of the estimation step, whereas bootstrapped standard errors, although potentially more computationally demanding, are typically simpler to implement.
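The idea of bootstrapping "over the entire procedure" can be sketched for a single unconditional quantile regression. This is an illustrative Python sketch, not the authors' Stata code: it assumes a Gaussian kernel with a rule-of-thumb bandwidth for the density at the quantile, plain OLS, and simulated data, and it covers only the RIF regression rather than the full decomposition. The key point is that the quantile, the density, and the regression are all re-estimated within each bootstrap draw.

```python
import numpy as np

def rif_quantile(y, tau):
    """RIF of the tau-th quantile: q + (tau - 1{y <= q}) / f(q)."""
    q = np.quantile(y, tau)
    h = 1.06 * np.std(y) * len(y) ** (-0.2)      # rule-of-thumb bandwidth (assumption)
    u = (q - y) / h
    f_q = np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)) / h  # Gaussian KDE at q
    return q + (tau - (y <= q)) / f_q

def rif_ols(y, X, tau):
    """OLS of the RIF on a constant and the covariates in X."""
    rif = rif_quantile(y, tau)
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, rif, rcond=None)
    return coef

def bootstrap_se(y, X, tau, reps=200, seed=0):
    """Bootstrap standard errors, redoing every estimation step in each draw."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((reps, X.shape[1] + 1))
    for b in range(reps):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        draws[b] = rif_ols(y[idx], X[idx], tau)   # quantile, density, and OLS re-estimated
    return draws.std(axis=0, ddof=1)

# Simulated data (hypothetical): log wages with a union dummy.
rng = np.random.default_rng(42)
n = 2000
union = rng.integers(0, 2, size=n).astype(float)
y = 2.5 + 0.2 * union + rng.normal(0.0, 0.5, size=n)
X = union.reshape(-1, 1)

coef = rif_ols(y, X, tau=0.5)
se = bootstrap_se(y, X, tau=0.5, reps=100)
```

In the paper the resampling wraps the reweighting and decomposition steps as well; the same loop structure would apply, with the extra steps placed inside the loop body.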
35
Adding more terms in the specification of the reweighting function helps reduce the reweighting error. This has to be balanced with issues of common support, as more terms may lead to more perfect predictions, an undesirable outcome. As we discuss below, the specification we use yields a very small reweighting error.
Figure 1. Density of Log Wages ($2010)—Men CPS. Note: The vertical lines show the minimum and maximum of state and federal minimum wages in each time period.
Figure 2. Density of Log Wages ($2010)—Base Group. Note: The vertical lines show the minimum and maximum of state and federal minimum wages in each time period.
Figure 3. Unconditional Quantile Coefficients—Demographics and Human Capital.
Figure 4. Unconditional Quantile Coefficients—Occupations.
Figure 5. Unconditional Quantile Coefficients—Industries.
Figure 6. Decomposition of Total Change into Composition and Wage Structure Effects.
Figure 7. Decomposition of Composition Effects.
Figure 8. Decomposition of Wage Structure Effects.
Table 1. Unconditional Quantile Regression Coefficients on Log Wages.

| Explanatory Variables | 1988/90, Q10 | 1988/90, Q50 | 1988/90, Q90 | 2014/16, Q10 | 2014/16, Q50 | 2014/16, Q90 |
| --- | --- | --- | --- | --- | --- | --- |
| Union covered | 0.146 | 0.343 | −0.025 | 0.058 | 0.240 | −0.008 |
| | (0.003) | (0.005) | (0.004) | (0.003) | (0.006) | (0.007) |
| Non-white | −0.063 | −0.137 | −0.072 | −0.053 | −0.106 | −0.041 |
| | (0.006) | (0.005) | (0.005) | (0.004) | (0.004) | (0.006) |
| Non-Married | −0.111 | −0.109 | −0.031 | −0.046 | −0.107 | −0.064 |
| | (0.004) | (0.003) | (0.004) | (0.003) | (0.004) | (0.005) |
| Education (High School omitted) | | | | | | |
| Primary | −0.301 | −0.312 | −0.109 | −0.212 | −0.415 | −0.110 |
| | (0.011) | (0.006) | (0.005) | (0.01) | (0.009) | (0.006) |
| Some HS | −0.305 | −0.112 | 0.005 | −0.275 | −0.215 | 0.002 |
| | (0.007) | (0.005) | (0.003) | (0.008) | (0.007) | (0.004) |
| Some College | 0.055 | 0.135 | 0.112 | 0.036 | 0.098 | 0.023 |
| | (0.005) | (0.004) | (0.005) | (0.004) | (0.005) | (0.004) |
| College | 0.143 | 0.343 | 0.410 | 0.125 | 0.409 | 0.493 |
| | (0.005) | (0.005) | (0.008) | (0.004) | (0.006) | (0.009) |
| Post-grad | 0.094 | 0.418 | 0.772 | 0.099 | 0.502 | 0.962 |
| | (0.006) | (0.006) | (0.013) | (0.004) | (0.008) | (0.017) |
| Potential Experience (20 ≤ Experience < 25 omitted) | | | | | | |
| Experience < 5 | −0.486 | −0.448 | −0.312 | −0.335 | −0.425 | −0.301 |
| | (0.009) | (0.006) | (0.008) | (0.007) | (0.007) | (0.011) |
| 5 ≤ Experience < 10 | −0.056 | −0.270 | −0.278 | −0.067 | −0.285 | −0.306 |
| | (0.006) | (0.006) | (0.008) | (0.005) | (0.007) | (0.011) |
| 10 ≤ Experience < 15 | −0.005 | −0.122 | −0.172 | −0.022 | −0.157 | −0.182 |
| | (0.005) | (0.006) | (0.008) | (0.004) | (0.006) | (0.011) |
| 15 ≤ Experience < 20 | 0.002 | −0.051 | −0.091 | −0.009 | −0.051 | −0.034 |
| | (0.005) | (0.005) | (0.008) | (0.004) | (0.006) | (0.012) |
| 25 ≤ Experience < 30 | 0.010 | 0.033 | 0.060 | −0.001 | 0.020 | 0.036 |
| | (0.006) | (0.006) | (0.01) | (0.004) | (0.006) | (0.012) |
| 30 ≤ Experience < 35 | 0.017 | 0.048 | 0.071 | 0.008 | 0.037 | 0.042 |
| | (0.006) | (0.006) | (0.011) | (0.004) | (0.007) | (0.012) |
| 35 ≤ Experience < 40 | 0.022 | 0.028 | 0.061 | 0.013 | 0.054 | 0.062 |
| | (0.007) | (0.008) | (0.012) | (0.004) | (0.007) | (0.013) |
| Experience ≥ 40 | 0.068 | 0.020 | −0.010 | 0.030 | 0.058 | −0.013 |
| | (0.008) | (0.008) | (0.009) | (0.005) | (0.007) | (0.012) |
| R-squared | 0.253 | 0.359 | 0.206 | 0.182 | 0.353 | 0.202 |
| No. of observations | 268,494 | | | 236,296 | | |
Note: Linear limited dependent variable model. Bootstrapped standard errors (500 repetitions) are in parentheses. Statistical significance levels: *** p ≤ 0.01, ** p ≤ 0.05, * p ≤ 0.1. Also included in the regression are a public sector dummy, 16 occupation dummies, and 14 industry dummies. The base group is made up of individuals who are non-unionized (not covered), not in the public sector, white, married, have a high school degree, and work as construction workers in the construction industry.
Table 2. RIF Regression of Inequality Measures.

| Explanatory Variables | Variance of Log Wages, 1988/90 | Variance of Log Wages, 2014/16 | Gini, 1988/90 | Gini, 2014/16 |
| --- | --- | --- | --- | --- |
| Estimated Values | 0.341 | 0.418 | 0.330 | 0.396 |
| Constant | 0.203 | 0.205 | 0.261 | 0.290 |
| | (0.004) | (0.006) | (0.002) | (0.002) |
| Union covered | −0.075 | −0.040 | −0.067 | −0.039 |
| | (0.002) | (0.004) | (0.001) | (0.001) |
| Non-white | −0.002 | 0.005 | 0.006 | 0.005 |
| | (0.003) | (0.004) | (0.001) | (0.001) |
| Non-Married | 0.039 | 0.001 | 0.022 | 0.008 |
| | (0.002) | (0.004) | (0.001) | (0.001) |
| Education (High School omitted) | | | | |
| Primary | 0.074 | 0.073 | 0.051 | 0.057 |
| | (0.004) | (0.006) | (0.002) | (0.002) |
| Some HS | 0.104 | 0.129 | 0.048 | 0.063 |
| | (0.003) | (0.005) | (0.001) | (0.001) |
| Some College | 0.028 | −0.001 | 0.006 | −0.006 |
| | (0.003) | (0.003) | (0.002) | (0.003) |
| College | 0.121 | 0.166 | 0.053 | 0.061 |
| | (0.005) | (0.005) | (0.002) | (0.001) |
| Post-grad | 0.301 | 0.401 | 0.157 | 0.177 |
| | (0.007) | (0.01) | (0.003) | (0.002) |
| Potential Experience (20 ≤ Experience < 25 omitted) | | | | |
| Experience < 5 | 0.047 | 0.027 | 0.031 | 0.021 |
| | (0.004) | (0.007) | (0.002) | (0.002) |
| 5 ≤ Experience < 10 | −0.098 | −0.093 | −0.036 | −0.030 |
| | (0.005) | (0.007) | (0.002) | (0.002) |
| 10 ≤ Experience < 15 | −0.078 | −0.070 | −0.035 | −0.028 |
| | (0.004) | (0.007) | (0.002) | (0.002) |
| 15 ≤ Experience < 20 | −0.050 | −0.006 | −0.026 | 0.003 |
| | (0.005) | (0.008) | (0.002) | (0.002) |
| 25 ≤ Experience < 30 | 0.023 | 0.024 | 0.012 | 0.014 |
| | (0.006) | (0.008) | (0.002) | (0.002) |
| 30 ≤ Experience < 35 | 0.022 | 0.017 | 0.008 | 0.007 |
| | (0.006) | (0.008) | (0.002) | (0.002) |
| 35 ≤ Experience < 40 | 0.015 | 0.022 | 0.008 | 0.008 |
| | (0.007) | (0.008) | (0.003) | (0.002) |
| Experience ≥ 40 | −0.031 | −0.012 | −0.015 | −0.005 |
| | (0.005) | (0.008) | (0.003) | (0.002) |
| Occupations (Construction & Repair Occ. omitted) | | | | |
| Upper Management | 0.235 | 0.415 | 0.132 | 0.203 |
| | (0.007) | (0.011) | (0.003) | (0.002) |
| Lower Management | 0.090 | 0.200 | 0.027 | 0.080 |
| | (0.008) | (0.009) | (0.003) | (0.002) |
| Engineers & Computer Occ. | 0.107 | 0.202 | 0.013 | 0.054 |
| | (0.006) | (0.009) | (0.003) | (0.002) |
| Other Scientists | 0.081 | 0.134 | 0.025 | 0.068 |
| | (0.011) | (0.027) | (0.005) | (0.006) |
| Social Support Occ. | −0.001 | 0.065 | −0.012 | 0.012 |
| | (0.007) | (0.009) | (0.003) | (0.003) |
| Lawyers & Doctors | 0.524 | 0.637 | 0.337 | 0.363 |
| | (0.027) | (0.032) | (0.010) | (0.008) |
| Health Treatment Occ. | −0.020 | 0.115 | −0.035 | 0.011 |
| | (0.0101) | (0.012) | (0.005) | (0.005) |
| Clerical Occ. | 0.013 | 0.069 | 0.017 | 0.044 |
| | (0.004) | (0.005) | (0.002) | (0.002) |
| Sales Occ. | 0.088 | 0.177 | 0.043 | 0.084 |
| | (0.005) | (0.008) | (0.002) | (0.002) |
| Insur. & Real Estate Sales | 0.208 | 0.197 | 0.152 | 0.105 |
| | (0.031) | (0.038) | (0.011) | (0.010) |
| Financial Sales | 0.525 | 0.409 | 0.429 | 0.219 |
| | (0.06) | (0.076) | (0.018) | (0.014) |
| Service Occ. | 0.188 | 0.208 | 0.101 | 0.107 |
| | (0.004) | (0.005) | (0.002) | (0.002) |
| Primary Occ. | 0.226 | 0.222 | 0.114 | 0.127 |
| | (0.008) | (0.015) | (0.004) | (0.004) |
| Production Occ. | 0.004 | 0.020 | 0.011 | 0.028 |
| | (0.003) | (0.005) | (0.001) | (0.002) |
| Transportation Occ. | 0.119 | 0.145 | 0.079 | 0.094 |
| | (0.004) | (0.006) | (0.002) | (0.002) |
| Truckers | 0.015 | 0.042 | 0.030 | 0.040 |
| | (0.004) | (0.006) | (0.002) | (0.002) |
| Industries (Construction omitted) | | | | |
| Agriculture, Mining | 0.079 | 0.013 | 0.036 | −0.001 |
| | (0.008) | (0.012) | (0.003) | (0.003) |
| Hi-Tech Manufac | 0.018 | 0.014 | −0.001 | 0.002 |
| | (0.005) | (0.009) | (0.002) | (0.002) |
| Low-Tech Manufac | −0.037 | −0.053 | −0.011 | −0.019 |
| | (0.004) | (0.007) | (0.002) | (0.002) |
| Wholesale Trade | −0.012 | −0.027 | 0.001 | −0.006 |
| | (0.006) | (0.012) | (0.002) | (0.003) |
| Retail Trade | 0.060 | 0.016* | 0.038 | 0.023 |
| | (0.005) | (0.007) | (0.002) | (0.002) |
| Transportation & Utilities | 0.013 | −0.029 | −0.005 | −0.019 |
| | (0.005) | (0.007) | (0.002) | (0.002) |
| Information except Hi-Tech | −0.001 | 0.055 | −0.010 | 0.041 |
| | (0.008) | (0.019) | (0.003) | (0.005) |
| Financial Activities | 0.065 | 0.064 | 0.052 | 0.053 |
| | (0.009) | (0.013) | (0.004) | (0.003) |
| Hi-Tech Services | 0.048 | 0.071 | 0.018 | 0.035 |
| | (0.008) | (0.01) | (0.004) | (0.003) |
| Business Services | 0.018 | −0.042 | 0.019 | −0.014 |
| | (0.005) | (0.008) | (0.002) | (0.002) |
| Education & Health Services | −0.008 | −0.064 | −0.001 | −0.018 |
| | (0.006) | (0.008) | (0.003) | (0.002) |
| Personal Services | 0.136 | 0.054 | 0.051 | 0.023 |
| | (0.006) | (0.006) | (0.002) | (0.002) |
| Public Admin | −0.038 | −0.071 | −0.036 | −0.029 |
| | (0.007) | (0.011) | (0.003) | (0.003) |
| Public Sector | −0.058 | −0.055 | −0.030 | −0.048 |
| | (0.005) | (0.007) | (0.002) | (0.002) |
| R-squared | 0.115 | 0.087 | 0.048 | 0.025 |
| No. of observations | 268,492 | 236,287 | 268,492 | 236,287 |
Note: Bootstrapped standard errors (500 repetitions) are in parentheses. Statistical significance levels: *** p ≤ 0.01, ** p ≤ 0.05, * p ≤ 0.1. The base group is made up of individuals who are non-unionized (not covered), not in the public sector, white, married, have a high school degree, and work as construction workers in the construction industry. Trimmed sample drops 15 observations with hourly wages > $1,636 ($2010).
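For readers who want to see what the dependent variable of the variance regressions looks like, the following sketch uses the standard result that the recentered influence function of the variance is the squared deviation from the mean, so that its sample average reproduces the variance itself. The simulated data and the function name are assumptions made for illustration and are unrelated to the CPS sample used in the table.

```python
import numpy as np

def rif_variance(y):
    """RIF of the variance: (y - mean(y))**2, whose average is the variance."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) ** 2

# Simulated log wages (hypothetical) with a union dummy that compresses wages.
rng = np.random.default_rng(0)
n = 5000
union = rng.integers(0, 2, size=n).astype(float)
y = 2.8 + 0.1 * union + rng.normal(0.0, 1.0, size=n) * np.where(union == 1, 0.4, 0.6)

rif = rif_variance(y)

# OLS of the RIF on a constant and the union dummy.
Z = np.column_stack([np.ones(n), union])
coef, *_ = np.linalg.lstsq(Z, rif, rcond=None)
```

Because the regression includes a constant, the fitted coefficients evaluated at the covariate means return the sample variance, which is the property the decomposition relies on.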
Table 3. Decomposition Results without Reweighting.

| Inequality Measures | 90–10 | 50–10 | 90–50 | Variance (× 100) | Gini (× 100) |
| --- | --- | --- | --- | --- | --- |
| Total Change | 0.125 | −0.075 | 0.201 | 7.775 | 6.599 |
| Composition | 0.089 | 0.037 | 0.052 | 4.163 | 1.966 |
| Wage Structure | 0.037 | −0.112 | 0.149 | 3.612 | 4.633 |
| Composition Effects: | | | | | |
| Union | 0.016 | −0.019 | 0.035 | 0.713 | 0.639 |
| Other | 0.019 | 0.008 | 0.011 | 0.984 | 0.473 |
| Education | 0.009 | 0.013 | −0.005 | 0.665 | 0.207 |
| Occupation | 0.019 | 0.022 | −0.002 | 0.672 | 0.112 |
| Industry | 0.026 | 0.013 | 0.013 | 1.128 | 0.536 |
| Wage Structure Effects: | | | | | |
| Union | 0.014 | −0.002 | 0.015 | 0.442 | 0.360 |
| Other | −0.048 | −0.034 | −0.014 | −0.983 | −0.161 |
| Education | 0.015 | 0.008 | 0.007 | 1.444 | 0.188 |
| Occupation | 0.057 | −0.066 | 0.123 | 5.664 | 2.423 |
| Industry | −0.079 | −0.048 | −0.031 | −3.212 | −1.044 |
| Constant | 0.079 | 0.030 | 0.049 | 0.257 | 0.287 |
| Total Effects: | | | | | |
| Union | 0.030 | −0.021 | 0.051 | 1.156 | 0.998 |
| Other | −0.029 | −0.026 | −0.003 | 0.001 | 0.312 |
| Education | 0.024 | 0.022 | 0.002 | 2.110 | 0.395 |
| Occupation | 0.076 | −0.045 | 0.121 | 6.336 | 2.534 |
| Industry | −0.054 | −0.036 | −0.018 | −2.084 | −0.508 |
Note: Other includes non-white, non-married, and five categories of experience. Statistical significance levels: *** p ≤ 0.01, ** p ≤ 0.05, * p ≤ 0.1. Bootstrapped standard errors over the entire procedure (500 replications) were used to compute the p-value. Trimmed sample for the variance and Gini drops 15 observations with hourly wages > $1,636 ($2010).
Table 4. Decomposition Results with Reweighting.

| Inequality Measures | 90–10 | 50–10 | 90–50 | Variance (× 100) | Gini (× 100) |
| --- | --- | --- | --- | --- | --- |
| Total Change | 0.125 | −0.075 | 0.201 | 7.775 | 6.599 |
| Composition | 0.090 | 0.038 | 0.052 | 4.193 | 1.966 |
| Wage Structure | 0.030 | −0.105 | 0.135 | 3.149 | 4.402 |
| Composition Effects: | | | | | |
| Union | 0.016 | −0.019 | 0.035 | 0.712 | 0.638 |
| Other | 0.019 | 0.009 | 0.011 | 1.007 | 0.481 |
| Education | 0.007 | 0.013 | −0.005 | 0.600 | 0.173 |
| Occupation | 0.020 | 0.022 | −0.002 | 0.719 | 0.129 |
| Industry | 0.026 | 0.013 | 0.014 | 1.155 | 0.546 |
| Specification Error | 0.002 | −0.010 | 0.012 | −0.308 | 0.175 |
| Wage Structure Effects: | | | | | |
| Union | 0.012 | −0.005 | 0.017 | 0.338 | 0.220 |
| Other | −0.049 | −0.026 | −0.023 | −0.871 | −0.068 |
| Education | 0.054 | 0.010 | 0.045 | 2.303 | 1.183 |
| Occupation | 0.018 | −0.075 | 0.093 | 2.872 | 1.416 |
| Industry | −0.094 | −0.030 | −0.064 | −3.852 | −1.306 |
| Constant | 0.089 | 0.022 | 0.067 | 2.359 | 2.957 |
| Reweighting Error | 0.003 | 0.002 | 0.001 | 0.125 | 0.057 |
| Total Effects: | | | | | |
| Union | 0.029 | −0.024 | 0.052 | 1.050 | 0.857 |
| Other | −0.029 | −0.018 | −0.012 | 0.135 | 0.413 |
| Education | 0.062 | 0.022 | 0.039 | 2.903 | 1.356 |
| Occupation | 0.038 | −0.053 | 0.091 | 3.591 | 1.545 |
| Industry | −0.068 | −0.017 | −0.051 | −2.697 | −0.760 |
Note: Other includes non-white, non-married, and five categories of experience. Statistical significance levels: *** p ≤ 0.01, ** p ≤ 0.05, * p ≤ 0.1. Bootstrapped standard errors over the entire procedure (500 replications) were used to compute the p-value. Trimmed sample for the variance and Gini drops 15 observations with hourly wages > $1,636 ($2010).
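As a reading aid for Table 4, the sketch below checks, for the three percentile-gap columns only, that the total change is approximately the sum of the composition effect, the wage structure effect, the specification error, and the reweighting error. The additive layout is our reading of the reweighted decomposition and is treated here as an assumption; the numbers are copied from the table, and the tolerance allows for rounding to three decimals.

```python
# Values copied from the 90-10, 50-10, and 90-50 columns of Table 4.
table4 = {
    "90-10": {"total": 0.125, "composition": 0.090, "wage_structure": 0.030,
              "specification_error": 0.002, "reweighting_error": 0.003},
    "50-10": {"total": -0.075, "composition": 0.038, "wage_structure": -0.105,
              "specification_error": -0.010, "reweighting_error": 0.002},
    "90-50": {"total": 0.201, "composition": 0.052, "wage_structure": 0.135,
              "specification_error": 0.012, "reweighting_error": 0.001},
}

def adding_up_gap(column):
    """Total change minus the sum of the four components for one column."""
    parts = ("composition", "wage_structure",
             "specification_error", "reweighting_error")
    return column["total"] - sum(column[p] for p in parts)

gaps = {name: adding_up_gap(col) for name, col in table4.items()}

# Each gap should be within rounding error of zero.
all_close = all(abs(g) < 0.002 for g in gaps.values())
```

The two error terms are small relative to the composition and wage structure effects in these columns, which is what the text means by a very small reweighting error.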

Firpo, S.P.; Fortin, N.M.; Lemieux, T. Decomposing Wage Distributions Using Recentered Influence Function Regressions. Econometrics 2018, 6, 28. https://doi.org/10.3390/econometrics6020028
