Counterfactual Distributions in Bivariate Models—A Conditional Quantile Approach

This paper proposes a methodology to incorporate bivariate models in numerical computations of counterfactual distributions. The proposal is to extend the works of Machado and Mata (2005) and Melly (2005) using the grid method to generate pairs of random variables. This contribution allows incorporating the effect of intra-household decision making in counterfactual decompositions of changes in income distribution. An application using data from five latin american countries shows that this approach substantially improves the goodness of fit to the empirical distribution. However, the exercise of decomposition is less conclusive about the performance of the method, which essentially depends on the sample size and the accuracy of the regression model.


Introduction
Most empirical studies analyze the effects of income distribution determinants through decomposition methodologies based on Oaxaca-Blinder (1973) [2,3]. Those methodologies usually focus on the wage distribution of a single individual assuming that all employment decisions are made in an isolated or independent way with respect to other household members. Notwithstanding, the literature on intra-household labor supply shows several models of the interdependence in the employment decisions within the household. Assortative mating literature provides vast evidence of interrelations of individual variables among members, such as their education levels, labor income, the choice of hours of work, etc. Ignoring this feature when estimating household labor earnings on decomposition exercises provides a scenario that may be biased or unrealistic.
The main component of personal earnings is labor income. Therefore, it is important to know its intra-household determinants to understand the behavior of the household incomes and their consequences on inequality. In the most traditional model, there is a sole individual responsible for making labor decisions independently of other household members. However, in the case of complete households (with head and spouse) it is usual that this decision is made by the couple. There are several models in the literature where a couple faces the problem of deciding together their labor supply according to their interests within the home (e.g., Chiappori and Pierre-Andre, 1992 [4]; Blundell et al., 2005 [5]; van Klaveren et al., 2008 [6], among others). The main mechanisms behind this decision are the reservation wages of each member and the bargaining power that determines the share rule of the household income.
Given the complexity involved in analyzing the joint employment decisions of all household members, the usual alternative is to focus only on the decisions made by the household head and spouse. The implicit assumption is that the rest of the household members will not change their behavior, or at least their impact on family income is small. This assumption may be too simple, but it is a starting point used in the literature to understand the complex mechanisms interacting in the labor decisions made within the household. In particular, both the reservation wages and bargaining power depend on observable and unobservable characteristics of household members such as age, education status, persuasion, etc. Modeling both earnings equations to analyze household income distribution while taking into account their interactions requires a methodology that generates counterfactual distributions of hypothetical changes on their determinants. Some examples of models including employment decisions within the household are Browning et al. (1994) [7], Gasparini and Marchionni (2007) [8], Galiani and Weischenbaum (2012) [9], among others. Usually these studies make several parametric and/or distributional assumptions, such as normality in the unobservable income determinants. This approach could be too strict or it may not be quite representative of the actual income distribution. Another usual methodological aspect is that those papers use models focused on conditional means, relying on parametric assumptions other aspects of the distribution. Despite the progress of quantile regression literature allows exploring issues beyond the average effects, the bulk of the decompositions literature is based on counterfactual distributions of earnings equations for a single individual (i.e., Machado and Mata (2005) [10], Melly (2005) [11] and Firpo et al. (2009) [12]). This paper attempts to expand this literature by proposing a methodology to generate counterfactual scenarios on bivariate distributions using conditional quantiles.
The main contribution of this paper is to show that the problem of generating counterfactual income distributions for both household members is just an exercise of numerical integration involving a joint mechanism to generate a pair of random variables through their marginal distributions. Once this mechanism is established, it is possible to use the ranking association of both household members in order to get a set of replicates or realizations of the joint distribution. Nevertheless, the fact that incomes are related with observable characteristics makes necessary to introduce some structure to the conditional income distribution. Conditional quantiles are useful to model this matter for two reasons: first, they are the counterpart of the cumulative conditional distribution, and second they are easily estimable by standard methods. Quantile regressions allows an indirect way to capture the unobservable heterogeneous effects on each marginal distribution. Finally, the last step of the proposed method is to incorporate the relationship between the conditional incomes of both household members using a probabilistic association of conditional rankings.
The paper is organized as follows. In Section 2, a methodology to simulate bivariate random variable realizations based on marginal distributions is presented. Section 3 extends this idea to conditional joint distributions and its applications to counterfactual distributions. Section 4 shows an empirical application with household survey data for different countries in the Southern Cone of Latin America. Finally, Section 5 discusses the results and scope of the methodology.

Generating Random Variables
Generating random variables in the univariate case is relatively simple, and there are several methods available. The most widely used is the inverse cumulative function method: let U be a random variable with a uniform distribution U (0, 1), then the transformation F −1 Y (U ) generates a random variable with distribution F Y (y). Thus, this procedure simply consists on taking a realization of a uniform random variable u and then computing the u−th quantile Q Y (u) ≡ F −1 Y (u). In the case of integer variables, the logic is quite similar to the continuous case (Devroye, 1986) [13].
The bivariate setup is more complex because the statistical relationship between two variables must be considered. A closely related problem can be found in the study of copula functions. A copula is a function that links the joint distribution to the one-dimensional marginal distributions (Nelsen, 1999) [14]. As in the univariate case, there are several methods to create a bivariate random draw. For example, the conditional distribution method allows to generate a random vector (y 1 , y 2 ) using a vector (U 1 , U 2 ) of independent uniform random variables. Specifically, the method of the conditional distribution requires the following two steps: (1) compute y 2 = F −1 2 (U 2 ), where F 2 (.) is the marginal cumulative distribution of y 2 ; and (2) compute y 1 = F −1 (u 1 |y 2 ), that is, using the inverse of the cumulative distribution of y 1 conditional on y 2 . The key to this process is to know the exact functional form of the conditional distribution, which can be too strict in practice.
Another strategy that allows us to adapt the univariate methods (such as the inverse cumulative function) to the bivariate problem is the grid method. Before explaining this procedure, it is appropriate to give some definitions. The image of this function is Im(T ) = [M 1 + 1, M 2 1 + M 2 ]. The most interesting property of the encoder function is that it has a single value m for each ordered pair (m 1 , m 2 ). Therefore, each coordinate is identified by the following decoding function: Property 1. (decoding functions). Let m ∈ Im(T ) be a encoder function. Then, the coordinates m 1 and m 2 can be obtained from the decoding functions: The last element that is needed is to define the set of grids of an enclosure A ⊂ R 2 + .
Finally, consider two random variables (y 1 , y 2 ) ∈ A with a joint density function f (y 1 , y 2 ). To generate a realization of a vector (ỹ 1 ,ỹ 2 ) from the population distribution f (y 1 , y 2 ) we can use the grid method by following the next steps: Calculate the probability mass of each grid p m = P r[(y 1 , y 2 ) ∈ C m ] for every m. 3. Generate a realization of an integer univariate random variablem with probability distribution p m , calculated in the previous step. 4. Decodem to obtain the vector (m 1 ,m 2 ). 5. Compute the realization of (ỹ 1 ,ỹ 2 ) assigning values within the grid Cm.  The dotted lines delimit the grids subdividing the enclosure (i.e., the support of both random variables). Clearly, in the first case the probability mass (measured by the proportion of points falling into each grid) is concentrated in the diagonal given by the bisectrix, while in the second case there is no clear pattern for the joint probability. Logically, the greater the number of grids, the better the approach of the method (Hörmann et al., 2004 [15]). Therefore, the method incorporates the statistical relationship between y 1 and y 2 through the probability of each grid.
Lastly, note that the grids can be determined by their marginal quantile by defining the values . The validity of this equivalence is that the cumulative distribution is an increasing monotonic transformation of the random variable support. In other words, the F j (.) value represents the ranking position resulting from sorting the y j s increasingly. This establishes a one to one relationship between any value of y j and its ranking. Then, the grid C m can be written as:  The grid definition on the marginal probability plane is equivalent to define the grid in terms of the levels of both variables. Then, looking at the grid plane makes it possible to adapt this method to the context of conditional quantiles. This is a key idea because it is precisely the estimation target of the quantile regression technique. Briefly, building the link between the probability grids and the conditional quantiles allows us to associate the marginal rankings with the univariate method of random sampling for the purpose of generating counterfactual distributions. Furthermore, this strategy requires less information than the method of the conditional distribution given that it only requires to know the probability of each grid rather than an entire functional form for the distribution of y 1 conditional on y 2 .

Population
Consider now the distribution of (y 1 , y 2 ) depending on a group of covariates (x 1 , x 2 ). In particular, consider the following linear model for the pair of random variables (y 1 , y 2 ): where x 1 and x 2 are observable covariates vectors and the errors have a joint density f (ε 1 , ε 2 |x), with x ≡ (x 1 , x 2 ). Using the Skorohod's representation, the same model can be formulated under the conditional quantile form: where (θ 1 , θ 2 ) are two random variables whose domain is given by A ∈ [0, 1] × [0, 1]. Given the joint density f (ε 1 , ε 2 |x), the density of this transformation becames: Note that this function is the second derivative of the y 1 and y 2 copula conditional on x. The estimation of this object is not easy if we do not previously postulate some parametric assumptions (e.g., bivariate gaussian). While there are several available parametric forms for copulas, such as the Fréchet and Mardia families, our goal is to keep the nonparametric aspect that characterizes the quantile regression approach. However, it is unclear how the density estimation is useful for generating a sequence of random numbers to build counterfactual scenarios. 1 In this context, generating random values for a vector (y 1 , y 2 ) conditional on x appears as a simple extension of the grids method explained in the previous section.  ( m 1 , m 2 ). 5. Get realizations of the pair ( θ 1 , θ 2 ) assigning values within the grid C m . 6. Generate ( y 1 , y 2 ) using the pair ( θ 1 , θ 2 ) and the univariate method of inverse cumulative function y 1 = Q θ 1 (y 1 |x 1 ) and y 2 = Q θ 2 (y 2 |x 2 ).
So far, all the elements used in each step of the process come from the population and so they are unobservable for the econometrician. Thus, an estimation strategy is required. The next section discusses about this topic when we have a random sample available instead of population data.

Sample Estimation
To generate a replicate (y 1 , y 2 ) using a random sample we can apply the same procedure explained in the preceding paragraphs but replacing each element with its sample analogue. Specifically, we can estimate conditional quantiles Q θ j (y j |x j ) = x j β j (θ j ) for j = 1, 2 using a certain grid of values-e.g., θ = 0.05, 0.10, ..., 0.90, 0.95. The classic reference to get a consistent estimator of β 1 (θ 1 ) and β 2 (θ 2 ) is Koenker and Basset (1978) [16].
The grid method steps when we are working with sample estimators are: 1. Build C m using x 2i β 1 (a m ) and x 2i β 2 (b m ) as delimiters.
All the estimates used on each of the previous steps have good asymptotic properties (consistency) under the usual exogeneity assumption (Koenker, 2005) [17]. Moreover, if the number of cells M is large enough the grid method fits better. However, the number of different quantile regressions that can be estimated with a finite sample size is limited. Portnoy (1991) [18] shows that this number is O(n · log(n)). Nevertheless, this rate corresponds to the univariate case and to the best of our knowledge there is no study for the bivariate analysis. On the other hand, taking too many quantiles affects the consistency in the second step of the procedure because the probabilities of each grid are estimated with few observations. Therefore, there is a trade-off between the number of grids and the precision of the method. By the continuous mapping theorem, the method is expected to work well with relatively large sample sizes, provided that it allows to subdivide the enclosure into a greater number of grids.

Counterfactual Distributions
The proposed methodology can be used to generate counterfactual distributions due to a change on its determinants, as in Oaxaca-Blinder decompositions. Particularly, this proposal is in line with the literature initiated by Machado and Mata (2005) [10] and Melly (2005) [11]. Our contribution to this literature is to extend their method when there are two variables of interest (y 1 , y 2 ) or some function of them. For example, the distribution of household per capita income is the variable of interest in the vast majority of the studies about inequality and/or poverty. If y 1 and y 2 respectively represent the head and spouse individual incomes, then the household income is the sum of them, plus the income of the other family members. After calculating the total level of income received by each household, this number is divided by the number of members in the household to obtain the household per capita income.
Assuming that incomes from other family members and those coming from non-labor sources are constant, the distribution of household per capita income depends only on the determinants of the couples income. 2 Formally, let y t be the vector of household per capita income for all households observed in year t, then the income distribution can be represented as: That is, y t is a function of the parameters of each income equation at year t as well as of the probabilistic relationship between the two conditional rankings represented by r t (θ 1 , θ 2 ).
Let I(·) be any distributive indicator based on the vector of household incomes (e.g., Gini, Theil index, among others), then I(y t )−I(y s ) is the distributional change between the years t and s. A decomposition of this difference is an exercise of comparative statics where some income distribution determinants are changed and the others remain constant. The key is to build a set of counterfactual scenarios where some determinants are changed. The mechanism to do it is to generate replicates of the income distribution using the method explained in Section 2. For example, let's consider three counterfactual scenarios: In the first equation, only the parameters of the household head have changed and this represent the first scenario. In the second, only those of the spouse have been modified, while in the third both parameter sets have changed. Then, if I(y t ) − I(y s ) is the observed change in the distributive indicator, the effect of each scenario is: y 12 t -y s = D(β 1t (θ 1 ), β 2t (θ 2 ), r s (θ 1 , θ 2 )) − D(β 1s (θ 1 ), β 2s (θ 2 ), r s (θ 1 , θ 2 )) (5) 2 To incorporate non-labor income on a microsimulation exercise is not a simple task and depends mainly on the social policies applied in each country under analysis. See Badaracco (2014) [19] as an example for the countries in the Southern Cone of Latin America.
To ease notation, we have omitted the observable characteristics of the household head (x 1 ) and spouse (x 2 ). This obey to the fact that these determinants remain constant in our simulation exercise. Notwithstanding, our methodology admits counterfactual scenarios including isolated changes on those characteristics. In the terminology of Firpo et al. (2011) [20], the result of this exercise is called "characteristic effect"; while the scenarios proposed in Equations (3)-(5) are "parameter effects". The aim of this paper is to show the simulation methodology and the performance of implementing a simple exercise with different sample sizes. Therefore, to keep this analysis simple, only the counterfactual scenarios involving parameter changes were considered, separating the effect of both household members to explore the potential of the method.

Empirical Illustration
In this section we use real data as an application of the proposed methodology to generate counterfactual distributions of per capita household income. The model is defined by Equations (1) and (2) where y 1 and y 2 represent the labor earnings (in logs) of the household head and spouse, respectively. The vectors x 1 and x 2 are observed characteristics (age, education, gender, number of children) while ε 1 and ε 2 are terms representing the unobserved determinants of earnings. We focus our analysis on five countries in Latin America, particularly those belonging to the Southern Cone: Argentina, Brazil, Chile, Paraguay and Uruguay. The data come from household surveys collected by the statistical institutes of each country. 3 We use three alternative methods to estimate the earning models. The first method is to use a seemingly unrelated regression model (SUR), in which the parameters in Equations (1) and (2) are estimated by OLS but allowing correlation between the error terms in both equations. The second case is the estimation of a quantile regression model (IQR), in which the assumption is that the error terms are independent. Finally, the outcomes from these methods are compared with those obtained by applying the methodology of estimating through quantile regressions but relating the model equations using the grids method (DQR).
The first exercise is to analyze the performance of the proposed methodology (DQR) relative to the other two strategies (SUR and IDR). We use an ad-hoc rule to choose the number of quantiles on each earning equation. This rule ensures that the number of observations on each grid will be around 40 in the case in which both equations are independent. 4 The reason behind choosing this rule is to try to get reliable estimates of each grid without losing the asymptotic properties.
Using these three methods, the model's coefficients are estimated in order to generate the joint distribution of labor earnings of the heads and spouses in a particular year. These earnings are used to build a new household per capita income and compute the Gini coefficient. Table 1 shows the results. The first panel of the table shows the Gini coefficient observed in each country, followed by the Gini coefficient of the simulated income from each method. The standard errors of each coefficient are computed using 50 simulation replicates. Errors in the SUR model are generated 50 times from a bivariate normal distribution using all the estimated parameters. For the IQR, uniform random variables are generated independently, so that there is no conditional ranking association. Finally, under the DQR simulation, the values of the labor earnings of the head and spouse are obtained from the estimated probabilities in the grid method, namely considering the relationship between the two equations. The second panel in the table presents the mean square error (MSE) of each estimate with respect to the empirical distribution: Note: Standard errors in parentheses. The observed Gini coefficient corresponds to the initial year (see Table A1). The number of observations corresponds to the sample of households that have both head and spouse in the initial year.
The SUR method has the lowest MSE for Argentina and Paraguay, followed closely by the DQR method. Therefore, in these two cases, using the conditional mean with an assumption of normality in errors fits relatively well to the real data. In the cases of Brazil, Chile and Uruguay the method that achieves the lowest MSE is the DQR, followed by the IQR method. As discussed above, these results suggest that the DQR method requires a certain amount of observations to achieve relatively good performance. However, in the case of Uruguay, which has a smaller sample than Argentina, DQR method has the lowest MSE. Then, this methodology may also depend on how well the model fits to the empirical distribution. However, large sample sizes should improve the approximation of the DQR method.
The next step in this section is to perform the micro-decomposition discussed in Section 3.3 in order to compare the results obtained with the three methods. As an illustration, we estimate the parameters effect in the equations of labor earnings. Table 2    The greatest discrepancies among methodologies belong to the SUR method, while the IQR and DQR do not differ significantly from each other. The differences between the DQR and IQR are between 0 and 0.1 points (in absolute value) in the countries where the DQR achieves the lowest MSE. This result suggests a significant difference in terms of effects (between 0% and 30% in some cases). However, the economic significance of these differences is small (one tenth point of the Gini). The case of Paraguay shows a potential weakness in the DQR method when there are too few observations available. Since DQR has the lowest MSE in this country, the differences with the other methods suggest that with small samples the DQR method could present a potential bias in the estimated effects.

Conclusions
This paper proposes a method to incorporate the intra-household relationship between the labor incomes of the head and the spouse in decomposition studies. The paper closely follows the articles of Machado and Mata (2005) [10] and Melly (2005) [11]. We try to extend these papers by incorporating the correlation of intra-household income modeled by a simultaneous equation system. The key idea in our proposal is to associate conditional quantiles by adapting an standard method for generating random variables: the grid method.
The complexity associated with the joint employment decisions in a household leads us to focus our analysis on the behavior of the head and spouse, independently of the decisions of the rest of the family members. Furthermore, our model only analyzes the determination of labor earnings, assuming all other sources of income remain unchanged. Incorporating these other sources is an exercise that does not allow certain generalizations because non-labor income depends mainly on the social policies applied in each country (Badaracco, 2014 [19]).
An empirical application performing a simple decomposition exercise was implemented by using data from household surveys for the Southern Cone countries in Latin America. The counterfactual scenarios considered consisted on a change in the parameters in the labor earnings equations in two different moments in time. The results show that, in general, incorporating the interaction of household incomes substantially improves the goodness of fit to the empirical income distribution. Also, using quantile regression can dramatically change the results of the simulation exercise. However, although the introduction of correlation in incomes yields different results, the economic significance seems to be minor. The comparative exercise among different surveys shows that the performance of the method clearly depends on the sample size by limiting the number of grids. Moreover, given the sample size, the goodness of fit of the semiparametric method seems to be another key point.
The paper omits some important issues related to the estimation of earnings equations such as sample selection and endogeneity of covariates (e.g., education). The main reason for doing this is that our target is to propose a methodology for the generation of counterfactual distributions, showing their application using standard regression methods developed in the literature. Solving all these problems requires the use of more specific methodologies that are still under development such as those in Buchinsky (2001) [21] and Chernozhukov and Hansen (2006) [22]. Exploring the performance of the proposed method under these estimation techniques is postponed for future research.