
Causal Random Forests Model Using Instrumental Variable Quantile Regression

Jau-er Chen and Chen-Wei Hsiang
1 Institute for International Strategy, Tokyo International University, 1-13-1 Matobakita Kawagoe, Saitama 350-1197, Japan
2 Center for Research in Econometric Theory and Applications, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
3 Behavioral and Data Science Research Center, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
* Author to whom correspondence should be addressed.
Econometrics 2019, 7(4), 49; https://doi.org/10.3390/econometrics7040049
Submission received: 24 September 2019 / Revised: 8 December 2019 / Accepted: 11 December 2019 / Published: 16 December 2019

Abstract
We propose an econometric procedure based mainly on the generalized random forests method. Not only does the procedure estimate the quantile treatment effect nonparametrically, but it also yields a measure of variable importance in terms of heterogeneity among control variables. We apply the proposed procedure to reinvestigate the distributional effect of 401(k) participation on net financial assets, and the quantile earnings effect of participating in a job training program.

1. Introduction

Causal machine learning, which builds mainly on two approaches, the double machine learning (DML) of Chernozhukov et al. (2018) and the generalized random forests (GRF) of Athey et al. (2019), has been actively studied in economics in recent years. Under the identification strategy of selection on observables, empirical applications of both approaches have been investigated, including the works by Gilchrist and Sands (2016) and Davis and Heller (2017). When it comes to the identification strategy of selection on unobservables, few empirical papers using causal machine learning can be found in the existing literature. Such empirical applications very often lack important observed control variables or involve reverse causality, and thus researchers resort to the instrumental variable approach. Additionally, it remains unclear how the quantile treatment effect is to be estimated under the DML and GRF methods. In this paper, using instrumental variables, we propose an econometric procedure for estimating quantile treatment effects based primarily on the generalized random forests of Athey et al. (2019).
Chernozhukov and Hansen (2005) propose an estimator that addresses endogeneity in quantile regressions via rank similarity, a crucial feature absent in prior approaches. Using rank similarity, this estimator recovers the heterogeneous quantile effects of an endogenous variable over the entire population (rather than for the compliers only). Rank similarity thus identifies population-based quantile treatment effects, cf. Frandsen and Lefgren (2018). This approach does not require the monotonicity assumption used in Abadie et al. (2002) and allows for binary or continuous endogenous and instrumental variables. Chernozhukov and Hansen (2008) build a bridge between the two-stage least squares (2SLS) estimator and their 2005 estimator, and propose an estimator robust to weak instruments. It is noteworthy, however, that these estimators are unable to estimate unconditional quantiles, which are, as discussed in Guilhem et al. (2019), quantities that should be of utmost interest to empirical researchers. In this paper, we use the instrumental variable quantile regression of Chernozhukov and Hansen (2008) as the vehicle for identifying the quantile treatment effect.
Athey and Imbens (2016) is the first paper to develop the regression tree model for estimating heterogeneous treatment effects using the honest splitting algorithm. Wager and Athey (2018) extend the regression tree model to causal forests. Recently, Athey et al. (2019) have developed the generalized random forests model, a unified framework in the sense that it is built on local moment conditions capable of encompassing many models. We therefore bring the first-order condition of the instrumental variable quantile regression into the local moment conditions and modify the GRF algorithm accordingly, so that the quantile treatment effect can be estimated within the framework of causal random forests. Our proposed estimator thus shares with the generalized random forests model the advantage of estimating the conditional quantile treatment effect nonparametrically.
Chen and Tien (2019) investigate the instrumental variable quantile regression in the context of double machine learning. Although related to their paper, our procedure does not consider the same high-dimensional setting. Further, in contrast to the DML for instrumental variable quantile regressions, the proposed econometric procedure yields a measure of variable importance in terms of heterogeneity among control variables, and the pattern of variable importance across quantiles can be revealed as well. We highlight the use of variable importance by reinvestigating two empirical studies: the distributional effect of 401(k) participation on net financial assets, and the quantile effect of participating in a job training program on earnings.
The rest of the paper is organized as follows. The model specification and practical algorithm are introduced in Section 2. Section 3 presents the measure of variable importance. Section 4 presents two empirical applications. Section 5 concludes the paper. Appendix A discusses the use of a doubly robust method along with the causal random forests structure for achieving more efficient estimation; it also discusses the identifying restrictions and regularity conditions for the instrumental variable quantile regression and the generalized random forests, and verifies the conditions for establishing consistency and asymptotic normality of the proposed estimator.

2. The Model and Algorithm

We propose the causal random forests with the instrumental variable quantile regression (GRF-IVQR, hereafter). The estimation procedure of the GRF-IVQR is constructed as follows, based essentially on the method developed in Athey et al. (2019).

2.1. Generalized Random Forests

Classification and regression trees (CART) and their extension, random forests (Breiman 2001), are effective methods for flexibly estimating regression functions in terms of out-of-sample predictive power. Random forests have become particularly popular; a key attraction is that they require relatively little tuning and have superior performance to more complex methods such as deep learning neural networks, cf. Section 3.2 of Athey and Imbens (2019). Recently, random forests have been extended to the estimation of causal effects, resulting in the generalized random forests estimator.
In what follows, we describe how we incorporate the instrumental variable quantile regression into the framework of GRF and modify the resulting estimator accordingly.
Given data $(X_i, O_i) \in \mathcal{X} \times \mathcal{O}$, we estimate the parameter of interest $\theta(x)$ via the following moment conditions:
$$E\big[\psi_{\theta(x),\nu(x)}(O_i) \mid X_i = x\big] = 0 \quad \text{for all } x \in \mathcal{X},$$
where $\psi(\cdot)$ stands for the score function and $\nu(x)$ are optional nuisance parameters. The above moment conditions, similar to those of the generalized method of moments (GMM), can be used to identify many objects of interest from an economic perspective. We seek forest-based estimates $\hat{\theta}(x)$, which, in the context of instrumental variable quantile regressions, are the conditional quantile treatment effects.
Chernozhukov and Hansen (2005) laid the theoretical foundations for the instrumental variable quantile regression (IVQR). With outcome $Y_i$, endogenous treatment variable $D_i$, instrumental variable $Z_i$, and control variables $X_i$, the IVQR can be represented by the following moment conditions:
$$E\big[\psi_{\theta(\tau),\nu(\tau)}(Y_i) \,\big|\, D_i, X_i, Z_i\big] = E\Big[\big(\tau - 1\{Y_i \le D_i\theta(\tau) + X_i'\nu(\tau)\}\big)(Z_i, X_i')' \,\Big|\, D_i, X_i, Z_i\Big],$$
where $\theta(\tau)$ is the conditional quantile treatment effect, $\nu(\tau)$ are the nuisance parameters, $1\{\cdot\}$ is the indicator function, and $\tau$ is a quantile index.
The sample counterpart of the local moment conditions and the estimator of $\theta$, introduced by Athey et al. (2019), are defined as follows:
$$\big(\hat{\theta}(\tau, x), \hat{\nu}(\tau, x)\big) \in \operatorname*{argmin}_{\theta(\tau),\,\nu(\tau)} \left\| \sum_{i=1}^{n} \alpha_i(x)\, \psi_{\theta(\tau),\nu(\tau)}(Y_i) \right\|_2,$$
where the $\alpha_i(x)$ are tree-based weights averaged over the forest, measuring how often each training example falls in the same leaf as $x$. In other words, these weights represent the relevance of each observation for estimating $\theta$ at $x$. Specifically, the weights are obtained by a forest-based algorithm. For the point of interest $x$, let $L_b(x)$ denote the set of training examples that fall in the same terminal leaf as $x$ in the $b$-th tree, $b \in \{1, 2, \ldots, B\}$. The weight $\alpha_i(x)$ is then the frequency with which the $i$-th observation shares a terminal leaf with $x$ across all $B$ trees. That is,
$$\alpha_{bi}(x) = \frac{1\{X_i \in L_b(x)\}}{|L_b(x)|}, \qquad \alpha_i(x) = \frac{1}{B}\sum_{b=1}^{B} \alpha_{bi}(x).$$
With such forest-based weights and a pre-specified quantile index $\tau$, we minimize the criterion function constructed from the sample moment conditions and thereby obtain an estimate $\hat{\theta}(\tau, x)$ of the conditional quantile treatment effect. In the subsequent section, we discuss how to grow the trees and the forest with the instrumental variable quantile regression.
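To make the estimation step concrete, the following R sketch profiles the weighted objective over a grid of candidate values of $\theta(\tau)$, concentrating out the nuisance coefficients via a weighted quantile regression (the inverse quantile regression idea of Chernozhukov and Hansen 2008). The data objects, the grid, and the function name are hypothetical; this is a sketch under stated assumptions, not the authors' released code.

```r
library(quantreg)

# Weighted IVQR at a point of interest x, assuming: vectors Y, D, Z,
# covariate matrix X, forest weights alpha (from the GRF), quantile
# index tau, and a grid of candidate treatment effects.
ivqr_weighted <- function(Y, D, Z, X, alpha, tau, theta_grid) {
  keep <- alpha > 0                       # observations with positive forest weight
  Yk <- Y[keep]; Dk <- D[keep]; Zk <- Z[keep]
  Xk <- X[keep, , drop = FALSE]; ak <- alpha[keep]
  obj <- sapply(theta_grid, function(theta) {
    # Weighted quantile regression of Y - D*theta on (X, Z); under the
    # correct theta the instrument should carry no explanatory power.
    fit <- rq(I(Yk - Dk * theta) ~ Xk + Zk, tau = tau, weights = ak)
    abs(coef(fit)["Zk"])
  })
  # Inverse quantile regression: pick the theta that drives the
  # coefficient on the instrument closest to zero.
  theta_grid[which.min(obj)]
}
```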

2.2. Tree Splitting Rules

Growing a tree is a recursive binary splitting process. The idea of the tree-based algorithm is to split the data in the parent node $P$ into two children nodes $\{C_1, C_2\}$ so as to maximize the heterogeneity between them.
Specifically, for node $j$ with data $\mathcal{J}$, we define the node parameters as follows:
$$\big(\hat{\theta}_j(\tau), \hat{\nu}_j(\tau)\big)(\mathcal{J}) \in \operatorname*{argmin}_{\theta(\tau),\,\nu(\tau)} \left\| \sum_{\{i \in \mathcal{J} :\, X_i \in j\}} \psi_{\theta(\tau),\nu(\tau)}(Y_i) \right\|_2,$$
where $j \in \{P, C_1, C_2\}$. For each candidate split, we would ideally minimize the criterion
$$\operatorname{err}(C_1, C_2) = \sum_{j = 1, 2} \mathbb{P}\big[X \in C_j \mid X \in P\big] \cdot E\Big[\big(\hat{\theta}_{C_j}(\tau) - \theta(\tau, X)\big)^2 \,\Big|\, X \in C_j\Big],$$
which is based on the GRF method. This minimization is infeasible, however, because $\theta(\tau, X)$ is unknown. Athey et al. (2019) turn the minimization of $\operatorname{err}(C_1, C_2)$ into an accessible, model-free maximization of
$$\Delta(C_1, C_2) := \frac{n_{C_1}\, n_{C_2}}{n_P^2}\, \big(\hat{\theta}_{C_1}(\tau) - \hat{\theta}_{C_2}(\tau)\big)^2,$$
where $n_{C_1}$, $n_{C_2}$, and $n_P$ are the numbers of observations in the children and parent nodes. Maximizing $\Delta$ directly would require re-estimating $\theta_{C_j}(\tau)$ by the IVQR for every candidate split, that is, over the set of all possible values of $X_i$; such estimation is computationally infeasible. To circumvent this difficulty, Athey et al. (2019) suggest a gradient tree algorithm that maximizes an approximate criterion $\tilde{\Delta}(C_1, C_2)$. In what follows, with two new ingredients $A_P$ and $\rho$ defined below, we construct $\tilde{\Delta}(C_1, C_2)$ step by step.
We first define $A_P$ as the gradient of the expectation of the moment condition:
$$\begin{aligned}
A_P &= \nabla\, E\big[\psi_{\hat{\theta}_P(\tau),\hat{\nu}_P(\tau)}(Y_i) \,\big|\, \{D_i, X_i, Z_i\} \in P\big] \\
&= \nabla\, E\big[\big(\tau - 1\{Y_i \le D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau)\}\big)(Z_i, X_i')' \,\big|\, \{D_i, X_i, Z_i\} \in P\big] \\
&= \nabla\, \Big[\big(\tau - F\big(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau)\big)\big)(Z_i, X_i')' \,\Big|\, \{D_i, X_i, Z_i\} \in P\Big] \\
&= \nabla \left[\begin{pmatrix}
\big(\tau - F(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau))\big)\, Z_i \\
\big(\tau - F(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau))\big)\, X_{1i} \\
\vdots \\
\big(\tau - F(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau))\big)\, X_{mi}
\end{pmatrix} \,\middle|\, \{D_i, X_i, Z_i\} \in P\right],
\end{aligned}$$
where $F(\cdot)$ is a cumulative distribution function and $m$ is the dimension of $X$. For simplicity of derivation, we fix the following notation: $g_0 := \big(\tau - F(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau))\big) Z_i$, $g_1 := \big(\tau - F(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau))\big) X_{1i}$, $\ldots$, $g_m := \big(\tau - F(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau))\big) X_{mi}$, and we suppress the conditioning $[\,\cdot \mid \{D_i, X_i, Z_i\} \in P\,]$, which means that the estimation is conditional on the parent node. Accordingly, $A_P$ can be written as the gradient of $(g_0, g_1, \ldots, g_m)$ with respect to the parent-node parameters:
$$\begin{aligned}
A_P &= \nabla_{(\hat{\theta}_P(\tau),\, \hat{\nu}_P(\tau))}\, (g_0, g_1, \ldots, g_m)
= \begin{pmatrix}
\frac{\partial g_0}{\partial \hat{\theta}_P(\tau)} & \frac{\partial g_1}{\partial \hat{\theta}_P(\tau)} & \cdots & \frac{\partial g_m}{\partial \hat{\theta}_P(\tau)} \\
\frac{\partial g_0}{\partial \hat{\nu}_{1,P}(\tau)} & \frac{\partial g_1}{\partial \hat{\nu}_{1,P}(\tau)} & \cdots & \frac{\partial g_m}{\partial \hat{\nu}_{1,P}(\tau)} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_0}{\partial \hat{\nu}_{m,P}(\tau)} & \frac{\partial g_1}{\partial \hat{\nu}_{m,P}(\tau)} & \cdots & \frac{\partial g_m}{\partial \hat{\nu}_{m,P}(\tau)}
\end{pmatrix} \\
&= -f\big(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau)\big)
\begin{pmatrix}
Z_i D_i & X_{1i} D_i & \cdots & X_{mi} D_i \\
Z_i X_{1i} & X_{1i} X_{1i} & \cdots & X_{mi} X_{1i} \\
\vdots & \vdots & \ddots & \vdots \\
Z_i X_{mi} & X_{1i} X_{mi} & \cdots & X_{mi} X_{mi}
\end{pmatrix},
\end{aligned}$$
where $f(\cdot)$ is the probability density function associated with $F(\cdot)$. Therefore, the inverse of $A_P$ is
$$A_P^{-1} = \left[-f\big(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau)\big)
\begin{pmatrix}
Z_i D_i & X_{1i} D_i & \cdots & X_{mi} D_i \\
Z_i X_{1i} & X_{1i} X_{1i} & \cdots & X_{mi} X_{1i} \\
\vdots & \vdots & \ddots & \vdots \\
Z_i X_{mi} & X_{1i} X_{mi} & \cdots & X_{mi} X_{mi}
\end{pmatrix}\right]^{-1}
= -\frac{1}{f\big(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau)\big)}
\begin{pmatrix}
Z_i D_i & X_{1i} D_i & \cdots & X_{mi} D_i \\
Z_i X_{1i} & X_{1i} X_{1i} & \cdots & X_{mi} X_{1i} \\
\vdots & \vdots & \ddots & \vdots \\
Z_i X_{mi} & X_{1i} X_{mi} & \cdots & X_{mi} X_{mi}
\end{pmatrix}^{-1}.$$
We then construct the pseudo-outcomes,
$$\rho_i = -\xi'\, A_P^{-1}\, \psi_{\hat{\theta}_P(\tau),\,\hat{\nu}_P(\tau)}(Y_i),$$
where $\xi$ is a vector that picks out the $\theta(\tau)$-coordinate from the $(\theta(\tau), \nu(\tau))$ vector. In the case with one treatment variable $D$, the $\xi$ vector is $(1, 0, \ldots, 0)'$. Thus, the corresponding pseudo-outcomes are
$$\rho_i = -(1, 0, \ldots, 0)\left[-\frac{1}{f\big(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau)\big)}\, \hat{M}_C^{-1}\right]\big(\tau - 1\{Y_i \le D_i\theta(\tau) + X_i'\nu(\tau)\}\big)\begin{pmatrix} Z_i \\ X_{1i} \\ \vdots \\ X_{mi} \end{pmatrix}
= \frac{1}{f\big(D_i\hat{\theta}_P(\tau) + X_i'\hat{\nu}_P(\tau)\big)}\,(1, 0, \ldots, 0)\, \hat{M}_C^{-1}\,\big(\tau - 1\{Y_i \le D_i\theta(\tau) + X_i'\nu(\tau)\}\big)\begin{pmatrix} Z_i \\ X_{1i} \\ \vdots \\ X_{mi} \end{pmatrix},$$
where
$$\hat{M}_C := \begin{pmatrix}
\frac{1}{\#C}\sum_{i \in C} Z_i D_i & \frac{1}{\#C}\sum_{i \in C} X_{1i} D_i & \cdots & \frac{1}{\#C}\sum_{i \in C} X_{mi} D_i \\
\frac{1}{\#C}\sum_{i \in C} Z_i X_{1i} & \frac{1}{\#C}\sum_{i \in C} X_{1i} X_{1i} & \cdots & \frac{1}{\#C}\sum_{i \in C} X_{mi} X_{1i} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{\#C}\sum_{i \in C} Z_i X_{mi} & \frac{1}{\#C}\sum_{i \in C} X_{1i} X_{mi} & \cdots & \frac{1}{\#C}\sum_{i \in C} X_{mi} X_{mi}
\end{pmatrix}$$
and $\#C$ denotes the number of observations in the child node $C$. The splitting rule is to maximize the following approximate criterion:
$$\tilde{\Delta}(C_1, C_2) = \sum_{j = 1, 2} \frac{1}{|\{i : X_i \in C_j\}|} \left( \sum_{\{i :\, X_i \in C_j\}} \rho_i \right)^2.$$
Notice that some terms in $\rho_i$, such as $f(\cdot)$, do not affect the optimization of $\tilde{\Delta}(C_1, C_2)$, so $\rho_i$ can be further simplified as follows:
$$\rho_i = (1, 0, \ldots, 0)\, \hat{M}_C^{-1}\, 1\{Y_i > D_i\theta(\tau) + X_i'\nu(\tau)\}\begin{pmatrix} Z_i \\ X_{1i} \\ \vdots \\ X_{mi} \end{pmatrix}.$$
Using the modified $\rho_i$ above, $\tilde{\Delta}(C_1, C_2)$ serves as our splitting rule for the instrumental variable quantile regression within the framework of generalized random forests. Based on this splitting rule, the tree is grown by recursively partitioning the data until a stopping criterion is met, cf. Section 2.4.
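As an illustration, the following R sketch computes the simplified pseudo-outcomes and evaluates the approximate criterion for a candidate split. All object names are hypothetical, and the node estimates (theta_hat, nu_hat) are taken as given.

```r
# Simplified pseudo-outcomes rho_i, assuming Y, D, Z are vectors, X is an
# n x m matrix, and (theta_hat, nu_hat) are the parent-node IVQR estimates.
pseudo_outcomes <- function(Y, D, Z, X, theta_hat, nu_hat) {
  W <- cbind(Z, X)                          # rows (Z_i, X_1i, ..., X_mi)
  M <- crossprod(cbind(D, X), W) / nrow(W)  # (1/#C) sum_i (D_i, X_i')'(Z_i, X_i')
  ind <- as.numeric(Y > D * theta_hat + X %*% nu_hat)
  # The first row of M^{-1} plays the role of (1, 0, ..., 0)' M_C^{-1}.
  drop(solve(M)[1, , drop = FALSE] %*% t(ind * W))
}

# Approximate criterion for a candidate split: `left` flags membership in C1.
delta_tilde <- function(rho, left) {
  sum(tapply(rho, left, function(r) sum(r)^2 / length(r)))
}
```

The split maximizing delta_tilde over all candidate (variable, cutpoint) pairs would then be chosen.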

2.3. The Algorithm and an Example Illustrating Weights Calculation

With the splitting rule established, we can now grow the entire forest. Athey and Imbens (2016) and Wager and Athey (2018) introduce the concept of honest estimation, which is also built into the generalized random forests model. A model is honest if the data used to construct the model and the data used for estimation are not the same. In the tree-growing case, honesty amounts to sample splitting between tree forming and weight calculation.
Here is an example of the implementation of honest estimation. Suppose we have eight observations in our data $J = \{a, b, c, d, e, f, g, h\}$. We split the sample in half honestly, giving two sub-samples: $J_1 = \{a, b, c, d\}$ for tree forming and $J_2 = \{e, f, g, h\}$ for weight calculation. By the splitting rule, we construct the following tree with $J_1 = \{a, b, c, d\}$.
[Tree diagram: the tree grown from $J_1 = \{a, b, c, d\}$ by the splitting rule.]
Next, we identify where the observations of $J_2 = \{e, f, g, h\}$ fall in this tree.
[Tree diagram: the same tree with the $J_2$ observations $\{e, f, g, h\}$ assigned to its terminal leaves.]
Then we use this information to calculate the frequencies and obtain the weights. Since we do not have any out-of-sample points of interest, we use each of the eight observations as the point of interest, one at a time. If the point of interest is $a$, then since $a$ is in the same leaf as $\{e, f, g\}$, the observations $e$, $f$, and $g$ each receive weight $1/3$, while $a$, $b$, $c$, $d$, and $h$ receive weight $0$. If the point of interest is $b$, then since $b$ is in the same leaf as $h$, the observation $h$ receives weight $1$, while $a$ through $g$ receive weight $0$. Applying this method, we obtain the weights for all data points. The following is the weight matrix for the above one-tree model:
Point of interest       a     b     c     d     e     f     g     h
Weight for sample a     0     0     0     0     0     0     0     0
Weight for sample b     0     0     0     0     0     0     0     0
Weight for sample c     0     0     0     0     0     0     0     0
Weight for sample d     0     0     0     0     0     0     0     0
Weight for sample e    1/3    0    1/3    0    1/3   1/3   1/3    0
Weight for sample f    1/3    0    1/3    0    1/3   1/3   1/3    0
Weight for sample g    1/3    0    1/3    0    1/3   1/3   1/3    0
Weight for sample h     0     1     0     1     0     0     0     1
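The single-tree weight calculation can be sketched in a few lines of R; the leaf labels below are hypothetical and simply encode the example tree above. Forest weights would average these single-tree weights over all B trees.

```r
# Honest weights for one tree: J2 observations that share the terminal
# leaf of the point of interest split a total weight of one equally.
tree_weights <- function(leaf_of_point, leaf_of_J2) {
  same <- leaf_of_J2 == leaf_of_point
  if (!any(same)) return(rep(0, length(leaf_of_J2)))
  same / sum(same)
}

# Hypothetical leaf labels reproducing the example: e, f, g share leaf 1; h is in leaf 2.
leaf_of_J2 <- c(e = 1, f = 1, g = 1, h = 2)
tree_weights(leaf_of_point = 1, leaf_of_J2)  # 1/3, 1/3, 1/3, 0  (point of interest a)
tree_weights(leaf_of_point = 2, leaf_of_J2)  # 0, 0, 0, 1        (point of interest b)
```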
Athey et al. (2019) prove that, with a proper honest sub-sampling rate and regularity conditions, the generalized random forests estimator $\hat{\theta}(x)$ is consistent and asymptotically normal for $\theta(x)$.
To build the random forest with honest trees, we first randomly select one half of the sample for each tree. Then, within each tree, we use a 1/2 subsampling rate for honest splitting. For the average quantile treatment effect, we take each point in the data as the point of interest, estimate with its own weights one at a time, and average all the results.

2.4. Practical Implementation

When implementing the generalized random forests algorithm, we first obtain baseline grids through the conventional IVQR estimator and then utilize those grids to grow the trees. With the IVQR estimate $\hat{\gamma}_{\text{pre}}$ and its standard error $\hat{\sigma}_{\text{pre}}$, we construct the interval $[\hat{\gamma}_{\text{pre}} - 3\hat{\sigma}_{\text{pre}},\ \hat{\gamma}_{\text{pre}} + 3\hat{\sigma}_{\text{pre}}]$. We divide this interval into 100 equal parts and thereby obtain the baseline grid
$$\text{baseline grid} = \big\{\hat{\gamma}_{\text{pre}} - 3\hat{\sigma}_{\text{pre}},\ \ldots,\ \hat{\gamma}_{\text{pre}} + 3\hat{\sigma}_{\text{pre}}\big\}.$$
For each tree $b \in \{1, 2, \ldots, B\}$ in the random forest estimation, half of the data is randomly selected, so we reconstruct the grid for each tree. Analogously, we build the grid for tree $b$,
$$\text{grid for tree } b = \big\{\hat{\gamma}_{\text{tree}_b} - 3\hat{\sigma}_{\text{tree}_b},\ \ldots,\ \hat{\gamma}_{\text{tree}_b} + 3\hat{\sigma}_{\text{tree}_b}\big\},$$
which is obtained from the randomly selected half of the data in tree $b$.
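A minimal R sketch of the grid construction, with a hypothetical preliminary estimate, is:

```r
# Hypothetical preliminary IVQR estimate and its standard error
gamma_pre <- 5523.5
sigma_pre <- 613.1

# Dividing [gamma - 3*sigma, gamma + 3*sigma] into 100 equal parts
# yields 101 grid points.
baseline_grid <- seq(gamma_pre - 3 * sigma_pre,
                     gamma_pre + 3 * sigma_pre, length.out = 101)

# The per-tree grid is rebuilt the same way from the IVQR estimate on
# the half-sample drawn for tree b.
```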
Following the concept of honest estimation, we further split the data into two parts, denoted $J_1$ and $J_2$. Data $J_1$ is used to grow the tree, and data $J_2$ is used to form the weights $\alpha_i$. To grow the tree with data $J_1$, we outline the splitting process in each node as follows. We first estimate the parent-node parameters $\hat{\theta}_P(\tau)$ and $\hat{\nu}_P(\tau)$ by optimizing
$$\big(\hat{\theta}_P(\tau, x), \hat{\nu}_P(\tau, x)\big) \in \operatorname*{argmin}_{\theta_P(\tau),\,\nu_P(\tau)} \left\| \sum_{i \in P} \psi_{\theta_P(\tau),\,\nu_P(\tau)}(Y_i) \right\|_2$$
with the grid for tree $b$. We then implement the splitting criterion
$$\max\ \tilde{\Delta}(C_1, C_2) = \sum_{j = 1, 2} \frac{1}{|\{i : X_i \in C_j\}|}\left(\sum_{\{i :\, X_i \in C_j\}} \rho_i\right)^2$$
for every split.
The tree keeps splitting recursively until it reaches the minimum-node-size constraint, or until the data in the parent node have so little variation that further splitting is infeasible. These two practical stopping criteria suffice for reasonable estimates.
Regarding estimation of the weights, we first identify where the observations in $J_2$ land in the tree constructed from data $J_1$. Using the algorithm discussed in Section 2.3, we compute the weight for every data point. This completes the construction of tree $b$.
By growing a total of $B$ trees and averaging the weights across trees, we obtain the weight of each observation. With the weights $\alpha_i(x)$, we estimate the conditional local quantile treatment effect:
$$\big(\hat{\theta}(\tau, x), \hat{\nu}(\tau, x)\big) \in \operatorname*{argmin}_{\theta(\tau),\,\nu(\tau)} \left\| \sum_{i=1}^{n} \alpha_i(x)\, \psi_{\theta(\tau),\nu(\tau)}(Y_i) \right\|_2.$$
To yield the local quantile treatment effect, we could average all $x$-pointwise conditional local quantile treatment effects. The averaging procedure can be further modified to obtain more efficient estimates, as discussed in Appendix A. Nevertheless, our empirical studies in Section 4 suggest that, with properly sampled data, the practical procedure above performs well.

3. Variable Importance

Athey et al. (2019) and the associated grf R package provide a measure for ranking variable importance, which is a distinctive advantage of tree-based models. To explore variable importance across quantiles, we adopt their measure, reproduced as follows:
$$\text{Importance}(x_i) = \frac{\displaystyle\sum_{l=1}^{\text{max.depth}} \left( \frac{\sum_{b=1}^{B} \text{number of splits on } x_i \text{ in layer } l \text{ of tree } b}{\sum_{b=1}^{B} \text{total number of splits in layer } l \text{ of tree } b} \right) \cdot l^{-2}}{\displaystyle\sum_{l=1}^{\text{max.depth}} l^{-2}},$$
where the maximum depth is pre-specified by the empirical researcher. Specifically, this measure of variable importance considers only the frequency of splits on variable $x_i$ across trees $b = 1, \ldots, B$.
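A direct R implementation of this split-frequency measure might look as follows; the three-dimensional array of split counts is a hypothetical input, and the grf package exposes a comparable built-in via variable_importance() with decay.exponent = 2 and max.depth = 4.

```r
# split_counts[b, l, k]: number of splits on variable k at depth l in tree b.
variable_importance_manual <- function(split_counts, max.depth = 4) {
  layer_weight <- (1:max.depth)^(-2)     # down-weight deeper layers
  K <- dim(split_counts)[3]
  sapply(1:K, function(k) {
    frac <- sapply(1:max.depth, function(l) {
      total <- sum(split_counts[, l, ])  # all splits in layer l, all trees
      if (total == 0) 0 else sum(split_counts[, l, k]) / total
    })
    sum(frac * layer_weight) / sum(layer_weight)
  })
}
```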
This importance measure shares similarities with the Gini importance widely used for random forests. Both algorithms therefore favor continuous variables, which offer more potential split points than binary variables, so we should be cautious when comparing the importance of a continuous variable with that of a categorical variable. Another important remark is that we should not conclude that a particular covariate is unrelated to treatment effects simply because the trees did not split on it: there can be many different ways to pick out a subgroup of units with high or low treatment effects. By comparing the average characteristics of units with high treatment effects to those with low treatment effects, researchers can obtain a fuller picture of the differences between these groups across all covariates.
Similar to the R-squared, variable importance indicates whether a variable contributes enough explanatory power for the outcome variable in terms of variation. Variable importance can also be used for model selection. In the recent literature, e.g., O'Neill and Weeks (2018), researchers adopt variable importance measures for policy making: given hundreds of variables, the forest-based algorithm picks out the important ones, which allows policy makers to identify their benchmark models.

4. Empirical Studies

In this section, we reinvestigate two empirical studies on quantile treatment effects: the effect of 401(k) participation on wealth, cf. Chernozhukov and Hansen (2004), and the effect of job training program participation on earnings, cf. Abadie et al. (2002). Not only does this provide data-driven robustness checks on the existing econometric results, but the GRF-IVQR also yields a measure of variable importance in terms of heterogeneity among control variables, which complements the existing empirical findings. In addition, we compare our empirical results with those of Chen and Tien (2019), whose IVQR estimation is based on the double machine learning approach, an alternative in the causal machine learning literature.
As a critical note, we do not estimate and report the conditional quantile treatment effect (CQTE) in the applications. When the outcome level has an impact on the effect size and the conditional outcome distributions are heterogeneous, the CQTE can report spurious heterogeneity; see a comprehensive summary of the problem in Strittmatter (2019). The same problem carries over to the importance measure. Therefore, variable importance has to be interpreted with caution across different quantiles.

4.1. The 401(k) Retirement Savings Plan

Examining the effects of 401(k) plans on accumulated wealth is an issue of long-standing empirical interest. For example, based on identification via selection on observables, Chiou et al. (2018) and Chernozhukov and Hansen (2013) suggest that a nonlinear income effect exists in the 401(k) study; nonlinear effects of other control variables are identified as well. Few papers, however, investigate variable importance among control variables, cf. Chen and Tien (2019). In addition to estimating the quantile treatment effect of 401(k) participation, we fully explore variable importance across the conditional quantiles of accumulated wealth in light of the generalized random forests. The corresponding findings shed some light on the existing literature.
The data, with 9915 observations, are from the 1991 Survey of Income and Program Participation. The outcome variable is net financial assets. The treatment variable is a binary indicator of participation in the 401(k) plan. The instrument is an indicator of eligibility to enroll in the 401(k) plan. Control variables consist of age, income, family size, education, marital status, two-earner status, defined benefit pension status, individual retirement account (IRA) participation status, and homeownership status, following the model specification used in Chernozhukov and Hansen (2004).
Table 1 shows that the quantile treatment effects estimated by the GRF-IVQR are similar to those calculated in Chernozhukov and Hansen (2004). The 401(k) participation has larger positive effects on net financial assets for people with a higher savings propensity, which corresponds to the upper conditional quantiles. The estimated treatment effects show a monotonically increasing pattern across the conditional distribution of net financial assets. Thus, the pattern identified by Chernozhukov and Hansen (2004) is confirmed through our data-driven robustness checks.
Based on the measure of variable importance introduced in Section 3, Table 2 and Figure 1 show that income, age, education, and family size are the four most important variables in the analysis (see Footnote 1). On average, income and age are the most important variables accounting for heterogeneity, with variable importance values of 64.4% and 15.6%, respectively. We should interpret the variable importance measure with caution, because one can reduce the importance measure of a variable by adding a highly correlated additional variable to the model; in that case, the two highly correlated variables have to share the sample splits. Even with this caveat, we now have an additional dimension, $\tau$, along which to compare variable importance across quantiles. In particular, the importance of the age variable increases as the savings propensity (the quantile index) goes up, whereas the importance of the income variable decreases across the conditional distribution of net financial assets. In addition, these four variables are also identified as important in the context of double machine learning, cf. Chen and Tien (2019).

4.2. The Job Training Program

Abadie et al. (2002) use the Job Training Partnership Act (JTPA) data to estimate the quantile treatment effect of job training on the earnings distribution (see Footnote 2). The data are from Title II of the JTPA in the early 1990s and consist of 11,204 observations: 5102 male and 6102 female. In the estimation, they take 30-month earnings as the outcome variable, enrollment for JTPA services as the treatment variable, and a randomized offer of JTPA enrollment as the instrumental variable. Control variables include education, race, marital status, previous-year work status, job training service strategies, age, and whether the earnings data are from the second follow-up survey. In the female group, an additional control, aid to families with dependent children (AFDC), is added. We follow the same model specifications when estimating the GRF-IVQR.
Table 3 and Table 4 show that, for females, the job training program generates a significantly positive treatment effect on earnings at the 0.5 and 0.75 quantiles. The GRF-IVQR yields similar results.
For the male group, Table 5 and Figure 2 show that working less than 13 weeks in the past year (wlkess13) and on-the-job training and/or job search assistance (ojt_jsa) are the most important variables. However, there is no apparent pattern suggesting that variable importance differs across quantiles. The patterns of variable importance resulting from the GRF-IV and the GRF-IVQR differ as well.
As for the GRF-IVQR, in Table 5, the variable importance for Hispanics and for the age group 45 to 54 is 0 across all quantiles, while the GRF-IV suggests these two variables are of some importance. Possible explanations are as follows. Compared with the GRF-IVQR's moment condition, the GRF-IV's moment condition still performs well in nodes with relatively few observations; consequently, the GRF-IVQR is more restrictive and grows a shallower tree than the GRF-IV, so some variables used for splits in deeper nodes will not be chosen by the GRF-IVQR algorithm. Moreover, at deeper nodes the data within each node are very similar; this situation occurs frequently with a large number of binary variables and thus leads to no variation in a certain variable. Therefore, in practical estimation, the GRF-IVQR grows a relatively small tree.
For the female group, Table 6 and Figure 3 show that classroom training (class_tr) and on-the-job training and/or job search assistance (ojt_jsa) are the most important variables. The importance of on-the-job training and/or job search assistance decreases across quantiles, which differs from the pattern in the male group. The issue of a binary variable having no variation in deeper nodes becomes severe in the female group: the variable importance for Hispanics and several age dummies is 0 across all quantiles, indicating that, in the female group, these characteristics are more homogeneous over the conditional distribution of earnings.

5. Conclusions

Based on the generalized random forests of Athey et al. (2019), we propose an econometric procedure to estimate the quantile treatment effect. Not only does this method estimate the treatment effect nonparametrically, but our procedure also yields a measure of variable importance in terms of heterogeneity among control variables. We provide the practical algorithm and the associated R codes. We also apply the proposed procedure to reinvestigate the distributional effect of 401(k) participation on net financial assets and the quantile effect of participating in a job training program on earnings. Income, age, education, and family size are identified as the four most important variables in the 401(k) analysis. In the job training program example, our procedure suggests that the previous-year work status and the job training service strategies are important control variables.

Author Contributions

Both authors contributed equally to the paper.

Funding

This research was partly funded by the personal research fund from Tokyo International University, and financially supported by the Center for Research in Econometric Theory and Applications (Grant no. 107L900203) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.

Acknowledgments

We are grateful to the two anonymous referees for their constructive comments, which have greatly improved this paper. We thank Patrick DeJarnette, Masaru Inaba, Min-Jeng Lin, and Shinya Tanaka for discussions and comments. This paper has benefited from presentations at Aoyama Gakuin University, Kansai University, and the 2019 Annual Conference of the Taiwan Economic Association. The usual disclaimer applies.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DML: Double machine learning
GRF: Generalized random forests
IVQR: Instrumental variable quantile regression

Appendix A

Appendix A.1. Improving Efficiency by Doubly Robust Estimators

Chernozhukov et al. (2018) and Athey and Wager (2018) pioneered the use of doubly robust estimators embedded in causal machine learning frameworks; the resulting estimators are more accurate and more efficient. In light of their idea, it might be beneficial to incorporate doubly robust estimation into our methodology.
The doubly robust augmented inverse propensity weighted (AIPW) estimator was introduced by Robins et al. (1994). The AIPW estimator of the average treatment effect is constructed from two components as follows:
$$\widehat{\text{ATE}}_{\text{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left\{ \frac{D_i Y_i}{\hat{e}(X_i)} - \frac{(1 - D_i)\, Y_i}{1 - \hat{e}(X_i)} - \frac{D_i - \hat{e}(X_i)}{\hat{e}(X_i)\big(1 - \hat{e}(X_i)\big)}\Big[\big(1 - \hat{e}(X_i)\big)\hat{E}\big(Y_i \mid D_i = 1, X_i\big) + \hat{e}(X_i)\,\hat{E}\big(Y_i \mid D_i = 0, X_i\big)\Big] \right\},$$
where $e(x) = P[D_i = 1 \mid X_i = x]$ is the propensity score. The first two terms constitute the inverse probability weighted estimator, and the remaining term is a weighted regression adjustment. The AIPW estimator is doubly robust because it remains consistent provided that at least one of the two components is correctly specified.
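In base R, the AIPW estimator can be sketched as follows, taking the fitted nuisance functions as given; the propensity scores and outcome regressions would come from any consistent first-stage estimator, and all names here are hypothetical.

```r
# AIPW estimate of the ATE from fitted nuisance values:
#   e_hat   - estimated propensity scores P(D = 1 | X)
#   mu1_hat - estimated E[Y | D = 1, X];  mu0_hat - estimated E[Y | D = 0, X]
aipw_ate <- function(Y, D, e_hat, mu1_hat, mu0_hat) {
  ipw <- D * Y / e_hat - (1 - D) * Y / (1 - e_hat)
  adj <- (D - e_hat) / (e_hat * (1 - e_hat)) *
         ((1 - e_hat) * mu1_hat + e_hat * mu0_hat)
  mean(ipw - adj)
}
```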

Appendix A.2. The Doubly Robust Estimation for Causal Forests

Athey and Wager (2019) and their grf R package implement a variant of the doubly robust AIPW estimator for causal forests. Specifically, for estimating the average treatment effect, their doubly robust estimator is as follows:
$$\widehat{\text{ATE}} = \hat{\gamma} = \frac{1}{n}\sum_{i=1}^{n}\hat{\Gamma}_i, \qquad \hat{\Gamma}_i = \hat{\gamma}^{(-i)}(X_i) + \frac{D_i - \hat{e}^{(-i)}(X_i)}{\hat{e}^{(-i)}(X_i)\big(1 - \hat{e}^{(-i)}(X_i)\big)}\Big(Y_i - \hat{m}^{(-i)}(X_i) - \big(D_i - \hat{e}^{(-i)}(X_i)\big)\hat{\gamma}^{(-i)}(X_i)\Big),$$
where $\hat{\gamma}(X_i)$ is the conditional average treatment effect estimator based on the causal forest, $\hat{\Gamma}_i$ is the conditional average treatment effect estimate adjusted by inverse probability weighting, $\hat{m}(x)$ and $\hat{e}(x)$ are estimators of $E[Y \mid X = x]$ and $E[D \mid X = x]$ based on random forests with honest splitting, the superscript $(-i)$ denotes out-of-bag prediction, and the average treatment effect estimator $\hat{\gamma}$ is simply the sample average of the adjusted conditional average treatment effect estimates.
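In practice, this estimator is available through the grf package; a minimal usage sketch, with hypothetical data objects X, Y, and W, is:

```r
library(grf)

# X: n x p covariate matrix, Y: outcome vector, W: binary treatment vector.
cf <- causal_forest(X, Y, W)

# average_treatment_effect() returns the AIPW-style doubly robust estimate
# of the ATE together with its standard error.
average_treatment_effect(cf, target.sample = "all")
```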
Glynn and Quinn (2009) provide some evidence that the doubly robust estimator performs better in terms of efficiency than inverse probability weighting estimators, matching estimators, and regression estimators. To explore how adopting the doubly robust method in the causal forest estimator affects efficiency and accuracy, we follow their DGP designs and conduct Monte Carlo experiments with different degrees of confoundedness. In the simulation, $X_1$, $X_2$, and $X_3$ are covariates following $N(0, 1)$, $D$ is the treatment variable, $Y$ is the outcome variable, and $\epsilon$ is the disturbance, which follows $N(0, 1)$. Two data generating processes are considered, and the degree of confoundedness is modeled at three levels: low, moderate, and severe.
Table A1. Simulation settings.

                            Outcome (control)     Outcome (treatment)
Simple DGP                  Y = X2 + X3 + ε       Y = 5 + 3X2 + X3 + ε
Complicated DGP             Y = X2 + X3 + ε       Y = 5 + 3X2 + X3 + 2X2^2 + 2X3^2 + ε

Degree of confoundedness    True treatment assignment probabilities
Low                         P(D = 1 | X) = Φ(0.1X1 + 0.1X2 + 0.05X1X2)
Moderate                    P(D = 1 | X) = Φ(X1 + X2 + 0.5X1X2)
Severe                      P(D = 1 | X) = Φ(1.5X1 + 1.5X2 + 0.75X1X2)
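For concreteness, one draw from the simple DGP with low confoundedness can be generated as follows; this is a sketch of Table A1's design, not the authors' simulation code.

```r
set.seed(1)
n  <- 500
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)

# Low confoundedness: P(D = 1 | X) = pnorm(0.1*X1 + 0.1*X2 + 0.05*X1*X2)
p <- pnorm(0.1 * X1 + 0.1 * X2 + 0.05 * X1 * X2)
D <- rbinom(n, 1, p)

# Simple DGP: treated outcome 5 + 3*X2 + X3 + eps; control outcome X2 + X3 + eps
eps <- rnorm(n)
Y   <- ifelse(D == 1, 5 + 3 * X2 + X3, X2 + X3) + eps
```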
With three sample sizes (250, 500, and 1000), three degrees of confoundedness, and two DGP settings, the Monte Carlo results are tabulated in Table A2. The results confirm that the causal forest with doubly robust estimation indeed enjoys efficiency gains over the conventional causal forest.
Table A2. Finite-sample performance (RMSE): causal forests with and without doubly robust estimation.

                                         Linear DGP                       Nonlinear DGP
Sample size   Confoundedness   Causal forest   Doubly robust   Causal forest   Doubly robust
250           low              0.3730          0.1693          0.7542          0.3147
250           moderate         0.4200          0.2099          0.9295          0.3914
250           severe           0.4562          0.2218          1.0205          0.3997
500           low              0.3206          0.1081          0.6911          0.1855
500           moderate         0.3634          0.1417          0.8711          0.2320
500           severe           0.4107          0.1497          0.9529          0.2505
1000          low              0.2745          0.0717          0.6041          0.1124
1000          moderate         0.3244          0.1008          0.7755          0.1540
1000          severe           0.3742          0.1098          0.8919          0.1709

Appendix A.3. The Doubly Robust Estimation for Instrumental Causal Forests

With instrumental variables, Athey and Wager (2018) provide a doubly robust estimator of the local average treatment effect; namely,
$$\widehat{\text{LATE}} = \hat{\gamma} = \frac{1}{n}\sum_{i=1}^{n}\hat{\Gamma}_i, \qquad \hat{\Gamma}_i = \hat{\gamma}^{(-i)}(X_i) + \frac{1}{\hat{\Delta}^{(-i)}(X_i)} \cdot \frac{Z_i - \hat{z}^{(-i)}(X_i)}{\hat{z}^{(-i)}(X_i)\big(1 - \hat{z}^{(-i)}(X_i)\big)}\Big(Y_i - \hat{m}^{(-i)}(X_i) - \big(D_i - \hat{e}^{(-i)}(X_i)\big)\hat{\gamma}^{(-i)}(X_i)\Big),$$
where $\hat{\gamma}(X_i)$ is the conditional local average treatment effect estimator based on the instrumental forest, $\hat{\Gamma}_i$ is the conditional local average treatment effect estimate adjusted by inverse probability weighting, $\hat{m}(x)$, $\hat{e}(x)$, $\hat{z}(x)$, and $\hat{\Delta}(x)$ are estimators of $E[Y \mid X = x]$, $E[D \mid X = x]$, $E[Z \mid X = x]$, and $P(D = 1 \mid Z = 1, X = x) - P(D = 1 \mid Z = 0, X = x)$ based on random forests with honest splitting, and the local average treatment effect estimator $\hat{\gamma}$ is simply the sample average of the adjusted conditional local average treatment effect estimates.
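Given leave-one-out nuisance fits, the adjusted scores can be sketched in base R; all inputs are hypothetical fitted values.

```r
# Doubly robust LATE from fitted nuisance values:
#   gamma_hat - conditional LATE from the instrumental forest
#   m_hat = E[Y|X], e_hat = E[D|X], z_hat = E[Z|X],
#   delta_hat = P(D=1|Z=1,X) - P(D=1|Z=0,X)  (compliance)
late_aipw <- function(Y, D, Z, gamma_hat, m_hat, e_hat, z_hat, delta_hat) {
  Gamma <- gamma_hat +
    (Z - z_hat) / (delta_hat * z_hat * (1 - z_hat)) *
    (Y - m_hat - (D - e_hat) * gamma_hat)
  mean(Gamma)
}
```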

Appendix A.4. An Unsolved Task: The Doubly-Robust GRF-IVQR

One would like to incorporate doubly robust estimation into the GRF-IVQR model, following ideas similar to those introduced above. However, it remains unclear how to do so, and we leave this unsolved task for future work.

Appendix A.5. Identifying Restrictions and Regularity Conditions for the GRF-IVQR

Following Chernozhukov and Hansen (2008), we consider the instrumental variable quantile regression characterizing the structural relationship:
$$\begin{aligned}
& Y = D'\theta(U) + X'\nu(U), \qquad U \mid X, Z \sim \text{Uniform}(0, 1), \\
& D = \delta(X, Z, V), \qquad \text{where } V \text{ is statistically dependent on } U, \\
& \tau \mapsto D'\theta(\tau) + X'\nu(\tau) \ \text{ is strictly increasing in } \tau,
\end{aligned}$$
where
  • Y is the scalar outcome variable of interest.
  • U is a scalar random variable (rank variable) that aggregates all of the unobserved factors affecting the structural outcome equation.
  • D is a vector of endogenous variables determined by δ ( X , Z , V ) .
  • V is a vector of unobserved disturbances determining D and correlated with U.
  • Z is a vector of instrumental variables.
  • X is a vector of included control variables.
The one-dimensional rank variable and the rank similarity (rank preservation) condition imposed on the outcome equation play an important role in identifying the quantile treatment effect. To derive the standard error of the IVQR estimator, the following assumptions are needed as well.
Assumption CH1.
$(Y_i, D_i, X_i, Z_i)$ are iid, defined on the probability space $(\Omega, \mathcal{F}, P)$, and have compact support.
Assumption CH2.
For the given $\tau$, $(\theta(\tau), \nu(\tau))$ is in the interior of the parameter space.
Assumption CH3.
The density $f_Y(y \mid X, D, Z)$ is bounded by a constant $\bar{f}$ a.s.
Assumption CH4.
$\partial E[1(Y < D'\theta + X'\nu + Z'\gamma)\,\Psi]/\partial(\nu, \gamma)$, evaluated at $(\nu, \gamma) = (\nu(\theta, \tau), \gamma(\theta, \tau))$, has full rank for each $\theta$ in $\Theta$, where $\Psi = V_i (Z_i', X_i')'$.
Assumption CH5.
$\partial E[1(Y < D'\theta + X'\nu)\,\Psi]/\partial(\theta, \nu)$ has full rank at $(\theta(\tau), \nu(\tau))$.
Assumption CH6.
The mapping $(\theta, \nu) \mapsto E\big[\{\tau - 1(Y < D'\theta + X'\nu)\}\,\Psi\big]$ is one-to-one over the parameter space.
Assumptions CH1–CH6 are compatible with those imposed in Athey et al. (2019); for example, neither set of assumptions accommodates time-series data.
Assumption ATW1
(Lipschitz $x$-signal). For fixed values of $(\theta, \nu)$, we assume that $M_{\theta,\nu}(x) := E[\psi_{\theta,\nu}(O) \mid X = x]$ is Lipschitz continuous in $x$.
Assumption ATW2
(Smooth identification). When $x$ is fixed, we assume that the $M$-function is twice continuously differentiable in $(\theta, \nu)$ with a uniformly bounded second derivative, and that $V(x) := V_{\theta(x),\nu(x)}(x)$ is invertible for all $x \in \mathcal{X}$, where $V_{\theta,\nu}(x) := \partial M_{\theta,\nu}(x)/\partial(\theta, \nu)\big|_{(\theta(x),\nu(x))}$.
Assumption ATW3
(Lipschitz $(\theta, \nu)$-variogram). The score functions $\psi_{\theta,\nu}(O_i)$ have a continuous covariance structure. Writing $\gamma$ for the worst-case variogram and $\|\cdot\|_F$ for the Frobenius norm, we require, for some $L > 0$,
$$\gamma\big((\theta, \nu), (\theta', \nu')\big) \le L\, \big\|(\theta, \nu) - (\theta', \nu')\big\|_2 \quad \text{for all } (\theta, \nu),\ (\theta', \nu'), \qquad \text{where } \gamma\big((\theta, \nu), (\theta', \nu')\big) := \sup_{x \in \mathcal{X}} \big\|\operatorname{Var}\big[\psi_{\theta,\nu}(O_i) - \psi_{\theta',\nu'}(O_i) \mid X_i = x\big]\big\|_F.$$
Assumption ATW4
(Regularity of $\psi$). The $\psi$-functions can be written as $\psi_{\theta,\nu}(O) = \lambda(\theta, \nu; O_i) + \zeta_{\theta,\nu}(g(O_i))$, such that $\lambda$ is Lipschitz-continuous in $(\theta, \nu)$, $g: \mathcal{O} \to \mathbb{R}$ is a univariate summary of $O_i$, and $\zeta_{\theta,\nu}: \mathbb{R} \to \mathbb{R}$ is a family of monotone and bounded functions.
Assumption ATW5
(Existence of solutions). We assume that, for any weights $\alpha_i$ with $\sum \alpha_i = 1$, the estimation $(\hat{\theta}(x), \hat{\nu}(x)) \in \operatorname{argmin}_{\theta,\nu} \big\|\sum_{i=1}^{n} \alpha_i(x)\, \psi_{\theta,\nu}(O_i)\big\|_2$ returns a minimizer $(\hat{\theta}, \hat{\nu})$ that at least approximately solves the estimating equation, $\big\|\sum_{i=1}^{n} \alpha_i\, \psi_{\hat{\theta},\hat{\nu}}(O_i)\big\|_2 \le C \max\{\alpha_i\}$, for some constant $C \ge 0$.
Assumption ATW6
(Convexity). The score function $\psi_{\theta,\nu}(O_i)$ is a negative subgradient of a convex function, and the expected score $M_{\theta,\nu}(X_i)$ is the negative gradient of a strongly convex function.
Given Assumptions ATW1–ATW6, Theorems 3 and 5 of Athey et al. (2019) guarantee that the GRF estimator achieves consistency and asymptotic normality. In what follows, we check each assumption for the proposed GRF-IVQR estimator.
Observe that the score function of the IVQR is
$$\psi_{\theta,\nu}(O_i) = \big(\tau - 1\{Y_i \le D_i\theta(\tau, x) + X_i'\nu(\tau, x)\}\big)(Z_i, X_i')'.$$
In Chernozhukov and Hansen (2008), the moment functions are conditional on $\{X_i, D_i, Z_i\}$. For simplicity, we write the conditioning as $[\,\cdot \mid X_i = x\,]$ when considering splits in $X_i$ within the framework of generalized random forests.
Checking Assumption ATW1.
$$E\big[\psi_{\theta(\tau,x),\nu(\tau,x)}(O_i) \mid X_i = x\big] = E\big[\big(\tau - 1\{Y_i \le D_i\theta(\tau, x) + X_i'\nu(\tau, x)\}\big)(Z_i, X_i')' \mid X_i = x\big] \quad \text{for all } x \in \mathcal{X}.$$
Thus the expected score function is
$$M_{\theta,\nu}(x) = E\big[\psi_{\theta,\nu}(O_i) \mid X_i = x\big] = E\big[\big(\tau - 1\{Y_i \le D_i\theta + X_i'\nu\}\big)(Z_i, X_i')' \mid X_i = x\big] = \big(\tau - F(D_i\theta + X_i'\nu \mid X_i = x)\big)(Z_i, x')'.$$
We need the conditional cumulative distribution function to be Lipschitz continuous. Since every function with a bounded first derivative is Lipschitz, it suffices that the conditional density be bounded. Assumption CH3 states that the conditional density $f_Y(y \mid X, D, Z)$ is bounded by a constant $\bar{f}$ a.s. In particular, since $f_Y(y \mid X, D, Z)$ is the density of a convolution of a continuous random variable and a discrete random variable, we also need the continuous variable to be non-degenerate.
Checking Assumption ATW2.
$$M_{\theta,\nu}(x) = \big(\tau - F(D_i\theta + X_i'\nu \mid X_i = x)\big)(Z_i, x')', \qquad
V_{\theta,\nu}(x) = \frac{\partial}{\partial(\theta, \nu)}\, M_{\theta,\nu}(x)\Big|_{(\theta(\tau,x),\,\nu(\tau,x))}
= -f\big(D_i\theta + X_i'\nu \mid X_i = x\big)\begin{pmatrix} Z_i D_i & Z_i x' \\ x D_i & x x' \end{pmatrix}.$$
We need $V$ to be invertible, and therefore the matrix $\begin{pmatrix} Z_i D_i & Z_i x' \\ x D_i & x x' \end{pmatrix}$ must be invertible. In addition, the conditional density $f(D_i\theta + X_i'\nu \mid X_i = x)$ is required to have a continuous, uniformly bounded first derivative; if $f$ is continuously differentiable, then on the compact support granted by Assumption CH1 its first derivative is uniformly bounded. These conditions are implied by Assumptions CH4 and CH5. Thus $A_P$ is invertible as well.
Checking Assumption ATW3.
$$\begin{aligned}
\gamma\big((\theta, \nu), (\theta', \nu')\big)
&= \sup_{x \in \mathcal{X}} \big\|\operatorname{Var}\big[\psi_{\theta,\nu}(O_i) - \psi_{\theta',\nu'}(O_i) \mid X_i = x\big]\big\|_F \\
&= \sup_{x \in \mathcal{X}} \big\|\operatorname{Var}\big[\big(\tau - 1\{Y_i \le D_i\theta + X_i'\nu\}\big)(Z_i, X_i')' - \big(\tau - 1\{Y_i \le D_i\theta' + X_i'\nu'\}\big)(Z_i, X_i')' \,\big|\, X_i = x\big]\big\|_F \\
&= \sup_{x \in \mathcal{X}} \big\|(Z_i, x')'(Z_i, x')\,\operatorname{Var}\big[1\{Y_i \le D_i\theta' + X_i'\nu'\} - 1\{Y_i \le D_i\theta + X_i'\nu\} \,\big|\, X_i = x\big]\big\|_F \\
&= \sup_{x \in \mathcal{X}} \Big\{\big\|(Z_i, x')'(Z_i, x')\big\|_F\, \big|F(D_i\theta + X_i'\nu \mid X_i = x) - F(D_i\theta' + X_i'\nu' \mid X_i = x)\big|\,\Big[1 - \big|F(D_i\theta + X_i'\nu \mid X_i = x) - F(D_i\theta' + X_i'\nu' \mid X_i = x)\big|\Big]\Big\}.
\end{aligned}$$
A Taylor expansion then implies the following approximation of $\gamma$:
$$\gamma\big((\theta, \nu), (\theta', \nu')\big) \approx \sup_{x \in \mathcal{X}} \big\|(Z_i, x')'(Z_i, x')\, f(y \mid X_i = x)\big[D_i(\theta - \theta') + X_i'(\nu - \nu')\big]\big\|_F.$$
Since the conditional probability density function is bounded, there exists an $L > 0$ such that
$$\gamma\big((\theta, \nu), (\theta', \nu')\big) \le L\, \big\|(\theta, \nu) - (\theta', \nu')\big\|_2.$$
Checking Assumption ATW4. The score function can be written as
$$\psi_{\theta,\nu}(O_i) = \big(\tau - 1\{Y_i \le D_i\theta + X_i'\nu\}\big)(Z_i, X_i')' = \lambda(\theta, \nu; O_i) + \zeta_{\theta,\nu}(g(O_i)),$$
where
$$g(O_i) = Y_i, \qquad \zeta_{\theta,\nu}(g(O_i)) = \begin{pmatrix} \big(\tau - 1\{Y_i \le D_i\theta + X_i'\nu\}\big)\, Z_i \\ \big(\tau - 1\{Y_i \le D_i\theta + X_i'\nu\}\big)\, X_i \end{pmatrix}.$$
Checking Assumption ATW5. Assumption ATW5 is required to ensure the existence of (approximate) solutions to the estimating equation; we impose it directly.
Checking Assumption ATW6. With the V-shaped check function of the instrumental variable quantile regression, the corresponding score function $\psi_{\theta,\nu}(O_i)$ is a negative subgradient of a convex function, and the expected score function $M_{\theta,\nu}(x)$ is the negative gradient of a strongly convex function. Therefore, Assumption ATW6 holds.
Corollary (Consistency and asymptotic normality of the GRF-IVQR estimator). Given Assumptions ATW1–ATW6, Assumptions CH1–CH6, and Theorems 3 and 5 of Athey et al. (2019), the GRF-IVQR estimator is consistent and asymptotically normal:
$$\frac{\hat{\theta}_n(x) - \theta(x)}{\sigma_n(x)} \Rightarrow N(0, 1).$$
The variance estimator is
$$\hat{\sigma}_n^2 = \xi'\, \hat{V}_n(x)^{-1}\, \hat{H}_n(x)\, \big(\hat{V}_n(x)^{-1}\big)'\, \xi,$$
where $\hat{V}_n(x)$ and $\hat{H}_n(x)$ are consistent estimators of $V_{\theta,\nu}(x)$ and $H_n(x) = \operatorname{Var}\big[\sum_{i=1}^{n} \alpha_i\, \psi_{\theta,\nu}(O_i)\big]$, respectively.
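Given consistent plug-in estimates, this variance can be computed directly; a base-R sketch with hypothetical inputs is:

```r
# Plug-in variance for the theta-coordinate: xi picks out theta from (theta, nu).
sigma2_hat <- function(V_hat, H_hat, xi) {
  Vinv <- solve(V_hat)
  drop(t(xi) %*% Vinv %*% H_hat %*% t(Vinv) %*% xi)
}
```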

References

  1. Abadie, Alberto, Joshua Angrist, and Guido Imbens. 2002. Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70: 91–117.
  2. Athey, Susan, and Guido Imbens. 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences 113: 7353–60.
  3. Athey, Susan, and Guido Imbens. 2019. Machine learning methods that economists should know about. Annual Review of Economics 11: 685–725.
  4. Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. Generalized random forests. The Annals of Statistics 47: 1148–78.
  5. Athey, Susan, and Stefan Wager. 2018. Efficient policy learning. arXiv arXiv:1702.02896v4.
  6. Athey, Susan, and Stefan Wager. 2019. Estimating treatment effects with causal forests: An application. arXiv arXiv:1902.07409.
  7. Breiman, Leo. 2001. Random forests. Machine Learning 45: 5–32.
  8. Chen, Jau-er, and Jia-Jyun Tien. 2019. Debiased/Double Machine Learning for Instrumental Variable Quantile Regressions. Working Paper. Taipei: Center for Research in Econometric Theory and Applications, National Taiwan University.
  9. Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21: C1–C68.
  10. Chernozhukov, Victor, and Christian Hansen. 2004. The effects of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis. Review of Economics and Statistics 86: 735–51.
  11. Chernozhukov, Victor, and Christian Hansen. 2005. An IV model of quantile treatment effects. Econometrica 73: 245–61.
  12. Chernozhukov, Victor, and Christian Hansen. 2008. Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics 142: 379–98.
  13. Chernozhukov, Victor, and Christian Hansen. 2013. NBER 2013 Summer Institute: Econometric Methods for High-Dimensional Data. Available online: http://www.nber.org/econometrics_minicourse_2013/ (accessed on 15 July 2013).
  14. Chiou, Yan-Yu, Mei-Yuan Chen, and Jau-er Chen. 2018. Nonparametric regression with multiple thresholds: Estimation and inference. Journal of Econometrics 206: 472–514.
  15. Davis, Jonathan M. V., and Sara B. Heller. 2017. Using causal forests to predict treatment heterogeneity: An application to summer jobs. American Economic Review 107: 546–50.
  16. Frandsen, Brigham R., and Lars J. Lefgren. 2018. Testing rank similarity. Review of Economics and Statistics 100: 86–91.
  17. Gilchrist, Duncan Sheppard, and Emily Glassberg Sands. 2016. Something to talk about: Social spillovers in movie consumption. Journal of Political Economy 124: 1339–82.
  18. Glynn, Adam N., and Kevin M. Quinn. 2009. An introduction to the augmented inverse propensity weighted estimator. Political Analysis 18: 36–56.
  19. Guilhem, Bascle, Louis Mulotte, and Jau-er Chen. 2019. Addressing strategy endogeneity and performance heterogeneity: Evidence from firm multinationality. Academy of Management Proceedings 2019: 12733.
  20. O'Neill, Eoghan, and Melvyn Weeks. 2018. Causal tree estimation of heterogeneous household response to time-of-use electricity pricing schemes. arXiv arXiv:1810.09179.
  21. Robins, James M., Andrea Rotnitzky, and Lue Ping Zhao. 1994. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89: 846–66.
  22. Strittmatter, Anthony. 2019. Heterogeneous earnings effects of the job corps by gender: A translated quantile approach. Labour Economics 61: 101760.
  23. Wager, Stefan, and Susan Athey. 2018. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association 113: 1228–42.
Footnotes
1. Following the default setting of the grf package, we set max.depth equal to 4.
2. Since Abadie, Angrist, and Imbens (2002) and Chernozhukov and Hansen (2005) impose different identification strategies, the corresponding estimated quantile treatment effects are, in general, for distinct sub-populations.
Figure 1. Variable importance across quantiles.
Figure 2. Top 4 variable importance (male).
Figure 3. Top 4 variable importance (female).
Table 1. Quantile treatment effects of 401(k) participation on wealth.

               Quantile
               0.10         0.25         0.50         0.75         0.90
CH             3209.209     3566.567     5523.524     9134.635     14768.270
               (438.523)    (525.499)    (613.129)    (1004.546)   (2971.518)
GRF-IVQR       3117.674     3251.794     5547.822     10377.530    15410.360
               (602.872)    (653.277)    (735.644)    (892.624)    (2078.207)

Note: GRF-IV: 11090.305 (1441.989). The GRF-IV stands for the 2SLS in the context of generalized random forests. CH and GRF-IVQR stand for, respectively, Chernozhukov and Hansen (2004) and our estimator. Numbers in parentheses are standard errors.
Table 2. Variable importance.

                            GRF-IV     GRF-IVQR at specific quantiles
                                       0.10       0.25       0.50       0.75       0.90
Age                         0.15607    0.17604    0.10666    0.19401    0.33202    0.48203
Income                      0.64426    0.74348    0.83784    0.76596    0.62151    0.42814
Education                   0.10005    0.03984    0.01790    0.01131    0.01310    0.04715
Family size                 0.02908    0.02614    0.01638    0.01099    0.01244    0.02952
Married                     0.00577    0.00288    0.00166    0.00317    0.00267    0.00348
Two-earner                  0.01447    0.00349    0.00813    0.00619    0.00773    0.00352
Defined benefit pension     0.02110    0.00060    0.00035    0.00011    0.00042    0.00048
Participation in IRA        0.02032    0.00346    0.00655    0.00292    0.00278    0.00115
Home owner                  0.00890    0.00408    0.00453    0.00535    0.00733    0.00453

Note: The GRF-IV stands for the 2SLS in the context of generalized random forests.
Table 3. Effects of JTPA enrollment on earnings (male).

               Quantile
               0.15         0.25         0.50         0.75         0.85
AAI            121.000      702.000      1544.000     3131.000     3378.000
               (475.000)    (670.000)    (1073.000)   (1376.000)   (1811.000)
CH             -151.151     528.529      312.312      2697.698     3190.190
               (535.146)    (627.293)    (957.707)    (1547.084)   (1536.335)
GRF-IVQR       -199.114     232.099      1068.086     2630.969     2955.952
               (540.548)    (651.584)    (950.880)    (1571.200)   (1645.931)

Note: GRF-IV: 1814.755 (1022.473). The GRF-IV stands for the 2SLS in the context of generalized random forests. AAI, CH, and GRF-IVQR stand for, respectively, Abadie, Angrist and Imbens (2002), Chernozhukov and Hansen (2005), and our estimator. Numbers in parentheses are standard errors.
Table 4. Effects of JTPA enrollment on earnings (female).

               Quantile
               0.15         0.25         0.50         0.75         0.85
AAI            324.000      680.000      1742.000     1984.000     1900.000
               (175.000)    (282.000)    (645.000)    (945.000)    (997.000)
CH             35.536       398.398      1566.567     2493.493     1845.345
               (266.445)    (313.555)    (626.065)    (910.474)    (1059.988)
GRF-IVQR       185.141      571.842      1892.934     2431.793     1716.304
               (270.490)    (336.180)    (610.466)    (894.658)    (1119.506)

Note: GRF-IV: 2127.544 (607.943). The GRF-IV stands for the 2SLS in the context of generalized random forests. AAI, CH, and GRF-IVQR stand for, respectively, Abadie, Angrist and Imbens (2002), Chernozhukov and Hansen (2005), and our estimator. Numbers in parentheses are standard errors.
Table 5. Variable importance (male).

                                                    GRF-IV     GRF-IVQR at specific quantiles
                                                               0.15       0.25       0.50       0.75       0.85
High school or GED                                  0.11710    0.10009    0.10155    0.08882    0.07741    0.08376
Black                                               0.04914    0.06883    0.04729    0.09594    0.09909    0.10482
Hispanic                                            0.05177    0.00000    0.00000    0.00000    0.00000    0.00000
Married                                             0.12656    0.11679    0.12841    0.09669    0.07070    0.08854
Worked less than 13 weeks in past year              0.09076    0.19681    0.16594    0.19491    0.07749    0.08512
Classroom training                                  0.05013    0.02939    0.02939    0.03967    0.08669    0.04849
On-the-job training and/or job search assistance    0.08262    0.24453    0.36400    0.27075    0.41578    0.37083
Age from 22 to 25                                   0.03710    0.04702    0.03930    0.04970    0.03646    0.04310
Age from 26 to 29                                   0.06769    0.02204    0.02204    0.02498    0.02792    0.06612
Age from 30 to 35                                   0.04465    0.03233    0.03820    0.04114    0.03673    0.03967
Age from 36 to 44                                   0.07998    0.02792    0.01910    0.03527    0.03086    0.02351
Age from 45 to 54                                   0.09737    0.00000    0.00000    0.00000    0.00000    0.00000
Whether data are from second follow-up survey       0.10514    0.11425    0.04476    0.06214    0.04087    0.04604

Note: The GRF-IV stands for the 2SLS in the context of generalized random forests.
Table 6. Variable importance (female).

                                                    GRF-IV     GRF-IVQR at specific quantiles
                                                               0.15       0.25       0.50       0.75       0.85
High school or GED                                  0.06385    0.02720    0.03360    0.05760    0.04320    0.11040
Black                                               0.05128    0.03040    0.07200    0.14720    0.12000    0.08800
Hispanic                                            0.07033    0.00000    0.00000    0.00000    0.00000    0.00000
Married                                             0.09909    0.03520    0.01120    0.08000    0.06880    0.18240
AFDC                                                0.14744    0.02658    0.02880    0.08070    0.07881    0.10153
Worked less than 13 weeks in past year              0.05033    0.07717    0.11360    0.08070    0.08301    0.07367
Classroom training                                  0.15284    0.17735    0.24160    0.34876    0.19237    0.17197
On-the-job training and/or job search assistance    0.06905    0.56049    0.44320    0.16823    0.35140    0.16323
Age from 22 to 25                                   0.03948    0.00160    0.00480    0.00000    0.00160    0.00000
Age from 26 to 29                                   0.05355    0.00000    0.00000    0.00000    0.00000    0.00000
Age from 30 to 35                                   0.05257    0.03040    0.03520    0.01280    0.01600    0.04000
Age from 36 to 44                                   0.05325    0.00000    0.00000    0.00000    0.00000    0.00000
Age from 45 to 54                                   0.04129    0.00000    0.00000    0.00000    0.00000    0.00000
Whether data are from second follow-up survey       0.05564    0.03360    0.01600    0.02400    0.04480    0.06880

Note: The GRF-IV stands for the 2SLS in the context of generalized random forests.
