
Debiased/Double Machine Learning for Instrumental Variable Quantile Regressions

Jau-er Chen, Chien-Hsun Huang and Jia-Jyun Tien
1 Institute for International Strategy, Tokyo International University, 1-13-1 Matobakita Kawagoe, Saitama 350-1197, Japan
2 Center for Research in Econometric Theory and Applications, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
3 The Office of the Chief Economist, Microsoft Research, Redmond, WA 98052, USA
4 Department of Economics, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
* Author to whom correspondence should be addressed.
Econometrics 2021, 9(2), 15; https://doi.org/10.3390/econometrics9020015
Submission received: 29 December 2020 / Revised: 14 March 2021 / Accepted: 1 April 2021 / Published: 2 April 2021

Abstract

In this study, we investigate the estimation of and inference on a low-dimensional causal parameter in the presence of high-dimensional controls in an instrumental variable quantile regression. Our proposed econometric procedure builds on the Neyman-type orthogonal moment conditions of Chernozhukov et al. (2018) and is thus relatively insensitive to the estimation of the nuisance parameters. The Monte Carlo experiments show that the estimator copes well with high-dimensional controls. We also apply the procedure to empirically reinvestigate the quantile treatment effect of 401(k) participation on accumulated wealth.

1. Introduction

Machine learning methods have been actively studied in economic big data settings in recent years, cf. Athey (2017) and Athey and Imbens (2019). Most empirical studies in economics aim at program evaluation or, equivalently, at estimating causal effects. Constructing the counterfactual and then estimating causal effects relies on an appropriately chosen identification strategy. In economics, the instrumental variable approach is an extensively used identification strategy for causal inference. Machine learning techniques therefore often require an adaptation to exploit the structure of the underlying identification strategy. These adaptations are part of an emerging research area at the intersection of machine learning and econometrics, called causal machine learning in the economics literature. Two popular causal machine learning approaches are currently available to estimate treatment effects through adapted machine learning algorithms while also providing valid standard errors for an estimated causal parameter of interest, such as the average treatment effect or the quantile treatment effect. These two approaches are the double machine learning (DML) of Chernozhukov et al. (2018) and the generalized random forests (GRF) of Athey et al. (2019). The GRF estimates heterogeneous treatment effects and explores variable importance, accounting for heterogeneity in the treatment effect; the resulting information is crucial for optimal policies mapping individuals' observed characteristics to treatments. The DML provides a clever and general recipe combining sample splitting, cross-fitting, and Neyman orthogonalization to make causal inference possible with almost any machine learner. Furthermore, the DML can handle high-dimensional datasets in which researchers observe massive numbers of characteristics of the units. For instance, through sample splitting, the DML estimates each of the nuisance functions (e.g., the conditional expectations of the target and outcome variables given high-dimensional controls) on an auxiliary sample and then uses out-of-sample residuals as the basis for the treatment effect estimation. Moreover, the cross-fitting algorithm allows researchers to use all of the data in the final treatment effect estimation, instead of discarding the auxiliary sample used earlier for sample splitting. This procedure follows Neyman-type orthogonal moment conditions, which ensure that the estimation is insensitive to first-order perturbations of the nuisance parameters near their true values; consequently, regular inference on a low-dimensional causal parameter proceeds.
With the identification strategy of selection on observables (a.k.a. unconfoundedness), empirical applications using the two aforementioned approaches include the works by Gilchrist and Sands (2016) and Davis and Heller (2017). When it comes to the identification strategy of selection on unobservables, few empirical papers using causal machine learning can be found in the existing literature. Such empirical applications very often lack important observed control variables or involve reverse causality, so researchers resort to the instrumental variable approach. In this study, we investigate the estimation of and inference on a low-dimensional causal parameter in the presence of high-dimensional controls in an instrumental variable quantile regression. In particular, we build on Chernozhukov et al. (2018) and further concretize the econometric procedure. To the best of our knowledge, this study is the first to investigate Monte Carlo performance and empirical applications of the DML procedure within the framework of instrumental variable quantile regressions. We also make our R code available in a GitHub repository1 so that other researchers can benefit from the proposed estimation method.
Chen and Hsiang (2019) investigated the instrumental variable quantile regression in the context of the GRF. Their econometric procedure yields a measure of variable importance for characterizing heterogeneity in the treatment effect. They proceeded by empirically investigating the distributional effect of 401(k) participation on net financial assets, demonstrating that income, age, education, and family size are the four most important variables for explaining treatment effect heterogeneity. In contrast to our study, their GRF-based estimator is not designed for high-dimensional settings. With the same dataset, we also apply the proposed procedure to empirically investigate the distributional effects of 401(k) participation on net financial assets. The empirical results signify that 401(k) participants with a low savings propensity are more associated with the nonlinear income effect, which complements the findings of Chernozhukov et al. (2018) and Chiou et al. (2018). In addition, nonlinear transformations of the four aforementioned variables are also identified as important variables in the current context of the DML-based instrumental variable quantile regression with high-dimensional observed characteristics.
The rest of the paper is organized as follows. The model specification and practical algorithm are introduced in Section 2, which includes detailed descriptions of a general recipe for the DML. Section 3 presents finite-sample performances of the estimator through Monte Carlo experiments. Section 4 reinvestigates an empirical study on quantile treatment effects: The effect of 401(k) participation on wealth. Section 5 concludes the paper.

2. The Model and Algorithm

In this study, we use the instrumental variable quantile regression (IVQR) of Chernozhukov and Hansen (2005) and Chernozhukov and Hansen (2008) to identify the quantile treatment effect. In Section 2.1, we briefly review the DML procedure developed in Chernozhukov et al. (2018). In Section 2.2, we briefly review the conventional IVQR based on the exposition in Chernozhukov and Hansen (2005). In Section 2.3, we present DML-IVQR within the framework of high-dimensional controls.

2.1. The Double Machine Learning

In this section, we briefly review the plain DML procedure. Let us consider the following canonical example of estimating the treatment effect $\alpha_0$ in a partially linear regression under the identification strategy of selection on observables:
$$Y = D\alpha_0 + h_0(X) + U, \qquad E[U \mid X, D] = 0, \tag{1}$$
where $Y$ is the outcome variable, $D$ is the target variable, and $X$ is a high-dimensional vector of controls. The $X$ are control variables in the sense that
$$D = m_0(X) + V, \tag{2}$$
where $m_0(\cdot) \neq 0$ and $E[V \mid X] = 0$. Note that $h_0(X)$ and $m_0(X)$ are nuisance functions because they are not the primary objects of interest. Chernozhukov et al. (2018) develop the DML procedure for estimating $\alpha_0$, which is outlined in the following three steps.
I. [Sample splitting] Split the data into $K$ random, roughly equally sized folds. For $k = 1, \dots, K$, use a machine learner to fit the high-dimensional nuisance functions $\widehat{E}^{(k)}[Y \mid X]$ and $\widehat{E}^{(k)}[D \mid X]$, using all data except the $k$th fold.
II. [Cross-fitting and residualizing] Calculate out-of-sample residuals from these fitted nuisance functions on the $k$th fold; that is, $\widehat{Y}^{(k)} = Y^{(k)} - \widehat{E}^{(k)}[Y \mid X]$ and $\widehat{D}^{(k)} = D^{(k)} - \widehat{E}^{(k)}[D \mid X]$.
III. [Treatment effect estimation and inference] Collect all of the out-of-sample residuals from the cross-fitting stage and regress $\widehat{Y}$ on $\widehat{D}$ by ordinary least squares to obtain $\check{\alpha}$, the estimator of $\alpha_0$. The resulting estimate can be paired with heteroskedasticity-consistent standard errors to obtain a confidence interval for the treatment effect.
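To make the recipe concrete, the following R sketch implements Steps I–III for the partially linear model, using random forests (via the randomForest package) as the machine learner. The function name, the choice of learner, and the no-intercept final regression are illustrative assumptions, not our exact replication code.

```r
library(randomForest)

# DML for the partially linear model: Y = D*alpha0 + h0(X) + U
dml_plm <- function(Y, D, X, K = 5) {
  n <- length(Y)
  folds <- sample(rep(1:K, length.out = n))      # Step I: random folds
  Yres <- numeric(n); Dres <- numeric(n)
  for (k in 1:K) {
    train <- folds != k; test <- folds == k
    # fit E[Y|X] and E[D|X] with a machine learner on all folds but k
    fY <- randomForest(X[train, , drop = FALSE], Y[train])
    fD <- randomForest(X[train, , drop = FALSE], D[train])
    # Step II: out-of-sample residuals on fold k
    Yres[test] <- Y[test] - predict(fY, X[test, , drop = FALSE])
    Dres[test] <- D[test] - predict(fD, X[test, , drop = FALSE])
  }
  # Step III: OLS of residualized Y on residualized D (no intercept)
  alpha <- sum(Dres * Yres) / sum(Dres^2)
  u     <- Yres - Dres * alpha
  se    <- sqrt(sum((Dres * u)^2)) / sum(Dres^2)  # HC0 standard error
  c(estimate = alpha, std.error = se)
}
```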
Because estimating the nuisance functions with machine learners induces regularization biases, the cross-fitting step keeps those biases from contaminating the treatment effect estimate. The procedure is random owing to the sample splitting: different researchers with the same data set but different random splits will obtain distinct estimates. This randomness can be reduced by using a larger value of $K$, though at a higher computational cost; $K \le 10$ is recommended. In fact, the DML procedure follows a unified approach in terms of moment conditions and the Neyman orthogonality condition, cf. Chernozhukov et al. (2015). In a nutshell, we seek moment conditions
$$E\big[g(Y, D, X; \alpha_0, \eta_0)\big] = 0 \tag{3}$$
such that the following Neyman orthogonality condition holds:
$$\partial_\eta E\big[g(Y, D, X; \alpha_0, \eta)\big]\Big|_{\eta = \eta_0} = 0, \tag{4}$$
where $\eta_0$ denotes the nuisance functions evaluated at their true values. Equation (4) says that the moment condition is insensitive to first-order perturbations of the nuisance function $\eta$ near the true value. This property allows us to estimate $\eta_0$ using regularized estimators (machine learners) $\widehat{\eta}$. Without this property, regularization may have too strong an effect on the estimator of $\alpha_0$ for regular inference to proceed. The estimator $\check{\alpha}$ of $\alpha_0$ solves the empirical analog of Equation (3):
$$\frac{1}{n}\sum_{i=1}^{n} g(y_i, d_i, x_i; \check{\alpha}, \widehat{\eta}) = 0,$$
where we have plugged in the estimator $\widehat{\eta}$ for the nuisance function. Owing to the Neyman orthogonality property, this estimator is first-order equivalent to the infeasible estimator $\widetilde{\alpha}$ solving
$$\frac{1}{n}\sum_{i=1}^{n} g(y_i, d_i, x_i; \widetilde{\alpha}, \eta_0) = 0,$$
where the true value of $\eta$ is used.
Therefore, we recast the canonical example set by Equations (1) and (2) into the moment conditions that guide the DML procedure outlined above:
$$\begin{aligned} E\big[g(Y, D, X; \alpha_0, \eta_0)\big] &= E\Big[\big(Y - E[Y \mid X] - (D - E[D \mid X])\alpha_0\big) \times \big(D - E[D \mid X]\big)\Big] \\ &= E\Big[\big(D\alpha_0 + h_0(X) + U - m_0(X)\alpha_0 - h_0(X) - V\alpha_0\big) \times V\Big] \\ &= E\Big[\big(D\alpha_0 - m_0(X)\alpha_0 + U - V\alpha_0\big) \times V\Big] \\ &= E\Big[\big(m_0(X)\alpha_0 + V\alpha_0 - m_0(X)\alpha_0 + U - V\alpha_0\big) \times V\Big] \\ &= E[UV] = 0, \end{aligned}$$
where $\eta_0 = \big(E[Y \mid X],\ E[D \mid X]\big)$. It is easy to see that the corresponding Neyman orthogonality condition holds:
$$\partial_\eta E\big[g(Y, D, X; \alpha_0, \eta)\big]\Big|_{\eta = (E[Y \mid X],\, E[D \mid X])} = \partial_\eta E\Big[\big(Y - E[Y \mid X] - (D - E[D \mid X])\alpha_0\big)\big(D - E[D \mid X]\big)\Big]\Big|_{\eta = (E[Y \mid X],\, E[D \mid X])} = 0.$$

2.2. The Instrumental Variable Quantile Regression

Based on the exposition in Chernozhukov and Hansen (2005), the following conditional moment restriction yields an IVQR estimator:
$$P\big[Y \le q(\tau, D, X) \mid X, Z\big] = \tau, \tag{5}$$
where $q(\cdot)$ is the structural quantile function, $\tau$ is the quantile index, $Y$ is the outcome variable, $D$ is the target (endogenous) variable, and $X$ and $Z$ are control variables and instruments, respectively. Equation (5) and a linear structural quantile specification lead to the following unconditional moment restriction:
$$E\big[\big(\tau - 1(Y \le D\alpha_0 + X'\beta_0)\big)\Psi\big] = 0, \tag{6}$$
where
$$\Psi := \Psi(X, Z)$$
is a vector of functions of the instruments and control variables, and $(\alpha, \beta)$ are the unknown parameters. In particular, $\alpha$ is the causal parameter of interest. The parameters depend on the quantile of interest, but we suppress the dependence of $\alpha$ and $\beta$ on $\tau$ for simplicity of presentation. Equation (6) leads to a particular moment condition for residualization, that is,
$$g_\tau(\alpha; \beta, \delta) = \big(\tau - 1(Y \le D\alpha + X'\beta)\big)\,\Psi(\alpha, \delta(\alpha)) \tag{7}$$
with the instrument
$$\Psi(\alpha, \delta(\alpha)) := \big(Z - \delta(\alpha)X\big), \qquad \delta(\alpha) = M(\alpha)J^{-1}(\alpha), \tag{8}$$
where $\delta$ is the matrix coefficient from the density-weighted least squares projection of $Z$ on $X$,
$$M(\alpha) = E\big[Z X' f_\varepsilon(0 \mid X, Z)\big], \qquad J(\alpha) = E\big[X X' f_\varepsilon(0 \mid X, Z)\big],$$
and $f_\varepsilon(0 \mid X, Z)$ is the conditional density of $\varepsilon = Y - D\alpha - X'\beta(\alpha)$ at zero, with $\beta(\alpha)$ defined by
$$E\big[\big(\tau - 1(Y \le D\alpha + X'\beta(\alpha))\big)X\big] = 0. \tag{9}$$
First, we construct the grid search interval for $\alpha$, and for each $\alpha$ in the interval we profile out the coefficients on the exogenous variables using Equation (9). Specifically,
$$\widehat{\beta}(a) = \arg\min_{b \in B} \frac{1}{N}\sum_{i=1}^{N} \rho_\tau\big(Y_i - D_i a - X_i' b\big). \tag{10}$$
By substituting these estimates into the sample counterpart of the moment restriction, we obtain
$$\widehat{g}_N(a) = \frac{1}{N}\sum_{i=1}^{N} g\big(a, \widehat{\beta}(a), \widehat{\delta}(a)\big), \tag{11}$$
where
$$\widehat{\delta}(a) = \widehat{M}(a)\,\widehat{J}^{-1}(a)$$
with
$$\widehat{M}(a) = \frac{1}{N h_N}\sum_{i=1}^{N} Z_i X_i'\, K_{h_N}\big(Y_i - D_i a - X_i'\widehat{\beta}(a)\big), \qquad \widehat{J}(a) = \frac{1}{N h_N}\sum_{i=1}^{N} X_i X_i'\, K_{h_N}\big(Y_i - D_i a - X_i'\widehat{\beta}(a)\big),$$
where $K_{h_N}$ is a kernel function with bandwidth $h_N$. In the Monte Carlo simulations, we assume the density function is known from our data generating process. Thus, we can solve for the parameters by optimizing the following generalized method of moments (GMM) criterion:
$$\widehat{\alpha}(\tau) = \arg\min_{a \in A} N\, \widehat{g}_N(a)'\, \widehat{\Sigma}(a, a)^{-1}\, \widehat{g}_N(a),$$
where
$$\widehat{\Sigma}(a_1, a_2) = \frac{1}{N}\sum_{i=1}^{N} g\big(a_1, \widehat{\beta}(a_1)\big)\, g\big(a_2, \widehat{\beta}(a_2)\big)'$$
is the weighting matrix used in the GMM estimation. Note that the estimator $\widehat{\alpha}$ based on the inverse quantile regression (i.e., the IVQR) of Chernozhukov and Hansen (2008) is first-order equivalent to the GMM estimator defined above.
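As a concrete illustration, the following R sketch implements the grid search via the inverse quantile regression of Chernozhukov and Hansen (2008), which, as noted above, is first-order equivalent to the GMM estimator. The quantreg package and a single instrument Z are assumed, and the function name is ours.

```r
library(quantreg)

# inverse quantile regression over a grid: for each candidate a, run a
# quantile regression of Y - D*a on (X, Z) and compute the Wald statistic
# for the coefficient on the instrument Z being zero; the point estimate
# minimizes that statistic over the grid
ivqr_grid <- function(Y, D, X, Z, tau, grid) {
  wald <- sapply(grid, function(a) {
    fit <- rq(I(Y - D * a) ~ X + Z, tau = tau)
    est <- summary(fit, se = "nid")$coefficients
    (est["Z", "Value"] / est["Z", "Std. Error"])^2
  })
  list(alpha_hat = grid[which.min(wald)], wald = wald)
}
```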

2.3. Estimation with High-Dimensional Controls

We modify the procedure presented in Section 2.2 to deal with a dataset of high-dimensional control variables. To this end, we construct the grid search interval for $\alpha$ and profile out the coefficients on the exogenous variables using the $\ell_1$-norm penalized quantile regression estimator of Belloni and Chernozhukov (2011):
$$\widehat{\beta}(a) = \arg\min_{b \in B} \frac{1}{N}\sum_{i=1}^{N} \rho_\tau\big(Y_i - D_i a - X_i' b\big) + \lambda \sum_{j=1}^{\dim(b)} \widehat{\sigma}_j\, |b_j|, \tag{12}$$
where $\rho_\tau(\cdot)$ is the check function and $\widehat{\sigma}_j^2 = (1/n)\sum_{i=1}^{n} x_{ij}^2$. The penalty level $\lambda$ is chosen as
$$\lambda = 2 \cdot \Lambda(1 - \alpha \mid X), \tag{13}$$
where $\Lambda(1 - \alpha \mid X)$ denotes the $(1 - \alpha)$-quantile of $\Lambda$ conditional on $X$, and the random variable
$$\Lambda = n\, \sup_{u \in \mathcal{U}}\, \max_{1 \le j \le \dim(b)} \left| \frac{\frac{1}{n}\sum_{i=1}^{n} x_{ij}\big(u - I\{u_i \le u\}\big)}{\widehat{\sigma}_j \sqrt{u(1 - u)}} \right|, \tag{14}$$
where $u_1, \dots, u_n$ are i.i.d. uniform(0, 1) random variables distributed independently of the controls $x_1, \dots, x_n$. The random variable $\Lambda$ has a pivotal distribution conditional on $X = [x_1, \dots, x_n]$; therefore, we compute $\Lambda(1 - \alpha \mid X)$ by simulating $\Lambda$. Belloni and Chernozhukov (2011) show that the aforementioned choice of penalty level $\lambda$ leads to the optimal rate of convergence for the $\ell_1$-norm penalized quantile regression estimator; that is, the choice of $\lambda$ based on (13) is both theoretically grounded and feasible. In high-dimensional settings, $K$-fold cross-validation is very popular in practice, although its computational cost is roughly proportional to $K$. The recently derived non-asymptotic error bounds of Chetverikov et al. (2021) imply that the $K$-fold cross-validated Lasso estimator attains a nearly optimal convergence rate. While their theoretical guarantees do not directly apply to the $\ell_1$-norm penalized quantile regression estimator, they still shed some light on the use of cross-validation as an alternative way to determine the penalty level $\lambda$ in our analysis.2
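Because $\Lambda$ is pivotal conditional on $X$, Equation (13) can be implemented by simulation. The following R sketch does so for a single quantile index, i.e., $\mathcal{U} = \{\tau\}$; the function name and the argument conf (the level $1 - \alpha$, renamed to avoid a clash with the causal parameter $\alpha$) are our notation.

```r
# simulate the pivotal statistic Lambda of Equation (14) for U = {tau}
# and return the penalty level lambda = 2 * Lambda(1 - conf | X) of (13)
penalty_level <- function(X, tau, conf = 0.1, R = 1000) {
  n <- nrow(X)
  sigma_hat <- sqrt(colMeans(X^2))            # per-regressor scale sigma_j
  draws <- replicate(R, {
    u <- runif(n)                             # uniforms independent of X
    score <- abs(colSums(X * (tau - (u <= tau)))) / n
    n * max(score / (sigma_hat * sqrt(tau * (1 - tau))))
  })
  2 * quantile(draws, 1 - conf)
}
```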
In addition, we estimate
$$\widehat{M}(a) = \frac{1}{N h_N}\sum_{i=1}^{N} Z_i X_i'\, K_{h_N}\big(Y_i - D_i a - X_i'\widehat{\beta}(a)\big) \tag{15}$$
and
$$\widehat{J}(a) = \frac{1}{N h_N}\sum_{i=1}^{N} X_i X_i'\, K_{h_N}\big(Y_i - D_i a - X_i'\widehat{\beta}(a)\big). \tag{16}$$
We also perform dimension reduction on $\widehat{J}$ because of the large dimension of $X$. In particular, we implement the following regularization:
$$\widehat{\delta}_j(a) = \arg\min_{\delta}\ \frac{1}{2}\,\delta'\widehat{J}(a)\delta - \widehat{M}_j(a)'\delta + \vartheta\,\|\delta\|_1. \tag{17}$$
This regularization amounts to a weighted Lasso of each instrumental variable on the control variables. Consequently, the $\ell_1$-norm optimization obeys the Karush–Kuhn–Tucker condition
$$\big\|\widehat{\delta}_j(a)'\widehat{J}(a) - \widehat{M}_j(a)'\big\|_\infty \le \vartheta, \quad \forall j. \tag{18}$$
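Equation (17) is an $\ell_1$-penalized quadratic program and can be solved, for example, by cyclic coordinate descent with soft-thresholding. The following R sketch is a minimal solver of our own, not the exact routine used in our replication code.

```r
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# cyclic coordinate descent for Equation (17):
#   min_d  0.5 * t(d) %*% J %*% d - t(Mj) %*% d + theta * sum(abs(d))
# J is the (symmetric) matrix J-hat(a); Mj the j-th column of M-hat(a)
weighted_lasso <- function(J, Mj, theta, max_iter = 200, tol = 1e-8) {
  p <- length(Mj)
  d <- numeric(p)
  for (iter in 1:max_iter) {
    d_old <- d
    for (k in 1:p) {
      # gradient of the smooth part w.r.t. d[k], holding the rest fixed
      r <- Mj[k] - sum(J[k, -k] * d[-k])
      d[k] <- soft_threshold(r, theta) / J[k, k]
    }
    if (max(abs(d - d_old)) < tol) break
  }
  d
}
```

At convergence, the solution satisfies the Karush–Kuhn–Tucker condition in Equation (18) by construction of the soft-thresholding update.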
More importantly, the aforementioned procedure is the double machine learning algorithm for the IVQR, and it satisfies the Neyman orthogonality condition, as follows. Let us present the IVQR as a first-order-equivalent GMM estimator. To this end, we define
$$g(\alpha, \eta) = \big(\tau - 1(Y \le D\alpha + X'\beta)\big)\big(Z - \delta(\alpha)X\big), \tag{19}$$
where $\eta = \big(\beta(\alpha), \delta(\alpha)\big)$ are the high-dimensional nuisance parameters in the DML setting discussed in Section 2.1, with true values $\eta_0 = \big(\beta(\alpha_0), \delta(\alpha_0)\big)$. Therefore,
$$E\big[g(\alpha_0, \eta_0)\big] = E\Big[\big(\tau - 1(Y \le D\alpha_0 + X'\beta_0)\big)\big(Z - \delta(\alpha_0)X\big)\Big] = E\Big[E\big[\tau - 1(Y \le D\alpha_0 + X'\beta_0) \,\big|\, X, Z\big]\big(Z - \delta(\alpha_0)X\big)\Big] = 0. \tag{20}$$
We then calculate
$$\partial_\eta E\big[g(\alpha_0, \eta)\big]\Big|_{\eta = \eta_0} = \Big(\partial_\beta E\big[g(\alpha_0, \eta)\big]\Big|_{\eta = \eta_0},\ \partial_\delta E\big[g(\alpha_0, \eta)\big]\Big|_{\eta = \eta_0}\Big).$$
Specifically,
$$\begin{aligned} \partial_\beta E\big[g(\alpha_0, \eta)\big]\Big|_{\eta = \eta_0} &= \partial_\beta E\Big[E\big[\tau - 1(Y \le D\alpha_0 + X'\beta_0) \,\big|\, X, Z\big]\big(Z - \delta(\alpha_0)X\big)\Big] \\ &= \partial_\beta E\Big[\big(\tau - F(D\alpha_0 + X'\beta_0 \mid X, Z)\big)\big(Z - \delta(\alpha_0)X\big)\Big] \\ &= -E\big[Z X' f_\epsilon(0 \mid X, Z)\big] + \delta(\alpha_0)E\big[X X' f_\epsilon(0 \mid X, Z)\big] \\ &= -M(\alpha_0) + \delta(\alpha_0)J(\alpha_0) = -M(\alpha_0) + M(\alpha_0)J^{-1}(\alpha_0)J(\alpha_0) = 0, \end{aligned}$$
$$\begin{aligned} \partial_\delta E\big[g(\alpha_0, \eta)\big]\Big|_{\eta = \eta_0} &= \partial_\delta E\Big[\big(\tau - 1(Y \le D\alpha_0 + X'\beta_0)\big)\big(Z - \delta X\big)\Big]\Big|_{\delta = \delta(\alpha_0)} \\ &= -E\Big[\big(\tau - 1(Y \le D\alpha_0 + X'\beta_0)\big)X'\Big] = 0. \end{aligned}$$
We thus verify that $\partial_\eta E\big[g(\alpha_0, \eta)\big]\big|_{\eta = \eta_0} = 0$, which indicates that the Neyman orthogonality condition holds.
After implementing the DML outlined above, we solve for the low-dimensional causal parameter $\alpha$ by optimizing the GMM criterion defined as follows. The sample counterpart of the moment condition is
$$\widehat{g}_N(a) = \frac{1}{N}\sum_{i=1}^{N} \Big(\tau - 1\big(Y_i - D_i a - X_i'\widehat{\beta}(a) \le 0\big)\Big)\,\Psi\big(a, \widehat{\delta}(a)\big). \tag{21}$$
Accordingly,
$$\widehat{\alpha} = \arg\min_{a \in A} N\, \widehat{g}_N(a)'\, \widehat{\Sigma}(a, a)^{-1}\, \widehat{g}_N(a). \tag{22}$$
Chernozhukov et al. (2015) show that the key condition enabling valid inference on $\alpha_0$ is the adaptivity condition: $\sqrt{N}\big(\widehat{g}(\alpha_0, \widehat{\eta}) - \widehat{g}(\alpha_0, \eta_0)\big) \to_P 0$. In particular, each element $\widehat{g}_j$ of $\widehat{g} = (\widehat{g}_j)_{j=1}^{k}$ can be expanded as $\sqrt{N}\big(\widehat{g}_j(\alpha_0, \widehat{\eta}) - \widehat{g}_j(\alpha_0, \eta_0)\big) = T_{1,j} + T_{2,j} + T_{3,j}$, with the three terms formally defined on page 663 of their paper. The term $T_{1,j}$ vanishes precisely because of orthogonality, that is, $T_{1,j} = 0$. The terms $T_{2,j}$ and $T_{3,j}$ do not vanish in general; they vanish when cross-fitting and sample splitting are implemented, and they are also asymptotically negligible when we impose further structure on the problem, such as using a sparsity-based machine learner (e.g., the $\ell_1$-norm penalized quantile regression) under approximate sparsity conditions. In our procedure, Equations (12) and (17) are sparsity-based machine learners. Therefore, we use no cross-fitting in the DML-IVQR algorithm.
Theoretically speaking, based on Equation (19), the approach can accommodate machine learners other than the Lasso. The chief difficulty in implementing an estimation based on Equation (19) is that the function being minimized is both non-smooth and non-convex, and the machine learners used must deal with a functional response variable in this context, cf. Belloni et al. (2017). In addition, the corresponding DML with non-linear equations is difficult. Therefore, our practical strategy is to implement the DML-IVQR procedure described in Equations (12)–(17), (21) and (22), which is equivalent to the Neyman orthogonality condition defined in (19) and (20).

2.4. Weak-Identification Robust Inference

Under the regularity conditions listed in Chernozhukov and Hansen (2008), asymptotic normality of the GMM estimator with a non-smooth objective function is guaranteed. We have
$$\sqrt{N}\, \widehat{g}_N(a) \to_d N\big(0, \Sigma(a, a)\big).$$
Consequently, it leads to
$$N\, \widehat{g}_N(a)'\, \widehat{\Sigma}(a, a)^{-1}\, \widehat{g}_N(a) \to_d \chi^2_{\dim(Z)}.$$
We define
$$W_N(a) \equiv N\, \widehat{g}_N(a)'\, \widehat{\Sigma}(a, a)^{-1}\, \widehat{g}_N(a).$$
It follows that a valid $(1 - p)$ percent confidence region for the true parameter $\alpha_0$ can be constructed as the set
$$CR := \{\alpha \in A : W_N(\alpha) \le c_{1-p}\},$$
where $c_{1-p}$ is the critical value such that
$$P\big[\chi^2_{\dim(Z)} > c_{1-p}\big] = p,$$
and $A$ can be numerically approximated by the grid $\{\alpha_j,\ j = 1, \dots, J\}$.
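In practice, the confidence region is read off the grid directly, as in the following R fragment; the function name and arguments are illustrative, with wn the vector of statistics $W_N(\alpha_j)$ evaluated on the grid.

```r
# weak-identification robust (1 - p) confidence region on a grid:
# keep every grid point whose statistic W_N lies below the
# chi-square critical value with dim(Z) degrees of freedom
robust_cr <- function(alpha_grid, wn, dim_Z, p = 0.05) {
  alpha_grid[wn <= qchisq(1 - p, df = dim_Z)]
}
```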

3. Monte Carlo Experiments

We evaluate the finite-sample performance of the DML-IVQR, in terms of mean bias (BIAS), mean absolute error (MAE), and root-mean-square error (RMSE), through 1000 simulations. The following data generating process is modified from the one considered in Chen and Lee (2018):
$$\begin{pmatrix} u_i \\ \epsilon_i \end{pmatrix} \sim N\left(0,\ \begin{pmatrix} 1 & 0.3 \\ 0.3 & 1 \end{pmatrix}\right), \qquad \big(x_{ji},\ z_{1i},\ z_{2i},\ v_{1i},\ v_{2i}\big)' \sim N(0, I),$$
$$Z_{1i} = z_{1i} + x_{2i} + x_{3i} + x_{4i} + v_{1i},$$
$$Z_{2i} = z_{2i} + x_{7i} + x_{8i} + x_{9i} + x_{10i} + v_{2i},$$
$$D_i = \Phi(z_{1i} + z_{2i} + \epsilon_i),$$
$$X_{ji} = \Phi(x_{ji}),$$
$$Y_i = 1 + D_i + 5X_{1i} + 5X_{2i} + 5X_{3i} + 5X_{4i} + 5X_{5i} + 5X_{6i} + 5X_{7i} + D_i \times u_i,$$
where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable; $i = 1, 2, \dots, n$; $j = 1, 2, \dots, p$; and $p = 100$ is the dimension of the controls $X$. There are ten relevant controls, $X_{1i}, \dots, X_{10i}$. The instrumental variables are $Z_{1i}$ and $Z_{2i}$, and the target variable is $D_i$. Consequently, the quantile treatment effect is
$$\alpha(\tau) = 1 + F_u^{-1}(\tau),$$
where $\tau$ is the quantile index and $F_u(\cdot)$ is the cumulative distribution function of the random variable $u$ that interacts with the treatment. Therefore, the median treatment effect is $\alpha(0.5) = 1$.

3.1. Residualizing Z on X

We focus on comparing the BIAS, MAE, and RMSE resulting from different procedures under the exact specification (10 control variables). res-GMM residualizes Z on X, whereas GMM does not. Table 1 shows that residualizing Z on X leads to an efficiency gain across quantiles, especially when the sample size is moderate.

3.2. IVQR with High-Dimensional Controls

We now evaluate the finite-sample performance of the IVQR with high-dimensional controls. The data generating process involves 100 control variables with an approximate sparsity structure; in particular, the exact (true) model depends on only 10 of the 100 controls. Let us first fix the names of the different estimators. The full-GMM uses all 100 control variables without regularization. The oracle-GMM knows the identity of the true controls and uses only the ten relevant variables. The DML-IVQR is our proposed estimator, that is, the double machine learning for the IVQR with high-dimensional controls. Table 2 shows that the RMSEs stemming from the DML-IVQR are close to those of the oracle estimator. The numbers in parentheses are the ratios of the RMSE or MAE of each estimator to those of the oracle-GMM. The BIAS and MAE indeed signify that the DML-IVQR achieves a lower bias in the simulation study. In addition, Figure 1 plots the distributions of the IVQR estimator with and without double machine learning. The histograms signify that the DML-IVQR estimator is more efficient and less biased than the IVQR using many control variables. Because weak-identification robust inference results naturally from the IVQR, we construct the robust confidence regions for the full-GMM, oracle-GMM, and DML-IVQR estimators. In Figure 2, Figure 3 and Figure 4, the vertical axis displays the value of the test statistic $W_N(\alpha)$ defined in Section 2.4, and the gray horizontal line is the 95% critical value from $\chi^2_{\dim(Z)}$. The Chernozhukov and Hansen (2008) robust confidence region consists of all values of $\alpha$ for which $W_N(\alpha)$ lies below the horizontal line. This robust inferential procedure remains valid when identification is weak or fails partially or completely. Figure 2, Figure 3 and Figure 4 thus show that, across quantiles, the robust confidence region based on the DML-IVQR is relatively sharp compared with that of the full-GMM. In addition, the confidence regions based on the DML-IVQR are remarkably close to those obtained by the oracle estimator.
As to the choice of the penalty parameter, researchers can choose $\lambda$ based on Equation (13), as proposed by Belloni and Chernozhukov (2011), or based on $K$-fold cross-validation. Both methods of choosing $\lambda$ lead to similar finite-sample performances of the DML-IVQR in terms of the RMSE, MAE, and BIAS. The simulation findings are summarized in Table 3.
Sample splitting and the application of cross-fitting are a central part of the DML. Therefore, we conduct a simulation comparing the DML-IVQR with the cross-fitted DML-IVQR. Under the approximate sparsity conditions and the discussion in Section 2.3, both the DML-IVQR and the cross-fitted DML-IVQR should produce valid estimates that differ only slightly from each other. Table 4 indeed reflects this theoretical prediction: the RMSE and MAE from the cross-fitted DML-IVQR are slightly larger because of the randomness stemming from the 5-fold cross-fitting in the simulation.

4. An Empirical Study: Quantile Treatment Effects of 401(k) Participation on Accumulated Wealth

In this section, we reinvestigate an empirical study on quantile treatment effects: the effect of 401(k) participation on wealth, cf. Chernozhukov and Hansen (2004). Not only does this provide a data-driven robustness check on the existing econometric results, but the DML-IVQR also sheds light on treatment effect heterogeneity among the control variables, complementing the existing empirical findings. In addition, we compare our empirical results with those of Chen and Hsiang (2019), who conduct the IVQR estimation using the generalized random forests approach, an alternative in the causal machine learning literature.
Examining the effects of 401(k) plans on accumulated wealth is an issue of long-standing empirical interest. For example, based on the identification strategy of selection on observables, Chiou et al. (2018) and Chernozhukov and Hansen (2013) suggest that a nonlinear income effect exists in the 401(k) study. Nonlinear effects of other control variables are identified as well.
Based on the DML-IVQR, we reinvestigate the impact of 401(k) participation on accumulated wealth. Total wealth (TW) or net financial assets (NFTA) serves as the outcome variable Y. The treatment variable D is a binary indicator of participation in the 401(k) plan. The instrument Z is an indicator of eligibility to enroll in the 401(k) plan. The vector of covariates X consists of income, age, family size, marital status, an individual retirement account (IRA) indicator, a defined-benefit status indicator, a home ownership indicator, and indicator variables for different years of education. The data consist of 9915 observations.
Following the regression specification set up in Chernozhukov and Hansen (2004), Table 5 presents the quantile treatment effects obtained from the different estimation procedures defined in the previous sections, including the IVQR, res-GMM, and GMM; the resulting estimates are similar. As for the high-dimensional analysis, we create 119 technical control variables, including those constructed from polynomial bases, interaction terms, and cubic splines (thresholds). To ensure that each basis has equal length, we apply min-max normalization to all technical control variables. We then use the plug-in method to determine the penalty value when performing the Lasso under the moment condition, and we tune the penalty in the $\ell_1$-norm quantile objective function, based on the Huber approximation, by five-fold cross-validation. The DML-IVQR also normalizes the outcome variable for computational efficiency; to make the estimated treatment effects roughly comparable across the different estimation procedures, the effects obtained through the DML-IVQR in Table 6 are multiplied by the standard deviation of the outcome variable. Weak-identification/instrument robust inferences on the quantile treatment effects are depicted in Figure 5 and Figure 6. Although the robust confidence interval widens at the upper quantiles, where the effective sample size decreases, the estimated quantile treatment effects are significantly different from zero. We can use the results from the DML-IVQR as a data-driven robustness check on those summarized in Table 5.
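To give a flavor of how the dictionary of technical controls can be built, the following R fragment sketches polynomial bases, interaction terms, spline-type thresholds, and the min-max normalization. The variable names (inc, age, educ, fsize) follow the dataset, but the particular bases and knots shown are illustrative and do not reproduce the exact 119 terms we use.

```r
# illustrative construction of technical controls: polynomial bases,
# interaction terms, and spline-type thresholds, then min-max scaling
make_controls <- function(df) {
  W <- model.matrix(~ poly(inc, 3, raw = TRUE) + poly(age, 3, raw = TRUE)
                    + educ + fsize + inc:age + inc:educ + age:fsize,
                    data = df)[, -1]                         # drop intercept
  knots <- c(0.2, 0.4, 0.6)
  splines <- sapply(knots, function(t) pmax(0, df$inc - t))  # max(0, inc - t)
  colnames(splines) <- paste0("inc_gt_", knots)
  W <- cbind(W, splines)
  # min-max normalization (assumes no constant columns)
  apply(W, 2, function(v) (v - min(v)) / (max(v) - min(v)))
}
```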
Table 7 and Table 8 present the selected important variables across different quantiles. The approximate sparsity is asymmetric across the conditional distribution in the sense that the number of selected variables decreases as the quantile index τ increases, although estimation at the upper quantiles hinges on a relatively small number of observations. In this particular example, τ captures the rank variable that governs the unobservable heterogeneity: savings propensity. Small values of τ represent participants with a low savings propensity. Our empirical results thus signify that 401(k) participants with a low savings propensity are more associated with the nonlinear income effect than those with a high savings propensity, which complements the results concluded in the previous studies by Chernozhukov et al. (2018) and Chiou et al. (2018). The nonlinear income effects, across quantiles ranging over (0, 0.5], are picked up by selected variables such as max(0, inc − 0.2), max(0, inc² − 0.2), and max(0, inc³ − 0.2). Technical variables in terms of age, education, family size, and income are the most frequently selected in Table 7 and Table 8. In addition, these four variables are also identified as important variables in the context of the generalized random forests, cf. Chen and Hsiang (2019).

5. Conclusions

In this study, we investigate the performance of a debiased/double machine learning algorithm within the framework of the high-dimensional IVQR. The simulation results indicate that our procedure performs more efficiently than conventional estimators with many controls. Furthermore, we evaluate the corresponding weak-identification robust confidence intervals for the low-dimensional causal parameter. Given many technical controls, we reinvestigate the quantile treatment effects of 401(k) participation on accumulated wealth and highlight the nonlinear income effects across savings propensities.

Author Contributions

All three authors contributed equally to the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the JSPS KAKENHI (Grant No. JP20K01593) and the personal research fund of Tokyo International University, and it was financially supported by the Center for Research in Econometric Theory and Applications (Grant No. 109L900203) through the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan.

Institutional Review Board Statement

Not applicable.

Acknowledgments

We are grateful to the three anonymous referees for their constructive comments that have greatly improved this paper. We thank Tsung-Chih Lai and Hsin-Yi Lin for discussions and comments. This paper has benefited from presentations at Ryukoku University and the 2nd International Conference on Econometrics and Statistics (EcoSta 2018). The usual disclaimer applies.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DML    Double machine learning
GMM    Generalized method of moments
GRF    Generalized random forests
IVQR   Instrumental variable quantile regression
Lasso  Least absolute shrinkage and selection operator

References

1. Athey, Susan. 2017. Beyond prediction: Using big data for policy problems. Science 355: 483–85.
2. Athey, Susan, and Guido W. Imbens. 2019. Machine learning methods that economists should know about. Annual Review of Economics 11: 685–725.
3. Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. Generalized random forests. Annals of Statistics 47: 1148–78.
4. Belloni, Alexandre, and Victor Chernozhukov. 2011. ℓ1-penalized quantile regression in high-dimensional sparse models. Annals of Statistics 39: 82–130.
5. Belloni, Alexandre, Victor Chernozhukov, Iván Fernández-Val, and Christian Hansen. 2017. Program evaluation and causal inference with high-dimensional data. Econometrica 85: 233–98.
6. Chen, Jau-er, and Chen-Wei Hsiang. 2019. Causal random forests model using instrumental variable quantile regression. Econometrics 7: 49.
7. Chen, Le-Yu, and Sokbae Lee. 2018. Exact computation of GMM estimators for instrumental variable quantile regression models. Journal of Applied Econometrics 33: 553–67.
8. Chernozhukov, Victor, and Christian Hansen. 2004. The effects of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis. Review of Economics and Statistics 86: 735–51.
9. Chernozhukov, Victor, and Christian Hansen. 2005. An IV model of quantile treatment effects. Econometrica 73: 245–61.
10. Chernozhukov, Victor, and Christian Hansen. 2008. Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics 142: 379–98.
11. Chernozhukov, Victor, and Christian Hansen. 2013. NBER 2013 Summer Institute: Econometric Methods for High-Dimensional Data. Available online: https://www.nber.org/lecture/summer-institute-2013-methods-lectures-econometric-methods-high-dimensional-data (accessed on 15 July 2013).
12. Chernozhukov, Victor, Christian Hansen, and Martin Spindler. 2015. Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7: 649–88.
13. Chernozhukov, Victor, Christian Hansen, and Kaspar Wüthrich. 2018. Instrumental variable quantile regression. In Handbook of Quantile Regression. Boca Raton: Chapman & Hall/CRC.
14. Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21: C1–C68.
15. Chetverikov, Denis, Zhipeng Liao, and Victor Chernozhukov. 2021. On cross-validated lasso in high dimensions. Annals of Statistics, forthcoming.
16. Chiou, Yan-Yu, Mei-Yuan Chen, and Jau-er Chen. 2018. Nonparametric regression with multiple thresholds: Estimation and inference. Journal of Econometrics 206: 472–514.
17. Davis, Jonathan, and Sara B. Heller. 2017. Using causal forests to predict treatment heterogeneity: An application to summer jobs. American Economic Review 107: 546–50.
18. Gilchrist, Duncan Sheppard, and Emily Glassberg Sands. 2016. Something to talk about: Social spillovers in movie consumption. Journal of Political Economy 124: 1339–82.
19. Yi, Congrui, and Jian Huang. 2017. Semismooth Newton coordinate descent algorithm for elastic-net penalized Huber loss regression and quantile regression. Journal of Computational and Graphical Statistics 26: 547–57.
1. The R scripts conducting the estimation and inference of the Double Machine Learning for Instrumental Variable Quantile Regressions can be downloaded at https://github.com/FieldTien/DML-IVQR/tree/master/example (accessed on 20 March 2021).
2. We have conducted Monte Carlo experiments indicating that the choice of λ based on (13) or on 5-fold cross-validation leads to similar finite-sample performances of our proposed procedure in terms of root-mean-square error, mean absolute error, and bias. The simulation findings are tabulated in Section 3. When there are many binary control variables, the ℓ1-norm penalized quantile regression may suffer from singularity issues in estimation. If this is the case, researchers can utilize the algorithm developed by Yi and Huang (2017), which uses the Huber loss function to approximate the quantile loss function.
Figure 1. Histograms of the DML-IVQR estimates (in green).
Figure 2. Weak-instrument robust inference at the 0.5th quantile: DML-IVQR (in brown), oracle-generalized method of moments (GMM), and full-GMM.
Figure 3. Weak-instrument robust inference at the 0.25th quantile: DML-IVQR (in brown), oracle-GMM, and full-GMM.
Figure 4. Weak-instrument robust inference at the 0.75th quantile: DML-IVQR (in brown), oracle-GMM, and full-GMM.
Figure 5. DML-IVQR weak-instrument robust inference: 401(k) participation on TW.
Figure 6. DML-IVQR weak-instrument robust inference: 401(k) participation on NFTA.
Table 1. Residualizing and non-residualizing Z on X.

                      n = 500                          n = 1000
                      RMSE      MAE       BIAS         RMSE      MAE       BIAS
α0.10 (res-GMM)       0.1888    0.1510    −0.0893      0.1219    0.0950    −0.0551
α0.10 (GMM)           0.4963    0.2559    −0.1775      0.1631    0.1138    −0.0627
α0.25 (res-GMM)       0.1210    0.0966    −0.0334      0.0812    0.0654    −0.0256
α0.25 (GMM)           0.1782    0.1179    −0.0254      0.0963    0.0754    −0.0234
α0.50 (res-GMM)       0.0989    0.0716     0.0091      0.0689    0.0436    −0.0020
α0.50 (GMM)           0.1436    0.1016     0.0340      0.0801    0.0542     0.0078
α0.75 (res-GMM)       0.1374    0.1066     0.0552      0.0828    0.0676     0.0212
α0.75 (GMM)           0.2403    0.1710     0.1294      0.1146    0.0848     0.0442
α0.90 (res-GMM)       0.2437    0.1839     0.1225      0.1391    0.1067     0.0667
α0.90 (GMM)           0.8483    0.5340     0.4959      0.3481    0.1967     0.1613

The data generating process considers ten control variables. res-GMM residualizes Z on X; GMM does not. ατ denotes the quantile treatment effect.
Table 2. Instrumental variable quantile regression (IVQR) with high-dimensional controls.

n = 500
                      RMSE (Ratio)     MAE (Ratio)      BIAS
α0.10 (full-GMM)      0.7648 (4.05)    0.6645 (4.40)    −0.6533
α0.10 (oracle-GMM)    0.1888 (1.00)    0.1510 (1.00)    −0.0893
α0.10 (DML-IVQR)      0.3112 (1.64)    0.2389 (1.58)    −0.2039
α0.25 (full-GMM)      0.2712 (2.24)    0.2212 (2.28)    −0.1876
α0.25 (oracle-GMM)    0.1210 (1.00)    0.0966 (1.00)    −0.0334
α0.25 (DML-IVQR)      0.1562 (1.29)    0.1254 (1.29)    −0.0796
α0.50 (full-GMM)      0.1627 (1.64)    0.1234 (1.72)     0.0190
α0.50 (oracle-GMM)    0.0989 (1.00)    0.0716 (1.00)     0.0091
α0.50 (DML-IVQR)      0.1168 (1.18)    0.0846 (1.18)    −0.0186
α0.75 (full-GMM)      0.3421 (2.48)    0.2806 (2.63)     0.2502
α0.75 (oracle-GMM)    0.1374 (1.00)    0.1066 (1.00)     0.0552
α0.75 (DML-IVQR)      0.1495 (1.08)    0.1167 (1.09)     0.0516
α0.90 (full-GMM)      0.9449 (3.87)    0.8032 (4.36)     0.7891
α0.90 (oracle-GMM)    0.2437 (1.00)    0.1839 (1.00)     0.1225
α0.90 (DML-IVQR)      0.3567 (1.46)    0.2608 (1.41)     0.2011

n = 1000
                      RMSE (Ratio)     MAE (Ratio)      BIAS
α0.10 (full-GMM)      0.3917 (3.21)    0.3442 (3.62)    −0.3303
α0.10 (oracle-GMM)    0.1219 (1.00)    0.0950 (1.00)    −0.0551
α0.10 (DML-IVQR)      0.1376 (1.12)    0.1085 (1.14)    −0.0759
α0.25 (full-GMM)      0.1646 (2.02)    0.1361 (2.08)    −0.1134
α0.25 (oracle-GMM)    0.0812 (1.00)    0.0654 (1.00)    −0.0256
α0.25 (DML-IVQR)      0.0991 (1.22)    0.0804 (1.22)    −0.0436
α0.50 (full-GMM)      0.1038 (1.50)    0.0754 (1.72)    −0.0002
α0.50 (oracle-GMM)    0.0689 (1.00)    0.0436 (1.00)    −0.0020
α0.50 (DML-IVQR)      0.0775 (1.12)    0.0510 (1.16)    −0.0142
α0.75 (full-GMM)      0.1747 (2.10)    0.1452 (2.14)     0.1174
α0.75 (oracle-GMM)    0.0828 (1.00)    0.0676 (1.00)     0.0212
α0.75 (DML-IVQR)      0.0930 (1.12)    0.0741 (1.09)     0.0226
α0.90 (full-GMM)      0.4320 (3.10)    0.3681 (3.45)     0.3495
α0.90 (oracle-GMM)    0.1391 (1.00)    0.1067 (1.00)     0.0667
α0.90 (DML-IVQR)      0.1649 (1.18)    0.1231 (1.15)     0.0731

The full-GMM uses 100 control variables without regularization. The oracle-GMM uses the ten relevant variables. DML-IVQR is a double machine learning procedure. ατ denotes the quantile treatment effect. The numbers in parentheses are the ratios of the RMSE or MAE of each estimator to those of the oracle-GMM.
Table 3. Choice of λ: Double machine learning (DML)-IVQR with high-dimensional controls.

                                           n = 500                          n = 1000
                                           RMSE      MAE       BIAS         RMSE      MAE       BIAS
α0.25 (λ = Belloni and Chernozhukov)       0.1716    0.1325    −0.0716      0.0849    0.0683     0.0056
α0.25 (λ = 5-fold cross-validation)        0.1720    0.1368    −0.0986      0.0995    0.0811    −0.0589
α0.50 (λ = Belloni and Chernozhukov)       0.1273    0.0962     0.0270      0.0800    0.0556     0.0384
α0.50 (λ = 5-fold cross-validation)        0.1374    0.1032    −0.0384      0.0779    0.0536    −0.0236
α0.75 (λ = Belloni and Chernozhukov)       0.1572    0.1272     0.0876      0.1142    0.0961     0.0839
α0.75 (λ = 5-fold cross-validation)        0.1526    0.1179     0.0286      0.0838    0.0677     0.0205
Table 4. Cross-fitted DML-IVQR with high-dimensional controls.

                                  n = 500                          n = 1000
                                  RMSE      MAE       BIAS         RMSE      MAE       BIAS
α0.25 DML-IVQR                    0.1571    0.1238     0.0181      0.0954    0.0754     0.0415
α0.25 cross-fitted DML-IVQR       0.2165    0.1745    −0.1184      0.1130    0.0896    −0.0202
α0.50 DML-IVQR                    0.1316    0.1024     0.0704      0.0965    0.0724     0.0632
α0.50 cross-fitted DML-IVQR       0.1436    0.1155     0.0484      0.1038    0.0855     0.0629
α0.75 DML-IVQR                    0.1735    0.1457     0.1280      0.1280    0.1105     0.1016
α0.75 cross-fitted DML-IVQR       0.2098    0.1802     0.1726      0.1707    0.1517     0.1502
Table 5. Estimations with the model specification as in Chernozhukov and Hansen (2004).

Quantiles         0.1       0.15      0.25      0.5       0.75       0.85       0.9
TW (IVQR)         4400      5300      4900      6700      8000       8300       10,800
TW (res-GMM)      4400      5100      4900      6300      8200       7500       9100
TW (GMM)          4400      5200      4800      6300      8400       8000       8700
NFTA (IVQR)       3600      3600      3700      5700      13,200     15,800     17,700
NFTA (res-GMM)    3500      3600      3700      5600      13,900     15,800     17,700
NFTA (GMM)        3500      3600      3700      5700      13,900     16,100     18,200
Table 6. DML-IVQR with high-dimensional controls.

Quantiles                       0.1      0.15     0.25     0.5      0.75      0.85      0.9
NFTA (std-DML-IVQR × 63522)     3176     3049     3303     5844     18,802    26,298    28,076
TW (std-DML-IVQR × 111529)      2453     3011     3457     7695     15,056    18,736    16,394
NFTA (std-DML-IVQR)             0.05     0.048    0.052    0.092    0.296     0.414     0.442
TW (std-DML-IVQR)               0.022    0.027    0.031    0.069    0.135     0.168     0.147

We create 119 technical control variables including those constructed by the polynomial bases, interaction terms, and cubic splines (thresholds). DML-IVQR estimates the distributional effect, which signifies an asymmetric pattern similar to that identified in Chernozhukov and Hansen (2004).
Table 7. Total Wealth.

Quantile  Selected Variables
0.15      ira, educ, educ², age×ira, age×inc, fsize×educ, fsize×hmort, ira×educ, ira×inc, hval×inc, marr, male, i4, a3, twoearn, marr×fsize, pira×inc, max(0, age³ − 0.2), max(0, educ² − 0.4), max(0, educ − 0.2), max(0, age² − 0.4)
0.25      ira, age×fsize, age×ira, age×inc, fsize×educ, ira×educ, ira×inc, hval×inc, marr, male, i3, twoearn, marr×fsize, pira×inc, twoearn×fsize, max(0, inc − 0.2)
0.5       inc², age×fsize, age×ira, age×inc, fsize×educ, ira×educ, ira×hval, ira×inc, hval×inc, male, a1, a3, pira×inc, twoearn×age, twoearn×fsize, twoearn×hmort, twoearn×educ, max(0, educ − 0.6)
0.75      inc, ira, age×ira, age×hval, age×inc, educ×inc, hval×inc, pira×inc, pira×age
0.85      inc, ira, age×hval, age×inc, ira×educ, educ×inc, hval×inc, pira×inc, pira×hval

Selected variables across τ, tuned via cross-validation.
Table 8. Net Financial Assets.

Quantile  Selected Variables
0.15      ira, educ², fsize³, hval³, educ³, age×educ, age×hmort, age×inc, fsize×hmort, fsize×inc, ira×educ, ira×inc, hval×inc, marr, db, male, i2, i3, i4, i5, twoearn, marr×fsize, pira×inc, pira×educ, twoearn×inc, twoearn×ira, max(0, age³ − 0.2), max(0, age² − 0.2), max(0, age − 0.6), max(0, inc³ − 0.2), max(0, inc² − 0.2), max(0, educ − 0.2)
0.25      ira, hmort, age×hmort, age×inc, fsize×hmort, fsize×inc, ira×educ, ira×inc, hval×inc, db, smcol, male, i2, i3, i4, i5, a2, a3, twoearn, pira×inc, pira×age, pira×fsize, twoearn×inc, twoearn×ira, twoearn×hmort, max(0, age² − 0.2), max(0, age − 0.6), max(0, inc² − 0.2), max(0, inc − 0.4), max(0, inc − 0.2), max(0, educ − 0.2)
0.5       age, ira, age×fsize, age×ira, age×inc, fsize×educ, fsize×hmort, ira×educ, ira×inc, hval×inc, hown, male, i3, i4, a1, a2, a4, pira×inc, pira×fsize, twoearn×inc, twoearn×fsize, twoearn×hmort, twoearn×educ, max(0, inc − 0.2)
0.75      ira, age×inc, hval×inc, pira×inc, pira×age
0.85      ira, age×inc, educ×inc, hval×inc, pira×inc

Selected variables across τ, tuned via cross-validation.