Article

Propensity Score Weighting with Mismeasured Covariates: An Application to Two Financial Literacy Interventions

Department of Economics, Southern Methodist University, 3300 Dyer Street, Dallas, TX 75275, USA
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2020, 13(11), 290; https://doi.org/10.3390/jrfm13110290
Submission received: 30 October 2020 / Revised: 18 November 2020 / Accepted: 19 November 2020 / Published: 21 November 2020
(This article belongs to the Special Issue Nonparametric Econometric Methods and Application II)

Abstract

Estimation of the causal effect of a binary treatment on outcomes often requires conditioning on covariates to address selection on observed variables. This is not straightforward when one or more of the covariates are measured with error. Here, we present a new semi-parametric estimator that addresses this issue. In particular, we focus on inverse propensity score weighting estimators when the propensity score is of an unknown functional form and some covariates are subject to classical measurement error. Our proposed solution involves deconvolution kernel estimators of the propensity score and the regression function, weighted by a deconvolution kernel density estimator. Simulations and replication of a study examining the impact of two financial literacy interventions on the business practices of entrepreneurs show our estimator to be valuable to empirical researchers.

1. Introduction

Empirical researchers in economics, finance, management, and other disciplines are often interested in the causal effect of a binary treatment on outcomes. In some cases, randomization is used to ensure comparability across the treatment and control groups. However, researchers must rely on observational data when randomization is not feasible. With observational data, concern over the non-random selection of subjects into the treatment group becomes well-founded. Addressing the possibility of non-random selection places substantial demands on the data at hand. Moreover, even with randomization, demands on the data may be non-trivial since randomization only balances covariates across the treatment and control groups in expectation.
In this paper, we consider the case where adjustment for observed covariates is performed to recover an unbiased estimate of the effect of a treatment. Thus, we are restricting ourselves to the case of selection on observed variables. The econometric and statistics literature on the estimation of causal effects in the case of selection on observed variables has grown tremendously of late.1 This has led to the proliferation of statistical methods designed to estimate the causal effect(s) of the treatment, including parametric regression methods, semi- or non-parametric methods based on the propensity score, and combinations thereof.
Despite the growing number of estimation methods, there are only a few that take into account measurement errors in the data. Here, we present a new semi-parametric estimator that partially fills this gap. In particular, we focus on the case when the propensity score is of an unknown functional form and some covariates are subject to classical measurement error. There are two issues to be dealt with to estimate the treatment effect in such a situation: first, we need to estimate the functional form of the propensity score; second, we need to estimate the moment of a known (or estimated) function of mismeasured covariates. The first issue is solved by using deconvolution kernel regression. For the second issue, as the sample analogue is no longer feasible due to the unobservability of the error-free covariates, we consider an integration weighted by a deconvolution kernel density estimator.
We illustrate our estimator both via simulation and by revisiting the randomized control trial (RCT) on financial literacy examined in Drexler et al. (2014). In the experiment, micro-entrepreneurs taking out a loan from ADOPEM, a microfinance institution in the Dominican Republic, are randomly assigned to one of three treatment arms to assess the causal effect of financial literacy programs on a firm’s financial practices, objective reporting quality, and business performance. The first treatment provided subjects with standard accounting training. The second treatment provided rule-of-thumb training that covered basic financial heuristics. The final group received neither training and serves as the control group. The authors find significant beneficial effects of the rule-of-thumb training, but not the standard accounting training.
We revisit this study for three reasons. First, proper evaluation of financial literacy interventions is critical. As documented in Lusardi and Mitchell (2014), financial literacy in the US and elsewhere seems woefully inadequate for individuals and small business owners to navigate complex financial matters. Mckenzie and Woodruff (2013, pp. 48–49) offer the following vivid description:
“Walk into a typical micro or small business in a developing country and spend a few minutes talking with the owner, and it often becomes clear that owners are not implementing many of the business practices that are standard in most small businesses in developed countries. Formal records are not kept, and household and business finances are combined. Marketing efforts are sporadic and rudimentary. Some inventory sits on shelves for years at a time, whereas more popular items are frequently out of stock. Few owners have financial targets or goals that they regularly monitor and act to achieve.”
As evidenced in this quote, the lack of financial literacy among micro-entrepreneurs has real consequences. Lusardi and Mitchell (2014) discuss the wider impacts of a lack of financial literacy, such as lower participation in financial markets, poor investment decisions, susceptibility to financial scams, inadequate retirement planning, increased credit card and mortgage debt, etc. See Morgan and Trinh (2019) for a recent example.
While the impacts are well-documented, knowledge of the efficacy of various programs aiming to teach financial literacy is inadequate. Specifically, the causal effect of specific types of financial literacy training interventions is relatively unexplored. Existing research typically lumps all financial literacy programs together, potentially masking insights into what works and what does not. For example, Fernandes et al. (2014) perform a meta-analysis of 201 studies assessing the impact of financial literacy and education programs on financial behaviors, finding that interventions explain only 0.1% of the variance in financial behaviors. By comparing two different training programs, Drexler et al. (2014) represent an important contribution to the literature.
Second, better understanding the determinants of successful microenterprises is critical in lesser developed countries. Berge et al. (2015, p. 707) state: “Microenterprises are an important source of employment, and developing such enterprises is a key policy concern in most countries, and in particular in developing countries where they employ more than half of the labor force.” However, the viability of microenterprises has been found to be heterogeneous, as the authors further note that “a growing literature shows that success cannot be taken for granted” (p. 707). Recent research has focused on sources of this heterogeneity, finding that it is not explained fully by variation in capital (Bruhn et al. 2018). The study by Drexler et al. (2014) addresses this issue by exploring the impact of different types of financial literacy training on firm success.
Finally, our proposed estimator is well-suited to the application. To start, despite training being randomly assigned, the authors control (via regression) for several covariates to increase the precision of the treatment effect estimates. Moreover, one covariate is continuous and potentially suffers from classical measurement error. This covariate reflects the size of the loan received by the entrepreneur. While this variable is unlikely to be mismeasured as it is obtained from bank records, arguably the ‘true’ covariate of interest is a measure of capital investment in the firm by the entrepreneur. This could be below the official size of the loan due to some funds being diverted to non-business use, or above the official size of the loan due to other funds being used to supplement the loan. As Drexler et al. (2014, p. 2) note, “for microenterprises the boundary between business and personal financial decisions is often blurred.”
Applying our proposed estimator, we find the results in Drexler et al. (2014) to be generally robust to ‘modest’ amounts of measurement error. However, for a few outcomes, the magnitude of the estimated treatment effect changes. With greater amounts of measurement error, the results are, not surprisingly, less robust. Typically in such cases we find larger point estimates once measurement error is addressed.
The remainder of the paper is organized as follows. Section 2 provides a brief overview of the literature on measurement error in covariates. Section 3 provides an overview of the potential outcomes framework, discusses identification with and without measurement error in covariates, and presents our proposed estimator. Section 4 studies the small sample properties of the proposed estimators by simulation. Section 5 contains our application to the assessment of two financial literacy interventions. Section 6 concludes.

2. Measurement Error in Covariates

A small literature has considered measurement error in an observed covariate when estimating the causal effect of a treatment in the case of selection on observed variables. In a regression context with classical measurement error, it is well known that the Ordinary Least Squares (OLS) estimate of the coefficient on the mismeasured regressor suffers from attenuation bias (see, e.g., Frisch 1934; Koopmans 1937; Reiersol 1950). However, bias will also impact the estimated treatment effect if treatment assignment is correlated with the true value of the mismeasured covariate (Bound et al. 2001). The sign of this covariance determines the sign of the bias. If the measurement error is correlated with treatment assignment (i.e., it is nonclassical), then the direction of the bias depends on whether the partial correlation between the measurement error and treatment assignment is positive or negative (Bound et al. 2001). Finally, if multiple covariates suffer from measurement error, then one is typically unable to sign the bias even under classical measurement error (Bound et al. 2001).
With classical measurement error, a consistent estimate of the treatment effect can be recovered using Instrumental Variable (IV) estimation, where the mismeasured covariate(s) are instrumented for using valid exclusion restrictions. However, this solution places further demands on the data as valid instruments must be available. As an aside, it is also important to realize that the estimated treatment effect will still be inconsistent if treatment assignment is correlated with the measurement error (Bound et al. 2001).
Beyond the regression context, several recent papers consider the effect of measurement error in one or more covariates when relying on semi- or non-parametric estimators of the treatment effect. Battistin and Chesher (2014), extending early work in Cochran and Rubin (1973), focus on the bias of treatment effect parameters estimated using semiparametric (propensity score) methods. The bias, which may be positive or negative, is a function of the measurement error variance. The authors consider bias-corrected estimators where the bias is estimated under different assumptions concerning the reliability of the data.
McCaffrey et al. (2013) develop a consistent inverse propensity score weighted estimator for the case when covariates are mismeasured. In particular, the authors consider a weight function of mismeasured covariates whose conditional expectation given the correctly measured covariates equals the error-free inverse propensity score. Their estimator is then constructed based on approximating the weight function by projecting the inverse of the estimated propensity score onto a set of basis functions. To estimate the propensity score with mismeasured covariates, knowledge of the measurement error distribution is generally needed. It is worth noting that the measurement error considered in that paper could be non-classical, as only conditional independence between the measurement error and the outcome and the treatment given the correctly measured covariates is required. As a cost of this extra flexibility, the authors only establish consistency; further characterization of the asymptotic properties is left as a gap to be filled.
Jakubowski (2010) assesses the performance of propensity score matching when an unobserved covariate is proxied by several variables. The author considers two estimation methods. The first is a propensity score matching estimator where the propensity score model includes the proxy variables. The second is also a propensity score matching model except now the propensity score model includes an estimate of the unobserved covariate obtained via a factor analysis approach.
Webb-Vargas et al. (2017) examines the performance of inverse propensity score weighting with a mismeasured covariate. The authors then consider an inverse propensity score weighting estimator that replaces the mismeasured covariate with multiple imputations. The imputations make use of an auxiliary data source that contains both the true covariate and the mismeasured covariate. Each imputation leads to a unique propensity score model and hence a distinct estimate of the treatment effect. These multiple estimates are then combined into a final estimate.
Rudolph and Stuart (2018) assess the performance of three approaches to deal with measurement error in covariates when applying propensity score estimators. The first approach is propensity score calibration which, similar to Webb-Vargas et al. (2017), relies on an auxiliary data source that contains both the true covariate and the mismeasured covariate. The second approach is a bias-corrected technique based on the formulas derived in VanderWeele and Arah (2011). As these bias formulas depend on various unknown sensitivity parameters, this technique relies on either external data to make educated guesses concerning the values of these parameters or sensitivity analysis using a grid of plausible values. The final approach is similar and relies on the sensitivity (to unobserved confounders) approach of Rosenbaum (2010) for matched pairs.
Hong et al. (2019) perform an extensive simulation exercise to explore the impact of multiple mismeasured covariates with and without correlated measurement errors. The authors find that correlation in the measurement errors can exacerbate the bias and that including auxiliary variables that are correlated with the true values of the mismeasured covariates may help mitigate the bias.
In sum, it is now well known that measurement error in covariates that belong in the propensity score model introduces bias in the estimated treatment effect. While a few solutions have been proposed, these solutions have not completely solved the problem. Some solutions rely on auxiliary data that contain both the true and mismeasured covariates. Other solutions are based upon bias-corrected estimates requiring the specification of parameter values whose true values are typically unknown. Finally, some solutions are based on trying to reduce the bias through the use of multiple proxies or assessing how severe the measurement error would have to be to explain the treatment effect ignoring measurement error.
Compared to most of these existing works, our estimator has the advantage of not requiring a specific functional form of the propensity score, so it avoids the bias caused by potential model misspecification. McCaffrey et al. (2013) is an exception, as it also treats the propensity score as a nonparametric object. However, beyond the consistency established in McCaffrey et al. (2013), our estimator allows us to further characterize how the convergence rates differ when the measurement errors are of different smoothness.
For the technical aspects, this paper contributes to the vast literature on estimating non-/semi-parametric measurement error models using deconvolution. See the books by Meister (2009) and Horowitz (2009) and the surveys by Chen et al. (2011) and Schennach (2016) for reviews. This literature started with density estimation; see Carroll and Hall (1988), Stefanski and Carroll (1990), Fan (1991a, 1991b), Bissantz et al. (2007), Van Es et al. (2008), and Lounici and Nickl (2011), among others. The deconvolution approach used to estimate the density was later extended to the estimation of regression functions; see Fan and Truong (1993), Fan and Masry (1992), Delaigle and Meister (2007), Delaigle et al. (2009), and Delaigle et al. (2015). Like works in other semi-parametric setups (Fan 1995; Dong et al. 2020b), our estimator is constructed using the deconvolution kernel estimators of both the density and the regression function as the building blocks.

3. Empirics

3.1. Potential Outcomes Framework

Our analysis is couched within the potential outcomes framework (see, e.g., Neyman 1923; Fisher 1935; Roy 1951; Rubin 1974). We consider a random sample of $n$ individuals from a large population, where individuals are indexed by $j = 1, \dots, n$. Define $Y_j(T)$ to be the potential outcome of individual $j$ under treatment $T$, $T \in \mathcal{T}$.2 In this paper, we limit ourselves to binary treatments: $\mathcal{T} = \{0, 1\}$. The causal effect of the treatment for a given individual is defined as the individual’s potential outcome under treatment ($T = 1$) relative to the individual’s potential outcome under control ($T = 0$). Formally,
\tau_j = Y_j(1) - Y_j(0).
In the evaluation literature, several population parameters are of potential interest. Here, attention is given to the average treatment effect (ATE)
\tau = E[\tau_j] = E[Y_j(1) - Y_j(0)]
and the average treatment effect for the treated (ATT)
\tau_{\mathrm{treat}} = E[\tau_j \mid T = 1] = E[Y_j(1) - Y_j(0) \mid T = 1].
The ATE is the expected treatment effect of an observation chosen at random from the population, whereas the ATT is the expected treatment effect of an observation chosen at random from the treatment group.
Each observation is characterized by the quadruple $\{Y_j, T_j, X_j, Z_j\}$, where $Y_j$ is the observed outcome, $T_j$ is a binary indicator of the treatment received, $X_j$ is a scalar covariate, and $Z_j$ is a $d$-dimensional vector of covariates. The covariates included in $X_j$ and $Z_j$ must be pre-determined (i.e., they are not affected by $T_j$) and must not perfectly predict treatment assignment. The observed outcome is
Y_j = T_j Y_j(1) + (1 - T_j) Y_j(0),
which makes clear that only one potential outcome is observed for any individual. Absent randomization, $\tau$ and $\tau_{\mathrm{treat}}$ are not identified in general due to the selection problem, that is, the distribution of $(Y(0), Y(1))$ may depend on $T$. Even with randomization, the efficiency of estimates can be improved by incorporating the covariates.

3.2. Strong Ignorability

To overcome the selection problem, or to improve the efficiency of estimates obtained under randomization, a set of fully observed covariates is commonly assumed, conditional on which $(Y(0), Y(1))$ and $T$ are independent. This is referred to as the conditional independence or unconfoundedness assumption (Rubin 1974; Heckman and Robb 1985). Formally, this assumption is expressed as
Assumption 1.
(Y(0), Y(1)) \perp\!\!\!\perp T \mid (X, Z).
In addition to Assumption 1, the following overlap or common support assumption concerning the joint distribution of treatment assignment and covariates is also needed. Let $p_{X,Z}(x,z) = P(T = 1 \mid X = x, Z = z)$ denote the propensity score, and let $\mathcal{X}$ and $\mathcal{Z}$ denote the supports of $X$ and $Z$, respectively.
Assumption 2.
0 < p_{X,Z}(x,z) < 1 \quad \text{for all } (x,z) \in \mathcal{X} \times \mathcal{Z}.
Assumptions 1 and 2 are jointly referred to as strong ignorability in Rosenbaum and Rubin (1983) and lead to the following well known result
\tau = E\left[\frac{(T - p_{X,Z}(X,Z))\, Y}{p_{X,Z}(X,Z)\,(1 - p_{X,Z}(X,Z))}\right]
\tau_{\mathrm{treat}} = E\left[\frac{(T - p_{X,Z}(X,Z))\, Y}{p\,(1 - p_{X,Z}(X,Z))}\right],
where $p = P(T = 1)$ is the probability of being treated; see Proposition 18.3 of Wooldridge (2010). Thus, strong ignorability is sufficient to identify the estimands, $\tau$ and $\tau_{\mathrm{treat}}$, when all variables are accurately measured.
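For completeness, a short verification of (3), which follows the same iterated-expectations logic used in Appendix A, may help fix ideas: since $T \in \{0,1\}$,

\frac{(T - p_{X,Z}(X,Z))\,Y}{p_{X,Z}(X,Z)\,(1 - p_{X,Z}(X,Z))} = \frac{TY}{p_{X,Z}(X,Z)} - \frac{(1-T)Y}{1 - p_{X,Z}(X,Z)}, \qquad E\!\left[\frac{TY}{p_{X,Z}(X,Z)} \,\middle|\, X, Z\right] = \frac{p_{X,Z}(X,Z)\, E[Y(1) \mid X, Z]}{p_{X,Z}(X,Z)} = E[Y(1) \mid X, Z],

and the control term analogously has conditional mean $E[Y(0) \mid X, Z]$; Assumption 1 is used to equate $E[TY \mid X, Z]$ with $p_{X,Z}(X,Z)\,E[Y(1) \mid X, Z]$, Assumption 2 keeps the denominators away from zero, and taking expectations over $(X, Z)$ gives $\tau$.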

3.3. Strong Ignorability with Measurement Error

Consider the case where Assumptions 1 and 2 continue to hold, but the quadruple { Y j , T j , W j , Z j } is observed by the researcher instead of { Y j , T j , X j , Z j } . Here, the observed scalar, W j , is assumed to be a noisy measure of X j , generated by
W_j = X_j + \epsilon_j,
where $\epsilon_j$ is measurement error. Let $f_V$ denote the density of a random variable $V$ and $f^{ft}(t) = \int e^{itx} f(x)\, dx$ denote the Fourier transform of a function $f$, with $i = \sqrt{-1}$. To identify $\tau$ and $\tau_{\mathrm{treat}}$ in the presence of contaminated data, we impose the following assumption in addition to strong ignorability.
Assumption 3.
$\epsilon \perp\!\!\!\perp (Y, T, X, Z)$, $f_\epsilon$ is known, and $f_\epsilon^{ft}$ vanishes nowhere.
Assumption 3 requires the measurement error to be classical. Although this is somewhat restrictive, it is worth noting that this setup is consistent with multiplicative measurement error of the form $W = X\epsilon$, as this can be transformed to an additive structure by taking the natural logarithm. In fact, as argued in Schennach (2019), we do not need full independence; only $f_W^{ft}(t) = f_X^{ft}(t)\, f_\epsilon^{ft}(t)$ for all $t \in \mathbb{R}$ is necessary, which is comparable in strength to a conditional mean restriction. The assumption of a known error distribution is unlikely to hold in practice, but is imposed here for simplicity. We discuss the relaxation of this assumption when auxiliary information is available, such as the repeated measurements of X, in Section 3.5.
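For concreteness, the transformation just mentioned (which additionally presumes $X$ and $\epsilon$ are strictly positive) is simply

W = X\epsilon \;\Longrightarrow\; \log W = \log X + \log \epsilon,

so the analysis below can be applied with $(\log W, \log X, \log \epsilon)$ playing the roles of $(W, X, \epsilon)$.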
The identification result in the presence of contaminated data is given in the following theorem.
Theorem 1.
Under Assumptions 1–3, τ and τ treat are identified from { Y , T , W , Z } .
The intuition behind Theorem 1 is straightforward. Based on (3) and (4), to identify $\tau$ and $\tau_{\mathrm{treat}}$, it is sufficient to identify $f_{Y,X,Z|T}$, which follows by applying the convolution theorem to $f_{Y,W,Z|T}$ under Assumption 3.
Theorem 1 in McCaffrey et al. (2013) provides results that can be used to achieve point identification of $\tau$ and $\tau_{\mathrm{treat}}$ under similar assumptions. In particular, using their Theorem 1, $\tau$ and $\tau_{\mathrm{treat}}$ can be identified by (3) and (4) if the inverse propensity score is replaced by a non-stochastic function $A$ of $W$ and $Z$ whose conditional expectation given $X$ and $Z$ equals $1/p_{X,Z}$, i.e., $E[A(W,Z) \mid X, Z] = 1/p_{X,Z}(X,Z)$; knowledge of $A$ is then needed to pin down $\tau$ and $\tau_{\mathrm{treat}}$. McCaffrey et al. (2013), however, do not provide further details on $A$ except in very special cases.
Under Assumptions 1–3, where Assumption 3 is slightly stronger than Assumption 1 in McCaffrey et al. (2013), we can derive a general explicit form of their function A. For example, for E [ Y ( 1 ) ] , which is needed to construct τ , the function A is given by
A(w,z) = \frac{p \int e^{-itw}\, \{f_{W,Z|T=1}(\cdot,z)\}^{ft}(t)\, /\, |f_\epsilon^{ft}(t)|^{2}\, dt}{2\pi\, f_{W,Z}(w,z)},
where $\{f_{W,Z|T=1}(\cdot,z)\}^{ft}(t) = \int e^{itw} f_{W,Z|T=1}(w,z)\, dw$. While it is not easy to give an intuitive interpretation of (5), using the law of iterated expectations, the result shown in Appendix A.2 implies that (5) is the counterpart of the inverse propensity score in the contaminated case. As can be seen, the functional form of $A$ depends on $f_{W,Z|T}$ and $f_\epsilon$. The former, $f_{W,Z|T}$, is identified as $\{T, W, Z\}$ are directly observed, but extra knowledge of the latter, $f_\epsilon$, is needed to identify $A$, which echoes the known error distribution part of Assumption 3.
In fact, the functional form of $A$ not only matters for the identification of $\tau$ and $\tau_{\mathrm{treat}}$, but also for the convergence rates of estimators of $\tau$ and $\tau_{\mathrm{treat}}$. In particular, as will be seen in Section 3.4, varying the smoothness of $f_\epsilon$ (implying $f_\epsilon^{ft}$ decays to zero at different rates as $|t| \to \infty$) alters the convergence rate. Intuitively, as $f_\epsilon^{ft}$ appears in the denominator of $A$, even if the same estimator of $f_{W,Z|T}$ is used to construct the estimator of $A$ and then the estimators of $\tau$ and $\tau_{\mathrm{treat}}$, due to the integration, the resulting estimators of $\tau$ and $\tau_{\mathrm{treat}}$ should converge at different speeds if $f_\epsilon^{ft}$ decays to zero at different rates.

3.4. Estimation

If we directly observe X, τ and τ treat can be estimated by
\check{\tau} = \frac{1}{n} \sum_{j=1}^{n} \frac{(T_j - \check{p}_{X,Z}(X_j, Z_j))\, Y_j}{\check{p}_{X,Z}(X_j, Z_j)\,(1 - \check{p}_{X,Z}(X_j, Z_j))}, \qquad \check{\tau}_{\mathrm{treat}} = \frac{1}{n} \sum_{j=1}^{n} \frac{(T_j - \check{p}_{X,Z}(X_j, Z_j))\, Y_j}{\hat{p}\,(1 - \check{p}_{X,Z}(X_j, Z_j))},
where $\hat{p} = \frac{1}{n} \sum_{j=1}^{n} T_j$ and $\check{p}_{X,Z}$ is a nonparametric estimator of the propensity score, $p_{X,Z}$. These are known as the inverse propensity score weighting (IPW) estimators; see Horvitz and Thompson (1952). However, these estimators are no longer feasible when $X$ is unobserved due to measurement error. To overcome this, note that we can alternatively express $\tau$ and $\tau_{\mathrm{treat}}$ as
\tau = \int\!\!\int\!\!\int \left[ \frac{p\, f_{Y,X,Z|T=1}(y,x,z)}{p_{X,Z}(x,z)} - \frac{(1-p)\, f_{Y,X,Z|T=0}(y,x,z)}{1 - p_{X,Z}(x,z)} \right] y\, dy\, dx\, dz,
\tau_{\mathrm{treat}} = \int\!\!\int\!\!\int \left[ f_{Y,X,Z|T=1}(y,x,z) - \frac{(1-p)\, p_{X,Z}(x,z)\, f_{Y,X,Z|T=0}(y,x,z)}{p\,(1 - p_{X,Z}(x,z))} \right] y\, dy\, dx\, dz.
Derivations of (6) and (7) are discussed in Appendix A.1. To keep the notation simple, we will focus on the case when both $X$ and $Z$ are scalar for the rest of this section. By applying the deconvolution method with $f_\epsilon$ known, and given the i.i.d. sample $\{Y_j, T_j, W_j, Z_j\}_{j=1}^{n}$ of $(Y, T, W, Z)$, the conditional densities $f_{Y,X,Z|T=1}(y,x,z)$ and $f_{Y,X,Z|T=0}(y,x,z)$ can be estimated by
\tilde{f}_{Y,X,Z|T=1}(y,x,z) = \frac{\sum_{j=1}^{n} T_j\, K\!\left(\frac{y - Y_j}{b_n}\right) \mathcal{K}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right)}{b_n^{3} \sum_{j=1}^{n} T_j},
\tilde{f}_{Y,X,Z|T=0}(y,x,z) = \frac{\sum_{j=1}^{n} (1 - T_j)\, K\!\left(\frac{y - Y_j}{b_n}\right) \mathcal{K}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right)}{b_n^{3} \left( n - \sum_{j=1}^{n} T_j \right)},
and the propensity score p X , Z ( x , z ) can be estimated by
\tilde{p}_{X,Z}(x,z) = \frac{\sum_{j=1}^{n} T_j\, \mathcal{K}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right)}{\sum_{j=1}^{n} \mathcal{K}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right)},
where $b_n$ is a bandwidth, $K$ is an (ordinary) kernel function, and $\mathcal{K}$ is a deconvolution kernel function defined as
\mathcal{K}(x) = \frac{1}{2\pi} \int e^{-itx}\, \frac{K^{ft}(t)}{f_\epsilon^{ft}(t/b_n)}\, dt.
Plugging (8), (9), and (10) into (6) and (7), we obtain the following estimators of $\tau$ and $\tau_{\mathrm{treat}}$.
\tilde{\tau} = \int_{\mathcal{X}}\!\int_{\mathcal{Z}} \left[ \frac{\tilde{q}_{1,1}(x,z)}{\tilde{q}_{0,1}(x,z)} - \frac{\tilde{q}_{1,0}(x,z)}{\tilde{q}_{0,0}(x,z)} \right] \tilde{q}(x,z)\, dx\, dz,
\tilde{\tau}_{\mathrm{treat}} = \int_{\mathcal{X}}\!\int_{\mathcal{Z}} \left[ \tilde{q}_{1,1}(x,z) - \frac{\tilde{q}_{1,0}(x,z)\, \tilde{q}_{0,1}(x,z)}{\tilde{q}_{0,0}(x,z)} \right] \frac{1}{\hat{p}}\, dx\, dz,
where $\mathcal{X}$ and $\mathcal{Z}$ denote the supports of $X$ and $Z$, respectively, and
\tilde{q}(x,z) = \frac{1}{n b_n^{2}} \sum_{j=1}^{n} \mathcal{K}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right), \qquad \tilde{q}_{k,s}(x,z) = \frac{1}{n b_n^{2}} \sum_{j=1}^{n} Y_j^{k} T_j^{s} (1 - T_j)^{1-s}\, \mathcal{K}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right) \quad \text{for } k, s = 0, 1.
Derivations of (11) and (12) are left to Appendix A.3.
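To fix ideas, the following sketch illustrates how (11) and (12) can be computed in the scalar case with a known Laplace error distribution. It is not the implementation used for the results in this paper: for brevity it replaces the flat-top kernel used in Section 4 with a Gaussian kernel $K$, for which the deconvolution kernel has the standard closed form $\mathcal{K}(u) = \phi(u)\{1 + (\lambda/b_n)^{2}(1 - u^{2})\}$ under Laplace$(0, \lambda)$ error (with $\phi$ the standard normal density), and it approximates the integrals on a rectangular grid; all function and variable names are ours.

```python
import numpy as np

def gauss_kernel(u):
    """Ordinary Gaussian kernel K (standard normal density)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def deconv_kernel_laplace(u, lam, b):
    """Closed-form deconvolution kernel for a Gaussian K and Laplace(0, lam) error,
    used here instead of the paper's flat-top kernel purely to keep the sketch short."""
    return gauss_kernel(u) * (1.0 + (lam / b) ** 2 * (1.0 - u ** 2))

def ate_att_deconv(Y, T, W, Z, lam, b, n_grid=60, trim=0.01):
    """Grid approximation of (11)-(12): scalar mismeasured X (observed as W) and a
    scalar error-free covariate Z."""
    Y, T, W, Z = map(np.asarray, (Y, T, W, Z))
    n = Y.size
    p_hat = T.mean()
    x_grid = np.linspace(W.min(), W.max(), n_grid)  # stand-in for the support of X
    z_grid = np.linspace(Z.min(), Z.max(), n_grid)
    dx, dz = x_grid[1] - x_grid[0], z_grid[1] - z_grid[0]
    Kx = deconv_kernel_laplace((x_grid[:, None] - W[None, :]) / b, lam, b)  # (grid, n)
    Kz = gauss_kernel((z_grid[:, None] - Z[None, :]) / b)                   # (grid, n)

    def q(w):
        # q-type estimator: sum_j w_j K_x(x, W_j) K_z(z, Z_j) / (n b^2) on the grid
        return (Kx * w[None, :]) @ Kz.T / (n * b ** 2)

    q00 = np.maximum(q(1.0 - T), trim)  # estimates (1 - p_{X,Z}) f_{X,Z}, trimmed
    q01 = np.maximum(q(T * 1.0), trim)  # estimates p_{X,Z} f_{X,Z}, trimmed
    q10, q11, q_all = q(Y * (1.0 - T)), q(Y * T), q(np.ones(n))

    tau = np.sum((q11 / q01 - q10 / q00) * q_all) * dx * dz
    tau_treat = np.sum(q11 - q10 * q01 / q00) * dx * dz / p_hat
    return tau, tau_treat
```

The trimming of the denominators mirrors the safeguard described in Section 4. Note that the Gaussian kernel does not have a compactly supported Fourier transform, so this shortcut does not satisfy Assumption 4(iii) and is for illustration only.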
Remark 1
(Case of vector X and Z). Although the proposed method is presented for scalar $X$ and $Z$, it can be easily generalized to the case when $X$ and $Z$ are vectors. In particular, let $X = (X_1, \dots, X_{d_x})$ and $Z = (Z_1, \dots, Z_{d_z})$ be $d_x$- and $d_z$-dimensional vectors, respectively, and $W = (W_1, \dots, W_{d_x})$ a $d_x$-dimensional vector of noisy measures of $X$ generated by $W_d = X_d + \epsilon_d$ for $d = 1, \dots, d_x$. Following a similar route as in the scalar case, we can estimate $\tau$ and $\tau_{\mathrm{treat}}$ by
\tilde{\tau} = \int_{\mathcal{X}_1}\!\cdots\!\int_{\mathcal{X}_{d_x}}\!\int_{\mathcal{Z}_1}\!\cdots\!\int_{\mathcal{Z}_{d_z}} \left[ \frac{\tilde{q}_{1,1}(x,z)}{\tilde{q}_{0,1}(x,z)} - \frac{\tilde{q}_{1,0}(x,z)}{\tilde{q}_{0,0}(x,z)} \right] \tilde{q}(x,z)\, dx\, dz, \qquad \tilde{\tau}_{\mathrm{treat}} = \int_{\mathcal{X}_1}\!\cdots\!\int_{\mathcal{X}_{d_x}}\!\int_{\mathcal{Z}_1}\!\cdots\!\int_{\mathcal{Z}_{d_z}} \left[ \tilde{q}_{1,1}(x,z) - \frac{\tilde{q}_{1,0}(x,z)\, \tilde{q}_{0,1}(x,z)}{\tilde{q}_{0,0}(x,z)} \right] \frac{1}{\hat{p}}\, dx\, dz,
where $\mathcal{X}_{d_1}$ and $\mathcal{Z}_{d_2}$ denote the supports of $X_{d_1}$ and $Z_{d_2}$ for $d_1 = 1, \dots, d_x$ and $d_2 = 1, \dots, d_z$, $x = (x_1, \dots, x_{d_x})$, $z = (z_1, \dots, z_{d_z})$, and
\tilde{q}(x,z) = \frac{1}{n b_n^{d_x + d_z}} \sum_{j=1}^{n} \prod_{d=1}^{d_x} \mathcal{K}_d\!\left(\frac{x_d - W_{d,j}}{b_n}\right) \prod_{d=1}^{d_z} K\!\left(\frac{z_d - Z_{d,j}}{b_n}\right), \qquad \tilde{q}_{k,s}(x,z) = \frac{1}{n b_n^{d_x + d_z}} \sum_{j=1}^{n} Y_j^{k} T_j^{s} (1 - T_j)^{1-s} \prod_{d=1}^{d_x} \mathcal{K}_d\!\left(\frac{x_d - W_{d,j}}{b_n}\right) \prod_{d=1}^{d_z} K\!\left(\frac{z_d - Z_{d,j}}{b_n}\right) \quad \text{for } k, s = 0, 1,
with
\mathcal{K}_d(x) = \frac{1}{2\pi} \int e^{-itx}\, \frac{K^{ft}(t)}{f_{\epsilon_d}^{ft}(t/b_n)}\, dt \quad \text{for } d = 1, \dots, d_x.
We conjecture that analogous results to our main theorems can be established for the multivariate case.
To derive the convergence rates of $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$, we need the following conditions.
Assumption 4.
(i) 
$\{Y_j, T_j, W_j, Z_j\}_{j=1}^{n}$ is an i.i.d. sample of $(Y, T, W, Z)$. $f_{X,Z}$ and $E[Y(s) \mid X, Z]$ are bounded away from zero, and $f_{X,Z}$ and $E[Y^{2}(s) \mid X, Z]$ are bounded, for $s = 0, 1$, over the compact support $\mathcal{X} \times \mathcal{Z}$.
(ii) 
f X , Z , p X , Z , and E [ Y ( s ) | X , Z ] for s = 0 , 1 are γ-times continuously differentiable with bounded and integrable derivatives for some positive integer γ.
(iii) 
$K$ is differentiable to order γ and satisfies
\int K(u)\, du = 1, \qquad \int u^{\gamma+1} K(u)\, du \neq 0, \qquad \int u^{l} K(u)\, du = 0 \;\; \text{for } l = 1, 2, \dots, \gamma.
Also, $K^{ft}$ is compactly supported on $[-1, 1]$, symmetric around zero, and bounded.
(iv) 
$b_n \to 0$ and $n b_n \inf_{|t| \leq b_n^{-1}} |f_\epsilon^{ft}(t)|^{2} \to \infty$ as $n \to \infty$.
Assumption 4(i) requires the random sampling and the regularity of densities and conditional moments. Assumption 4(ii) imposes smoothness restrictions on the densities and conditional moments, which are needed to control the magnitude of bias in the estimation together with the properties of kernel function K as imposed in Assumption 4(iii). In addition to the standard properties of a high-order kernel function, Assumption 4(iii) also requires K ft to be compactly supported, which is commonly used in deconvolution problems to truncate the ill-behaved tails of the integrand for regularization purposes. Meister (2009) discusses how kernels of any order can be constructed quite simply. Assumption 4(iv) imposes mild bandwidth restrictions. In particular, it simply requires that the bandwidth must decay to zero as the sample size grows, but should not decay too fast. The second part of Assumption 4(iv) is needed so that the higher order components of the estimation error are asymptotically negligible.
Theorem 2.
Under Assumptions 1–4, it holds
|\tilde{\tau} - \tau| = O_p\!\left( n^{-1/2} b_n^{-3/2} \Big( \inf_{|t| \leq b_n^{-1}} |f_\epsilon^{ft}(t)| \Big)^{-1} + b_n^{\gamma} \right), \qquad |\tilde{\tau}_{\mathrm{treat}} - \tau_{\mathrm{treat}}| = O_p\!\left( n^{-1/2} b_n^{-3/2} \Big( \inf_{|t| \leq b_n^{-1}} |f_\epsilon^{ft}(t)| \Big)^{-1} + b_n^{\gamma} \right).
Theorem 2 presents the convergence rates of $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$. The second term, $b_n^{\gamma}$, characterizes the magnitude of the estimation bias, which is identical to the error-free case. The first term characterizes the magnitude of the estimation variance. Compared to the error-free case, the estimation variance of $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$ decays more slowly due to the extra factor $b_n^{-1/2} \big( \inf_{|t| \leq b_n^{-1}} |f_\epsilon^{ft}(t)| \big)^{-1}$. In particular, the smoother the error distribution, the larger the estimation variance and, hence, the slower the convergence rate.
As is typical in the nonparametric measurement error literature, to further specify the convergence rates of $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$, we consider two separate cases characterized by different smoothness of the measurement error: the ordinary smooth case and the supersmooth case. For the ordinary smooth case, the error characteristic function decays at a polynomial rate. In particular, we impose the following condition.
Assumption 5.
There exist positive constants α and $c_0^{os} \leq c_1^{os}$ such that
c_0^{os} (1 + |t|)^{-\alpha} \leq |f_\epsilon^{ft}(t)| \leq c_1^{os} (1 + |t|)^{-\alpha} \quad \text{for all } t \in \mathbb{R}.
If $f_\epsilon$ satisfies Assumption 5, we say that it is ordinary smooth of order α. Popular examples of ordinary smooth densities include the Laplace and gamma densities. The convergence rates of $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$ in the presence of ordinary smooth error of order α are specified as follows.
Corollary 1.
Under Assumptions 1–4, if Assumption 5 holds true, we have
|\tilde{\tau} - \tau| = O_p\!\left( n^{-1/2} b_n^{-(3/2 + \alpha)} + b_n^{\gamma} \right), \qquad |\tilde{\tau}_{\mathrm{treat}} - \tau_{\mathrm{treat}}| = O_p\!\left( n^{-1/2} b_n^{-(3/2 + \alpha)} + b_n^{\gamma} \right).
Corollary 1 shows that $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$ converge at a polynomial rate $n^{-r}$ for some constant $r > 0$. The value of $r$ depends on the choice of the bandwidth $b_n$, which will be discussed in Section 4.
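As an illustration of how $r$ arises (a standard balancing argument that we add here, not a formal result stated in the paper), equating the variance and bias terms in Corollary 1 gives

n^{-1/2} b_n^{-(3/2 + \alpha)} \asymp b_n^{\gamma} \;\Longrightarrow\; b_n \asymp n^{-\frac{1}{2\gamma + 3 + 2\alpha}}, \qquad |\tilde{\tau} - \tau| = O_p\!\left( n^{-\frac{\gamma}{2\gamma + 3 + 2\alpha}} \right).

For example, a Laplace error with scale $\lambda$ has $f_\epsilon^{ft}(t) = (1 + \lambda^{2} t^{2})^{-1}$ and is therefore ordinary smooth of order $\alpha = 2$.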
For the supersmooth case, the error characteristic function decays at an exponential rate. In particular, we impose the following condition.
Assumption 6.
There exist positive constants β, $\beta_0$, and $c_0^{ss} \leq c_1^{ss}$ such that
c_0^{ss}\, e^{-\beta_0 |t|^{\beta}} \leq |f_\epsilon^{ft}(t)| \leq c_1^{ss}\, e^{-\beta_0 |t|^{\beta}} \quad \text{for all } t \in \mathbb{R}.
If $f_\epsilon$ satisfies Assumption 6, we say that it is supersmooth of order β. Popular examples of supersmooth densities include the Cauchy and Gaussian densities. The convergence rates of $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$ in the presence of supersmooth error of order β are specified as follows.
Corollary 2.
Under Assumptions 1–4, if Assumption 6 holds true, we have
|\tilde{\tau} - \tau| = O_p\!\left( n^{-1/2} b_n^{-3/2} e^{\beta_0 b_n^{-\beta}} + b_n^{\gamma} \right), \qquad |\tilde{\tau}_{\mathrm{treat}} - \tau_{\mathrm{treat}}| = O_p\!\left( n^{-1/2} b_n^{-3/2} e^{\beta_0 b_n^{-\beta}} + b_n^{\gamma} \right).
Corollary 2 shows that $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$ can only converge at a logarithmic rate, which is much slower than the polynomial rate obtained in the ordinary smooth case. In particular, a normal error makes the estimator much more data-demanding than a Laplace error. Again, the specific rate depends on the choice of the bandwidth $b_n$, which will be discussed in Section 4.
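To illustrate the logarithmic rate (again a standard balancing argument added here for intuition, not a formal result of the paper), taking $b_n = (\beta_0/(c \ln n))^{1/\beta}$ for some constant $0 < c < 1/2$ makes the variance term vanish while the bias term dominates:

n^{-1/2} b_n^{-3/2} e^{\beta_0 b_n^{-\beta}} = n^{c - 1/2}\, b_n^{-3/2} \to 0, \qquad b_n^{\gamma} \asymp (\ln n)^{-\gamma/\beta}, \quad \text{so} \quad |\tilde{\tau} - \tau| = O_p\!\left( (\ln n)^{-\gamma/\beta} \right).

For instance, the $N(0, \sigma^{2})$ density has $f_\epsilon^{ft}(t) = e^{-\sigma^{2} t^{2}/2}$ and is supersmooth with $\beta = 2$ and $\beta_0 = \sigma^{2}/2$.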

3.5. Case of Unknown Measurement Error Distribution

Assuming the measurement error distribution to be fully known is usually unrealistic in practice. Auxiliary information, such as repeated measurements of X, can be used to relax the assumption of a known error distribution imposed in Assumption 3.
Suppose we have two independent noisy measures of X, W and W r , determined as follows
W_j = X_j + \epsilon_j, \qquad W_j^{r} = X_j + \epsilon_j^{r},
for j = 1 , , n . To identify the distribution of ϵ , we impose the following assumption.
Assumption 7.
( ϵ , ϵ r ) are mutually independent and independent of ( X , Y , Z , T ) , the distributions of ϵ and ϵ r are identical, and f ϵ is symmetric around zero.
The error characteristic function, f ϵ ft , can be estimated by
\hat{f}_\epsilon^{ft}(t) = \left| \frac{1}{n} \sum_{j=1}^{n} \cos\{ t (W_j - W_j^{r}) \} \right|^{1/2}
under Assumption 7 (Delaigle et al. 2008).3
When the measurement error distribution is unknown, we can estimate τ and τ treat by plugging in this estimator, yielding
\hat{\tau} = \int_{\mathcal{X}}\!\int_{\mathcal{Z}} \left[ \frac{\hat{q}_{1,1}(x,z)}{\hat{q}_{0,1}(x,z)} - \frac{\hat{q}_{1,0}(x,z)}{\hat{q}_{0,0}(x,z)} \right] \hat{q}(x,z)\, dx\, dz,
\hat{\tau}_{\mathrm{treat}} = \int_{\mathcal{X}}\!\int_{\mathcal{Z}} \left[ \hat{q}_{1,1}(x,z) - \frac{\hat{q}_{1,0}(x,z)\, \hat{q}_{0,1}(x,z)}{\hat{q}_{0,0}(x,z)} \right] \frac{1}{\hat{p}}\, dx\, dz,
where
\hat{q}(x,z) = \frac{1}{n b_n^{2}} \sum_{j=1}^{n} \hat{\mathcal{K}}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right), \qquad \hat{q}_{k,s}(x,z) = \frac{1}{n b_n^{2}} \sum_{j=1}^{n} Y_j^{k} T_j^{s} (1 - T_j)^{1-s}\, \hat{\mathcal{K}}\!\left(\frac{x - W_j}{b_n}\right) K\!\left(\frac{z - Z_j}{b_n}\right) \quad \text{for } k, s = 0, 1,
and the deconvolution kernel function based on the estimated error characteristic function is defined by
\hat{\mathcal{K}}(x) = \frac{1}{2\pi} \int e^{-itx}\, \frac{K^{ft}(t)}{\hat{f}_\epsilon^{ft}(t/b_n)}\, dt.
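The next sketch (ours, not the implementation used for the results below) shows how the estimated error characteristic function and the plug-in deconvolution kernel $\hat{\mathcal{K}}$ can be computed from repeated measurements by simple numerical integration, assuming $K^{ft}$ is supplied as a function with support in $[-1, 1]$ as in Assumption 4(iii).

```python
import numpy as np

def eps_cf_hat(t, W, Wr):
    """|hat f_eps^ft(t)| computed from the repeated measurements, as in the display above."""
    t = np.atleast_1d(t)
    diff = np.asarray(W) - np.asarray(Wr)
    return np.abs(np.mean(np.cos(np.outer(t, diff)), axis=1)) ** 0.5

def deconv_kernel_hat(u, K_ft, W, Wr, b, t_grid=None, floor=1e-3):
    """hat K(u) = (2 pi)^{-1} int e^{-i t u} K^ft(t) / hat f_eps^ft(t / b) dt,
    approximated by a Riemann sum. `floor` guards against division by estimated
    characteristic function values that are numerically near zero (a practical
    safeguard we add; it is not part of the theory)."""
    if t_grid is None:
        t_grid = np.linspace(-1.0, 1.0, 801)  # K^ft assumed supported on [-1, 1]
    u = np.atleast_1d(u)
    f_hat = np.maximum(eps_cf_hat(t_grid / b, W, Wr), floor)
    integrand = np.exp(-1j * np.outer(u, t_grid)) * K_ft(t_grid) / f_hat[None, :]
    dt = t_grid[1] - t_grid[0]
    return np.real(integrand.sum(axis=1)) * dt / (2.0 * np.pi)
```

Replacing $\mathcal{K}$ with $\hat{\mathcal{K}}$ in the expressions for $\hat{q}$ and $\hat{q}_{k,s}$ then yields $\hat{\tau}$ and $\hat{\tau}_{\mathrm{treat}}$ exactly as in the displays above.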

3.6. Inference

The proposed estimators, both in the case where the error distribution is known and the case where it is not, are constructed based on deconvolution methods. Unfortunately, deconvolution-based inference is known to be extremely challenging. Most of the existing work on deconvolution-based inference focuses on nonparametric objects, such as the density and the regression function. In particular, Bissantz et al. (2007) develop uniform confidence bands for the density of a mismeasured variable when the error distribution is known. Lounici and Nickl (2011) derive upper bounds for the sup-norm risk of a wavelet deconvolution estimator of the density of a mismeasured variable when the error distribution is unknown but repeated measurements are available. The authors construct uniform confidence bands for the density using these bounds. The resulting confidence bands, however, could be conservative as the upper bound of the coverage probability is derived using concentration inequalities. Kato and Sasaki (2018, 2019) establish uniform confidence bands with asymptotic validity for the density of a mismeasured variable and the regression function of a mismeasured covariate, respectively, in the case when the error distribution is unknown but repeated measurements are available. Inference concerning a function of densities and regression functions under the deconvolution problem remains an open question in the literature. We leave for future work the examination of bootstrap methods for the construction of confidence intervals of our proposed estimators. In particular, a non-parametric bootstrap as in Bissantz et al. (2007) could be considered for the case when the error distribution is known, and a wild bootstrap as in Kato and Sasaki (2019) could be considered for the case when the error distribution is unknown but repeated measurements are available.

4. Simulation

In this section, we evaluate the finite sample performance of the proposed estimators using Monte Carlo simulation. In particular, we focus on the case with a single covariate for which we can only observe its noisy measurement, and the following data generating process is considered
Y(T) = g(T, X) + U,
where the covariate $X$ is drawn from $U[0.5, 1.5]$ and is independent of $U$, the error term $U$ is drawn from $N(0,1)$ and is independent of $(T, X)$, the treatment is assigned according to $P(T = 1 \mid X) = \exp(-0.5X)$, and three specifications of $g$ are considered
\text{DGP 1: } g(t,x) = t + x, \qquad \text{DGP 2: } g(t,x) = t + x + x^{2} - x^{3}, \qquad \text{DGP 3: } g(t,x) = t + x \sin(x).
While $X$ is assumed unobserved, we suppose $W = X + \epsilon$ and $W^{r} = X + \epsilon^{r}$ are observed, where $\epsilon$ and $\epsilon^{r}$ are mutually independent and independent of $(T, X, U)$. For the distributions of $\epsilon$ and $\epsilon^{r}$, we consider two cases. First, as an example of ordinary smooth errors, we consider the case when $\epsilon$ and $\epsilon^{r}$ have a zero-mean Laplace distribution with standard deviation 1/3. Second, as an example of supersmooth errors, we consider the case when $\epsilon$ and $\epsilon^{r}$ have a normal distribution with zero mean and standard deviation 1/3.
Throughout the simulation study, we use the kernel function whose Fourier transform is
K^{ft}(t) = \begin{cases} 1 & \text{if } |t| \leq 0.05, \\ \exp\!\left( - \dfrac{\exp\!\left( -(|t| - 0.05)^{-2} \right)}{(|t| - 1)^{2}} \right) & \text{if } 0.05 < |t| < 1, \\ 0 & \text{if } |t| \geq 1. \end{cases}
This is the infinite-order flat-top kernel proposed by McMurry and Politis (2004), whose Fourier transform is compactly supported and can be used for regularization purposes in deconvolution estimation.
A trimming term is used in the denominators $\hat{q}_{0,0}$ and $\hat{q}_{0,1}$ to ensure stability. Specifically, all values of the denominators below 0.01 are set to 0.01. Two sample sizes are considered, $n = 250$ and 500, and all results are based on 500 Monte Carlo replications. For the choice of the bandwidth, to reduce the computational cost, we apply Delaigle and Gijbels (2004) to the first experiment and use the resulting bandwidth for all subsequent simulations. To increase robustness, we use 2 times the optimal bandwidth suggested by Delaigle and Gijbels (2004), as the deconvolution-based estimator is more sensitive to smaller bandwidths; see Dong et al. (2020a).
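For reference, a minimal sketch of one Monte Carlo draw under DGP 1 with ordinary smooth (Laplace) errors might look as follows; the treatment-assignment rule and DGP formula follow the specification above as reconstructed, while the seeding and function names are ours.

```python
import numpy as np

def draw_dgp1(n, rng):
    """One simulated sample (Y, T, X, W, Wr) under DGP 1 with Laplace errors of s.d. 1/3."""
    X = rng.uniform(0.5, 1.5, n)
    T = (rng.uniform(size=n) < np.exp(-0.5 * X)).astype(float)  # P(T = 1 | X) = exp(-0.5 X)
    U = rng.normal(size=n)
    Y = T + X + U                            # g(t, x) = t + x, plus the regression error U
    lap_scale = (1.0 / 3.0) / np.sqrt(2.0)   # Laplace scale giving standard deviation 1/3
    W = X + rng.laplace(0.0, lap_scale, n)   # observed noisy measurement of X
    Wr = X + rng.laplace(0.0, lap_scale, n)  # repeated noisy measurement
    return Y, T, X, W, Wr

rng = np.random.default_rng(0)
Y, T, X, W, Wr = draw_dgp1(250, rng)
```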
In Table 1, Table 2 and Table 3, results are reported for both $\tilde{\tau}$ and $\tilde{\tau}_{\mathrm{treat}}$, the proposed estimators for the case when the error distribution is known and only $W$ is observed, and $\hat{\tau}$ and $\hat{\tau}_{\mathrm{treat}}$, the proposed estimators for the case when the error distribution is unknown and both $W$ and $W^{r}$ are observed. Results are given for the bias (Bias), standard deviation (SD), and root mean squared error (RMSE) of each estimator in different settings.
The results appear encouraging and display several interesting features. First, the estimators have better performance with a larger sample size, and the performance of estimators is better with ordinary smooth error compared to the supersmooth error case. We also note that the performance of the estimators for the case when the error distribution is unknown is close to that of the estimators for the case when the error distribution is known. Using the estimated error distribution generally adds extra noise to the estimation. There are cases when the symmetry of the error distribution may allow the performance of the estimator to be independent of whether the error distribution is estimated or not; see Dong et al. (2020b). Finally, as to be expected, the proposed estimators have similar performance across different data generating processes, which implies that they are robust to unobserved nonlinearity in the conditional expectation function.

5. Application

To illustrate our estimator in practice, we revisit the analysis in Drexler et al. (2014).4 Drexler et al. (2014) examine a randomized control trial (RCT) in which micro-entrepreneurs taking out a loan from ADOPEM, a microfinance institution in the Dominican Republic, are randomly assigned to one of three treatment arms to assess the causal effect of financial literacy programs on a firm’s financial practices, objective reporting quality, and business performance. The first treatment provided subjects with standard accounting training. The second treatment provided rule-of-thumb training that covered basic financial heuristics. The final group received neither training and serves as the control group. The authors find significant beneficial effects of the rule-of-thumb training, but not the standard accounting training.
Our analysis deviates from Drexler et al. (2014) in one main respect. Whereas Drexler et al. (2014) examine both treatments simultaneously using a single regression model estimated by OLS, we do not. As our estimator is based on the IPW estimator, we examine each treatment separately. To do so, we restrict the sample to a single treatment arm along with the control group. Thus, our sample sizes diverge from the original study. Nonetheless, we present OLS estimates for comparison to Drexler et al. (2014) and they are essentially identical.
Our results are presented in Table 4 and Table 5. The only difference across the two tables is the set of outcomes being analyzed.5 In each table, Columns 2 and 5 report the OLS estimates of the treatment effect. These are most directly comparable to Drexler et al. (2014), subject to the caveat mentioned above that we assess each treatment separately. Columns 3 and 6 report IPW estimates of the ATE treating all covariates as correctly measured and estimating the propensity score via logit. Finally, Columns 4 and 7 report the results of our estimator for the ATE, treating the size of the loan as potentially mismeasured. Because the application lacks any auxiliary information on possible measurement error in this covariate, we assume the measurement error is normally distributed with mean zero and three different variances, corresponding to increasing levels of measurement error. Specifically, we set the standard deviation of the measurement error to be 1/6, 1/3, and 2/3 of the standard deviation of the observed loan values. The bandwidth for the observed loan values, as in the simulation experiment, is chosen as two times the optimal bandwidth suggested by Delaigle and Gijbels (2004), and bandwidths for the other covariates, which are assumed to be error-free, are chosen based on Li and Racine (2003).
The results are interesting. In terms of the standard accounting treatment, the results appear robust to modest measurement error in loan size. The only outcome for which the treatment effect is statistically significant ignoring measurement error is “setting aside cash for business purposes”. Here, the OLS and IPW estimates are both 0.07. With modest measurement error, our estimator yields a point estimate of 0.08. It is noteworthy, however, that we find even stronger effects as we increase the variance of the measurement error.
In terms of the rule-of-thumb treatment, the results appear predominantly robust to modest measurement error in loan size as well. In Table 4, all OLS and IPW estimates are statistically significant at conventional levels. With modest measurement error, our estimator yields point estimates that are qualitatively unchanged; sometimes slightly larger and sometimes slightly smaller in absolute value. As we increase the variance of the measurement error, however, the point estimates generally increase in magnitude. Thus, the economic magnitudes of the treatment effects are sensitive to the degree of measurement error. For example, increasing the standard deviation of the measurement error from 1/6 to 2/3 of the standard deviation of the observed loan values at least doubles the magnitude of the ATE for the outcomes “setting aside cash for business purposes,” “keep accounting records,” “separate business and personal accounting,” and “calculate revenues formally”.
In Table 5, the only outcomes for which the treatment effect is statistically significant ignoring measurement error are “any reporting errors” and “revenue index”. For the former, our estimator suggests, if anything, a larger ATE in absolute value once measurement error is addressed. For the latter, our estimator suggests a smaller ATE once measurement error is addressed.

6. Conclusions

Estimation of the causal effect of a binary treatment on outcomes, even in the case of selection on observed covariates, can be complicated when one or more of the covariates are measured with error. In this paper, we present a new semi-parametric estimator that addresses this issue. In particular, we focus on the case when the propensity score is of an unknown functional form and some covariates are subject to classical measurement error. Allowing the functional form of the propensity score to be unknown as well as a function of unobserved, error-free covariates, we consider an integration weighted by a deconvolution kernel density estimator. Our simulations and replication exercise show our estimator to be valuable to empirical researchers. However, future work is needed to conduct inference with this estimator.

Author Contributions

Conceptualization, H.D. and D.L.M.; methodology, H.D. and D.L.M.; software, H.D. and D.L.M.; validation, H.D. and D.L.M.; formal analysis, H.D.; investigation, D.L.M.; resources, D.L.M.; data curation, H.D. and D.L.M.; writing–original draft preparation, H.D. and D.L.M.; writing–review and editing, H.D. and D.L.M.; visualization, H.D. and D.L.M.; supervision, D.L.M.; project administration, D.L.M.; funding acquisition, H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivation of Equations

Appendix A.1. Derivation of (6) and (7)

(6) follows by $\tau = E[Y(1) - Y(0)]$ and, for $t = 0, 1$,
E[Y(t)] = E\big[E[Y(t) \mid X, Z]\big] = E\big[E[Y(t) \mid T = t, X, Z]\big] = E\big[E[Y \mid T = t, X, Z]\big] = \int\!\!\int \left( \int y\, f_{Y|T=t,X=x,Z=z}(y)\, dy \right) f_{X,Z}(x,z)\, dx\, dz = P(T = t) \int\!\!\int\!\!\int y\, \frac{f_{Y,X,Z|T=t}(y,x,z)\, f_{X,Z}(x,z)}{f_{X,Z|T=t}(x,z)\, P(T = t)}\, dx\, dz\, dy = P(T = t) \int\!\!\int\!\!\int \frac{y\, f_{Y,X,Z|T=t}(y,x,z)}{P(T = t \mid X = x, Z = z)}\, dy\, dx\, dz,
where the second step follows by Assumption 1 and the last step requires Assumption 2. (7) follows by $\tau_{\mathrm{treat}} = E[Y(1) - Y(0) \mid T = 1]$, $E[Y(1) \mid T = 1] = E[Y \mid T = 1] = E[TY]/p$, and
E[Y(0) \mid T = 1] = E\big[E[Y(0) \mid T = 1, X, Z] \mid T = 1\big] = E\big[E[Y(0) \mid T = 0, X, Z] \mid T = 1\big] = E\big[E[Y \mid T = 0, X, Z] \mid T = 1\big] = \int\!\!\int \left( \int y\, f_{Y|T=0,X=x,Z=z}(y)\, dy \right) f_{X,Z|T=1}(x,z)\, dx\, dz = \int\!\!\int\!\!\int y\, \frac{f_{X,Z|T=1}(x,z)}{f_{X,Z|T=0}(x,z)}\, f_{Y,X,Z|T=0}(y,x,z)\, dy\, dx\, dz = E\!\left[ Y\, \frac{f_{X,Z|T=1}(X,Z)}{f_{X,Z|T=0}(X,Z)} \,\middle|\, T = 0 \right] = \frac{1-p}{p}\, E\!\left[ \frac{Y\, p_{X,Z}(X,Z)}{1 - p_{X,Z}(X,Z)} \,\middle|\, T = 0 \right],
where the first step follows by Assumption 1 and the last step requires Assumption 2.

Appendix A.2. Derivation of (5)

(5) follows by
E [ Y ( 1 ) ] = E T Y p X , Z ( X , Z ) = p E [ T Y | X = x , Z = z ] f X , Z | T = 1 ( x , z ) d x d z = p 1 2 π e i s x { E [ T Y | W = · , Z = z ] } ft ( s ) f ϵ ft ( s ) d s 1 2 π e i t x { f W , Z | T = 1 ( · , z ) } ft ( t ) f ϵ ft ( t ) d t d x d z = p 2 π 1 2 π e i ( s + t ) x d x { E [ T Y | W = · , Z = z ] } ft ( s ) { f W , Z | T = 1 ( · , z ) } ft ( t ) f ϵ ft ( s ) f ϵ ft ( t ) d s d t d z = p 2 π { E [ T Y | W = · , Z = z ] } ft ( t ) { f W , Z | T = 1 ( · , z ) } ft ( t ) | f ϵ ft ( t ) | 2 d t d z = p 2 π E [ T Y | W = w , Z = z ] e i t w d w { f W , Z | T = 1 ( · , z ) } ft ( t ) | f ϵ ft ( t ) | 2 d t d z = E [ T Y | W = w , Z = z ] p e i t w { f W , Z | T = 1 ( · , z ) } ft ( t ) | f ϵ ft ( t ) | 2 d t 2 π f W , Z ( w , z ) f W , Z ( w , z ) d w d z = E [ T Y A ( W , Z ) ] ,
where the sixth equality follows by $\int \delta(x - b) f(x)\, dx = f(b)$ with the Dirac delta function $\delta(x) = \frac{1}{2\pi} \int e^{itx}\, dt$.

Appendix A.3. Derivation of (11) and (12)

(11) follows by
τ ˜ = p ^ f ˜ Y , X , Z | T = 1 ( y , x , z ) p ˜ X , Z ( x , z ) ( 1 p ^ ) f ˜ Y , X , Z | T = 0 ( y , x , z ) 1 p ˜ X , Z ( x , z ) y d y d x d z = 1 n b n 3 j = 1 n T j K x W j b n K z Z j b n l = 1 n T l K x W l b n K z Z l b n l = 1 n K x W l b n K z Z l b n ( 1 T j ) K x W j b n K z Z j b n l = 1 n ( 1 T l ) K x W l b n K z Z l b n l = 1 n K x W l b n K z Z l b n K y Y j b n y d y d x d z = ( 1 ) 1 n b n 2 j = 1 n T j K x W j b n K z Z j b n l = 1 n T l K x W l b n K z Z l b n l = 1 n K x W l b n K z Z l b n ( 1 T j ) K x W j b n K z Z j b n l = 1 n ( 1 T l ) K x W l b n K z Z l b n l = 1 n K x W l b n K z Z l b n K y ˜ ( Y j + b n y ˜ ) d y ˜ d x d z = ( 2 ) 1 n b n 2 j = 1 n Y j T j K x W j b n K z Z j b n l = 1 n T l K x W l b n K z Z l b n l = 1 n K x W l b n K z Z l b n ( 1 T j ) K x W j b n K z Z j b n l = 1 n ( 1 T l ) K x W l b n K z Z l b n l = 1 n K x W l b n K z Z l b n d x d z = q ^ 1 , 1 ( x , z ) q ^ 0 , 1 ( x , z ) q ^ 1 , 0 ( x , z ) q ^ 0 , 0 ( x , z ) q ^ ( x , z ) d x d z ,
where (1) follows by the change of variables $\tilde{y} = (y - Y_j)/b_n$ and (2) follows by $\int K(y)\, dy = 1$ and $\int K(y)\, y\, dy = 0$.
(12) follows by
τ ˜ treat = 1 p ^ p ^ f ˜ X , Y , Z | T = 1 ( x , y , z ) p ˜ X , Z ( x , z ) ( 1 p ^ ) f ˜ X , Y , Z | T = 0 ( x , y , z ) 1 p ˜ X , Z ( x , z ) y d x d y d z = 1 n b n 3 p ^ j = 1 n T j K x W j b n K z Z j b n ( 1 T j ) K x W j b n K z Z j b n l = 1 n T l K x W l b n K z Z l b n l = 1 n ( 1 T l ) K x W l b n K z Z l b n K y Y j b n y d y d x d z = ( 1 ) 1 n b n 2 p ^ j = 1 n T j K x W j b n K z Z j b n ( 1 T j ) K x W j b n K z Z j b n l = 1 n T l K x W l b n K z Z l b n l = 1 n ( 1 T l ) K x W l b n K z Z l b n K y ˜ ( Y j + b n y ˜ ) d y ˜ d x d z = ( 2 ) 1 n b n 2 p ^ j = 1 n Y j T j ( 1 T j ) l = 1 n T l K x W l b n K z Z l b n l = 1 n ( 1 T l ) K x W l b n K z Z l b n K x W j b n K z Z j b n d x d z = q ^ 1 , 1 ( x , z ) q ^ 1 , 0 ( x , z ) q ^ 0 , 1 ( x , z ) q ^ 0 , 0 ( x , z ) 1 p ^ d x d z ,
where (1) follows by the change of variables $\tilde{y} = (y - Y_j)/b_n$ and (2) follows by $\int K(y)\, dy = 1$ and $\int K(y)\, y\, dy = 0$. The integrations in (11) and (12) are restricted to $\mathcal{X}$ and $\mathcal{Z}$ to emphasize that it is sufficient to integrate over the supports of $X$ and $Z$.

Appendix B. Proof of Theorems

Appendix B.1. Proof of Theorem 1

To identify $\tau$ and $\tau_{\mathrm{treat}}$, it is sufficient to identify $f_{Y,X,Z|T}$, as $p_{X,Z}(x,z) = \frac{p\, f_{X,Z|T=1}(x,z)}{p\, f_{X,Z|T=1}(x,z) + (1-p)\, f_{X,Z|T=0}(x,z)}$ and $f_{X,Z|T}(x,z) = \int f_{Y,X,Z|T}(y,x,z)\, dy$. Let $\{f_{Y,W,Z|T}(y,\cdot,z)\}^{ft}(t) = \int e^{itw} f_{Y,W,Z|T}(y,w,z)\, dw$. The identification of $f_{Y,X,Z|T}$ follows by
f_{Y,X,Z|T}(y,x,z) = \frac{1}{2\pi} \int e^{-itx}\, \frac{\{f_{Y,W,Z|T}(y,\cdot,z)\}^{ft}(t)}{f_\epsilon^{ft}(t)}\, dt,
for which we use the convolution theorem under Assumption 3.

Appendix B.2. Proof of Theorem 2

Define $q(x,z) = f_{X,Z}(x,z)$ and $q_{k,s}(x,z) = \{ m_{X,Z,s}^{k}\, p_{X,Z,s}\, f_{X,Z} \}(x,z)$ for $k, s = 0, 1$, where $m_{X,Z,s}(x,z) = E[Y(s) \mid X = x, Z = z]$ and $p_{X,Z,s}(x,z) = p_{X,Z}^{s}(x,z)\, (1 - p_{X,Z}(x,z))^{1-s}$. Then, we have
\tau = \int_{\mathcal{X}}\!\int_{\mathcal{Z}} \left[ \frac{q_{1,1}(x,z)}{q_{0,1}(x,z)} - \frac{q_{1,0}(x,z)}{q_{0,0}(x,z)} \right] q(x,z)\, dx\, dz,
\tau_{\mathrm{treat}} = \int_{\mathcal{X}}\!\int_{\mathcal{Z}} \left[ q_{1,1}(x,z) - \frac{q_{1,0}(x,z)\, q_{0,1}(x,z)}{q_{0,0}(x,z)} \right] \frac{1}{p}\, dx\, dz.
First, using $\hat{u}\hat{v} - uv = (\hat{u} - u)v + (\hat{v} - v)u + (\hat{u} - u)(\hat{v} - v)$ and $\hat{v}^{-1} - v^{-1} = -(\hat{v} - v)\{v(\hat{v} - v) + v^{2}\}^{-1}$, we have
\frac{\tilde{q}_{1,s}\,\tilde{q}}{\tilde{q}_{0,s}} - \frac{q_{1,s}\,q}{q_{0,s}} = \frac{\{(\tilde{q}_{1,s} - q_{1,s})q + (\tilde{q} - q)q_{1,s} + (\tilde{q}_{1,s} - q_{1,s})(\tilde{q} - q)\}\, q_{0,s} - (\tilde{q}_{0,s} - q_{0,s})\, q_{1,s}\, q}{q_{0,s}(\tilde{q}_{0,s} - q_{0,s}) + q_{0,s}^{2}},
\frac{\tilde{q}_{1,1}}{\hat{p}} - \frac{q_{1,1}}{p} = \frac{(\tilde{q}_{1,1} - q_{1,1})\, p - (\hat{p} - p)\, q_{1,1}}{p(\hat{p} - p) + p^{2}},
\frac{\tilde{q}_{1,0}\,\tilde{q}_{0,1}}{\tilde{q}_{0,0}\,\hat{p}} - \frac{q_{1,0}\,q_{0,1}}{q_{0,0}\,p} = \frac{\{(\tilde{q}_{1,0} - q_{1,0})q_{0,1} + (\tilde{q}_{0,1} - q_{0,1})q_{1,0} + (\tilde{q}_{1,0} - q_{1,0})(\tilde{q}_{0,1} - q_{0,1})\}\, p\, q_{0,0} - \{(\hat{p} - p)q_{0,0} + (\tilde{q}_{0,0} - q_{0,0})p + (\hat{p} - p)(\tilde{q}_{0,0} - q_{0,0})\}\, q_{1,0}\, q_{0,1}}{p\, q_{0,0}\{(\hat{p} - p)q_{0,0} + (\tilde{q}_{0,0} - q_{0,0})p + (\hat{p} - p)(\tilde{q}_{0,0} - q_{0,0})\} + p^{2} q_{0,0}^{2}},
where we intentionally suppress the dependence of q ^ , q ^ k , s , q, and q k , s on x and z for k , s = 0 , 1 to keep the notation simple.
For τ ˜ , note that
| τ ˜ τ | = X Z q ˜ 1 , 1 ( x , z ) q ˜ ( x , z ) q ˜ 0 , 1 ( x , z ) q 1 , 1 ( x , z ) q ( x , z ) q 0 , 1 ( x , z ) q ˜ 1 , 0 ( x , z ) q ˜ ( x , z ) q ˜ 0 , 0 ( x , z ) q 1 , 0 ( x , z ) q ( x , z ) q 0 , 0 ( x , z ) d x d z = O max s { 0 , 1 } sup ( x , z ) X × Z q ˜ 1 , s ( x , z ) q ˜ ( x , z ) q ˜ 0 , s ( x , z ) q 1 , s ( x , z ) q ( x , z ) q 0 , s ( x , z ) = O max s { 0 , 1 } sup ( x , z ) X × Z | q ˜ 1 , s ( x , z ) q 1 , s ( x , z ) | sup ( x , z ) X × Z | q 0 , s ( x , z ) | sup ( x , z ) X × Z | q ( x , z ) | + sup ( x , z ) X × Z | q ˜ ( x , z ) q ( x , z ) | sup ( x , z ) X × Z | q 1 , s ( x , z ) | sup ( x , z ) X × Z | q 0 , s ( x , z ) | + sup ( x , z ) X × Z | q ˜ 1 , s ( x , z ) q 1 , s ( x , z ) | sup ( x , z ) X × Z | q ˜ ( x , z ) q ( x , z ) | sup ( x , z ) X × Z | q 0 , s ( x , z ) | + sup ( x , z ) X × Z | q ˜ 0 , s ( x , z ) q 0 , s ( x , z ) | sup ( x , z ) X × Z | q 1 , s ( x , z ) | sup ( x , z ) X × Z | q ( x , z ) | = O max k , s { 0 , 1 } sup ( x , z ) X × Z | q ˜ k , s ( x , z ) q k , s ( x , z ) | + sup ( x , z ) X × Z | q ˜ ( x , z ) q ( x , z ) | ,
where the first step follows by (6) and (A1), the second step follows by the compactness of X and Z (Assumption 4(i)), the third step follows by (A3), sup ( x , z ) X × Z q ^ 0 , s ( x , z ) q 0 , s ( x , z ) = o p ( 1 ) (Lemma 1 and Assumption 4(iv)), and inf ( x , z ) X × Z q 0 , s ( x , z ) > 0 (Assumption 4(i)) for s = 0 , 1 , and the last step follows by sup ( x , z ) X × Z q ( x , z ) < (Assumption 4(i)), sup ( x , z ) X × Z q k , s ( x , z ) < (Assumption 4(i)), and sup ( x , z ) X × Z q ^ k , s ( x , z ) q k , s ( x , z ) = o p ( 1 ) (Lemma 1 and Assumption 4(iv)) for any k , s = 0 , 1 .
For $\tilde{\tau}_{\mathrm{treat}}$, note that
\begin{align*}
|\tilde{\tau}_{\mathrm{treat}}-\tau_{\mathrm{treat}}|
&=\left|\int_{\mathcal{X}}\int_{\mathcal{Z}}\left[\left\{\frac{\tilde q_{1,1}(x,z)}{\hat p}-\frac{q_{1,1}(x,z)}{p}\right\}-\left\{\frac{\tilde q_{1,0}(x,z)\,\tilde q_{0,1}(x,z)}{\hat p\,\tilde q_{0,0}(x,z)}-\frac{q_{1,0}(x,z)\,q_{0,1}(x,z)}{p\,q_{0,0}(x,z)}\right\}\right]dx\,dz\right|\\
&=O\!\left(\sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}\left|\frac{\tilde q_{1,1}(x,z)}{\hat p}-\frac{q_{1,1}(x,z)}{p}\right|+\sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}\left|\frac{\tilde q_{1,0}(x,z)\,\tilde q_{0,1}(x,z)}{\hat p\,\tilde q_{0,0}(x,z)}-\frac{q_{1,0}(x,z)\,q_{0,1}(x,z)}{p\,q_{0,0}(x,z)}\right|\right)\\
&=O\Bigl(\sup|\tilde q_{1,1}-q_{1,1}|+|\hat p-p|\,\sup|q_{1,1}|
+\sup|\tilde q_{1,0}-q_{1,0}|\,\sup|q_{0,1}|\,\sup|q_{0,0}|
+\sup|\tilde q_{0,1}-q_{0,1}|\,\sup|q_{1,0}|\,\sup|q_{0,0}|\\
&\qquad+\sup|\tilde q_{1,0}-q_{1,0}|\,\sup|\tilde q_{0,1}-q_{0,1}|\,\sup|q_{0,0}|
+|\hat p-p|\,\sup|q_{0,0}|\,\sup|q_{1,0}|\,\sup|q_{0,1}|\\
&\qquad+\sup|\tilde q_{0,0}-q_{0,0}|\,\sup|q_{1,0}|\,\sup|q_{0,1}|
+|\hat p-p|\,\sup|\tilde q_{0,0}-q_{0,0}|\,\sup|q_{1,0}|\,\sup|q_{0,1}|\Bigr)\\
&=O\!\left(\max_{k,s\in\{0,1\}}\ \sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}|\tilde q_{k,s}(x,z)-q_{k,s}(x,z)|+|\hat p-p|\right),
\end{align*}
where the suprema in the third step are taken over $(x,z)\in\mathcal{X}\times\mathcal{Z}$, the first step follows by (7) and (A2), the second step follows by the compactness of $\mathcal{X}$ and $\mathcal{Z}$ (Assumption 4(i)), the third step follows by (A4), (A5), $|\hat p-p|=o_p(1)$ (Lemma 1 and Assumption 4(i)), $\sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}|\tilde q_{0,0}(x,z)-q_{0,0}(x,z)|=o_p(1)$ (Lemma 1 and Assumption 4(iv)), $0<p<1$ (Assumption 2), and $\inf_{(x,z)\in\mathcal{X}\times\mathcal{Z}}|q_{0,0}(x,z)|>0$ (Assumption 4(i)), and the last step follows by $\sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}q_{k,s}(x,z)<\infty$ (Assumption 4(i)) and $\sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}|\tilde q_{k,s}(x,z)-q_{k,s}(x,z)|=o_p(1)$ (Lemma 1 and Assumption 4(iv)) for any $k,s=0,1$. The conclusion then follows by applying Lemma 1.
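For readers who wish to connect the algebra above to computation, the following is a minimal sketch of ours (not the authors' code). It assembles the ATE- and ATT-type plug-in estimands from gridded values of the $q$ functions, using the integrands appearing in the two displays above; the grid, the array names, the trapezoidal rule, and the toy data-generating choices are all illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of the final plug-in step: once estimates
# of q and q_{k,s} have been evaluated on a grid covering the supports of X and Z,
# the ATE- and ATT-type quantities are obtained by numerical integration.
import numpy as np

def trapezoid_2d(f, dx, dz):
    """Trapezoidal rule on a uniform (x, z) grid; boundary points get half weight."""
    wx = np.ones(f.shape[0]); wx[[0, -1]] = 0.5
    wz = np.ones(f.shape[1]); wz[[0, -1]] = 0.5
    return dx * dz * float((wx[:, None] * wz[None, :] * f).sum())

def assemble_tau(q, q11, q01, q10, q00, p_hat, dx, dz):
    """Plug-in ATE and ATT from gridded estimates of q and q_{k,s} (2-d arrays)."""
    ate_integrand = q11 * q / q01 - q10 * q / q00            # integrand for tau
    att_integrand = q11 / p_hat - q10 * q01 / (p_hat * q00)  # integrand for tau_treat
    return trapezoid_2d(ate_integrand, dx, dz), trapezoid_2d(att_integrand, dx, dz)

# Toy check with known truth: X, Z ~ U[-1, 1] independent, p(x, z) = 0.5,
# E[Y | T=1, x, z] = 1 and E[Y | T=0, x, z] = 0, so ATE = ATT = 1.
n_grid, dx, dz = 51, 2 / 50, 2 / 50
q = np.full((n_grid, n_grid), 0.25)   # stand-in for f_{X,Z}
q01, q00 = 0.5 * q, 0.5 * q           # E[T | x,z] f_{X,Z} and E[(1-T) | x,z] f_{X,Z}
q11, q10 = 0.5 * q, 0.0 * q           # E[YT | x,z] f_{X,Z} and E[Y(1-T) | x,z] f_{X,Z}
print(assemble_tau(q, q11, q01, q10, q00, p_hat=0.5, dx=dx, dz=dz))  # -> (1.0, 1.0)
```

In the toy check the truth is ATE = ATT = 1, and the printed values recover it up to floating-point error; in practice the gridded arrays would come from the deconvolution estimators discussed in the Appendix.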

Appendix C. Lemmas

To facilitate the proof of Theorem 4, we introduce the following lemma, where $q$, $\hat q$, $q_{k,s}$, and $\hat q_{k,s}$ for $k,s=0,1$ are the same as defined in Appendix B.2.
Lemma 1.
Under Assumptions 3 and 4(i)–(iii), it holds that $|\hat p-p|=O_p(n^{-1/2})$ and
\[
\sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}|\hat q(x,z)-q(x,z)|
=O_p\!\left(\frac{n^{-1/2}b_n^{-3/2}}{\inf_{|t|\le b_n^{-1}}|f_{\epsilon}^{ft}(t)|}+b_n^{\gamma}\right),\qquad
\max_{k,s\in\{0,1\}}\ \sup_{(x,z)\in\mathcal{X}\times\mathcal{Z}}|\hat q_{k,s}(x,z)-q_{k,s}(x,z)|
=O_p\!\left(\frac{n^{-1/2}b_n^{-3/2}}{\inf_{|t|\le b_n^{-1}}|f_{\epsilon}^{ft}(t)|}+b_n^{\gamma}\right).
\]
Proof. 
The first statement follows since $E|\hat p-p|^{2}=E|T-p|^{2}/n\le n^{-1}$. For the remaining two statements, we focus on the last one, as the second statement can be shown in a similar way.
First, note that
\begin{align*}
E[\hat q_{k,s}(x,z)]
&=b_n^{-2}\,E\!\left[Y_j^{k}T_j^{s}(1-T_j)^{1-s}\,\frac{1}{2\pi}\int e^{-\mathrm{i}t\frac{x-W_j}{b_n}}\frac{K^{ft}(t)}{f_{\epsilon}^{ft}(t/b_n)}\,dt\;K\!\left(\frac{z-Z_j}{b_n}\right)\right]\\
&=b_n^{-2}\,E\!\left[Y_j^{k}T_j^{s}(1-T_j)^{1-s}\,\frac{1}{2\pi}\int e^{-\mathrm{i}t\frac{x-X_j}{b_n}}K^{ft}(t)\,dt\;K\!\left(\frac{z-Z_j}{b_n}\right)\right]\\
&=b_n^{-2}\,E\!\left[Y_j^{k}T_j^{s}(1-T_j)^{1-s}\,K\!\left(\frac{x-X_j}{b_n}\right)K\!\left(\frac{z-Z_j}{b_n}\right)\right]\\
&=\int\!\!\int q_{k,s}(x-b_n\tilde u,\,z-b_n\tilde v)\,K(\tilde u)K(\tilde v)\,d\tilde u\,d\tilde v\\
&=q_{k,s}(x,z)+O(b_n^{\gamma}),
\end{align*}
where the first step follows by the definition of the deconvolution kernel $\mathcal{K}$, the second step follows by the independence between $\epsilon$ and $(Y,T,X,Z)$ (Assumption 3), the third step follows by $K(x)=\frac{1}{2\pi}\int e^{-\mathrm{i}tx}K^{ft}(t)\,dt$, the fourth step follows by the change of variables $\tilde u=\frac{x-u}{b_n}$ and $\tilde v=\frac{z-v}{b_n}$, and the last step follows by the smoothness of $f_{X,Z}$, $p_{X,Z}$, and $E[Y(s)\mid X,Z]$ (Assumption 4(ii)) and the properties of the $\gamma$-th order kernel function $K$ (Assumption 4(iii)).
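Because this step turns on the definition of the deconvolution kernel $\mathcal{K}$, the following minimal numerical sketch of ours may be useful; the kernel transform $(1-t^2)^3$, the Laplace error distribution, and all names are illustrative assumptions rather than the paper's choices.

```python
# A minimal numerical sketch (ours) of a deconvolution kernel of the form
#     K_decon(v) = (1 / 2pi) * int_{-1}^{1} exp(-i t v) K^ft(t) / f_eps^ft(t / b_n) dt,
# evaluated by simple quadrature over t in [-1, 1].
import numpy as np

def deconv_kernel(v, b_n, laplace_scale, n_t=2001):
    """Evaluate the deconvolution kernel at the points in array v by quadrature."""
    t = np.linspace(-1.0, 1.0, n_t)
    dt = t[1] - t[0]
    K_ft = (1.0 - t ** 2) ** 3                                # assumed kernel Fourier transform, support [-1, 1]
    f_eps_ft = 1.0 / (1.0 + (laplace_scale * t / b_n) ** 2)   # assumed Laplace(scale) characteristic function
    # both factors are even and real, so the complex exponential reduces to a cosine
    integrand = np.cos(np.outer(np.asarray(v), t)) * (K_ft / f_eps_ft)
    w = np.ones(n_t); w[0] = w[-1] = 0.5                      # trapezoid weights
    return (integrand * w).sum(axis=1) * dt / (2.0 * np.pi)

v = np.array([0.0, 0.5, 1.0, 2.0])
print(deconv_kernel(v, b_n=0.3, laplace_scale=1e-8))  # negligible error: close to the ordinary kernel K(v)
print(deconv_kernel(v, b_n=0.3, laplace_scale=0.5))   # dividing by f_eps^ft rescales the kernel to undo the error smoothing
```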
Also note that
\begin{align*}
\operatorname{Var}(\hat q_{k,s}(x,z))
&\le\frac{1}{n b_n^{4}}\,E\!\left[\left|Y_j^{k}T_j^{s}(1-T_j)^{1-s}\,\mathcal{K}\!\left(\frac{x-W_j}{b_n}\right)K\!\left(\frac{z-Z_j}{b_n}\right)\right|^{2}\right]\\
&=O\!\left(\frac{1}{n b_n^{4}\inf_{|t|\le b_n^{-1}}|f_{\epsilon}^{ft}(t)|^{2}}\int\!\!\int\!\!\int K\!\left(\frac{z-v}{b_n}\right)^{2}\bigl\{E[Y_j^{2k}T_j^{s}(1-T_j)^{1-s}\mid X,Z]\,f_{X,Z}\bigr\}(u,v)\,f_{\epsilon}(\eta)\,du\,dv\,d\eta\right)\\
&=O\!\left(\frac{1}{n b_n^{3}\inf_{|t|\le b_n^{-1}}|f_{\epsilon}^{ft}(t)|^{2}}\int\!\!\int K(\tilde v)^{2}\bigl\{E[Y_j^{2k}T_j^{s}(1-T_j)^{1-s}\mid X,Z]\,f_{X,Z}\bigr\}(u,\,z-b_n\tilde v)\,du\,d\tilde v\right)\\
&=O\!\left(\frac{1}{n b_n^{3}\inf_{|t|\le b_n^{-1}}|f_{\epsilon}^{ft}(t)|^{2}}\right),
\end{align*}
where the first step follows by random sampling (Assumption 4(i)), the second step follows by the fact that $K^{ft}$ is supported on $[-1,1]$ (Assumption 4(iii)), the third step uses the change of variables $\tilde v=\frac{z-v}{b_n}$, and the last step follows by the boundedness of $E[Y_j^{2}(s)\mid X,Z]$ for $s=0,1$ (Assumption 4(i)), the smoothness of $f_{X,Z}$ (Assumption 4(ii)), and the properties of the $\gamma$-th order kernel function $K$ (Assumption 4(iii)). □
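To make the uniform rate in Lemma 1 concrete, the following is an illustrative calculation of ours, not taken from the paper, for an ordinary-smooth error. If, for example, the measurement error were Laplace distributed with scale $\sigma$, so that
\[
f_{\epsilon}^{ft}(t)=\frac{1}{1+\sigma^{2}t^{2}},
\qquad
\inf_{|t|\le b_n^{-1}}|f_{\epsilon}^{ft}(t)|=\frac{1}{1+\sigma^{2}b_n^{-2}}\asymp b_n^{2},
\]
then the uniform rate in Lemma 1 becomes $O_p\bigl(n^{-1/2}b_n^{-7/2}+b_n^{\gamma}\bigr)$; balancing the two terms would suggest a bandwidth of order $b_n\asymp n^{-1/(2\gamma+7)}$ and a resulting rate of $n^{-\gamma/(2\gamma+7)}$.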

References

1. Abadie, Alberto, and Matias D. Cattaneo. 2018. Econometric Methods for Program Evaluation. Annual Review of Economics 10: 465–503.
2. Battistin, Erich, and Andrew Chesher. 2014. Treatment Effect Estimation with Covariate Measurement Error. Journal of Econometrics 178: 707–15.
3. Berge, Lars Ivar Oppedal, Kjetil Bjorvatn, and Bertil Tungodden. 2015. Human and Financial Capital for Microenterprise Development: Evidence from a Field and Lab Experiment. Management Science 61: 707–22.
4. Bissantz, Nicolai, Lutz Dümbgen, Hajo Holzmann, and Axel Munk. 2007. Non-parametric confidence bands in deconvolution density estimation. Journal of the Royal Statistical Society, Series B 69: 483–506.
5. Bound, John, Charles Brown, and Nancy Mathiowetz. 2001. Measurement Error in Survey Data. In Handbook of Econometrics. Edited by James Heckman and Edward E. Leamer. Amsterdam: Elsevier, vol. 5, pp. 3705–43.
6. Bruhn, Miriam, Dean Karlan, and Antoinette Schoar. 2018. The Impact of Consulting Services on Small and Medium Enterprises: Evidence from a Randomized Trial in Mexico. Journal of Political Economy 126: 635–87.
7. Carroll, Raymond J., and Peter Hall. 1988. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association 83: 1184–86.
8. Chen, Xiaohong, Han Hong, and Denis Nekipelov. 2011. Nonlinear models of measurement errors. Journal of Economic Literature 49: 901–37.
9. Cochran, William G., and Donald B. Rubin. 1973. Controlling Bias in Observational Studies: A review. Sankhyā: The Indian Journal of Statistics, Series A 35: 417–46.
10. Delaigle, Aurore, and Irene Gijbels. 2004. Bootstrap bandwidth selection in kernel density estimation from a contaminated sample. Annals of the Institute of Statistical Mathematics 56: 19–47.
11. Delaigle, Aurore, and Alexander Meister. 2007. Nonparametric regression estimation in the heteroscedastic errors-in-variables problem. Journal of the American Statistical Association 102: 1416–26.
12. Delaigle, Aurore, Peter Hall, and Alexander Meister. 2008. On Deconvolution with Repeated Measurements. The Annals of Statistics 36: 665–85.
13. Delaigle, Aurore, Jianqing Fan, and Raymond J. Carroll. 2009. A design-adaptive local polynomial estimator for the errors-in-variables problem. Journal of the American Statistical Association 104: 348–59.
14. Delaigle, Aurore, Peter Hall, and Farshid Jamshidi. 2015. Confidence bands in non-parametric errors-in-variables regression. Journal of the Royal Statistical Society, Series B 77: 149–69.
15. Dong, Hao, Taisuke Otsu, and Luke N. Taylor. 2020a. Estimation of Varying Coefficient Models with Measurement Error. Dallas: Department of Economics, Southern Methodist University.
16. Dong, Hao, Taisuke Otsu, and Luke N. Taylor. 2020b. Average Derivative Estimation under Measurement Error. Econometric Theory, in press.
17. Drexler, Alejandro, Greg Fischer, and Antoinette Schoar. 2014. Keeping It Simple: Financial Literacy and Rules of Thumb. American Economic Journal: Applied Economics 6: 1–31.
18. Fan, Jianqing. 1991a. On the optimal rates of convergence for nonparametric deconvolution problems. The Annals of Statistics 19: 1257–72.
19. Fan, Jianqing. 1991b. Asymptotic normality for deconvolution kernel density estimators. Sankhyā: The Indian Journal of Statistics, Series A 53: 97–110.
20. Fan, Jianqing, and Elias Masry. 1992. Multivariate regression estimation with errors-in-variables: Asymptotic normality for mixing processes. Journal of Multivariate Analysis 43: 237–71.
21. Fan, Jianqing, and Young K. Truong. 1993. Nonparametric regression with errors in variables. The Annals of Statistics 21: 1900–25.
22. Fan, Yanqin. 1995. Average derivative estimation with errors-in-variables. Journal of Nonparametric Statistics 4: 395–407.
23. Fernandes, Daniel, John G. Lynch Jr., and Richard G. Netemeyer. 2014. Financial Literacy, Financial Education, and Downstream Financial Behaviors. Management Science 60: 1861–83.
24. Fisher, Ronald A. 1935. The logic of inductive inference. Journal of the Royal Statistical Society 98: 39–82.
25. Frisch, Ragnar. 1934. Statistical Confluence Analysis by Means of Complete Regression Systems. Oslo: University Institute for Economics.
26. Heckman, James J., and Richard Robb Jr. 1985. Alternative Methods for Evaluating the Impact of Interventions: An overview. Journal of Econometrics 30: 239–67.
27. Hong, Hwanhee, David A. Aaby, Juned Siddique, and Elizabeth A. Stuart. 2019. Propensity Score–Based Estimators With Multiple Error-Prone Covariates. American Journal of Epidemiology 188: 222–30.
28. Horowitz, Joel L. 2009. Semiparametric and Nonparametric Methods in Econometrics. Berlin: Springer.
29. Horvitz, Daniel G., and Donovan J. Thompson. 1952. A Generalization of Sampling Without Replacement from a Finite Universe. Journal of the American Statistical Association 47: 663–85.
30. Imbens, Guido W., and Jeffrey M. Wooldridge. 2009. Recent Developments in the Econometrics of Program Evaluation. Journal of Economic Literature 47: 5–86.
31. Jakubowski, Maciej. 2010. Latent Variables and Propensity Score Matching: A Simulation Study with Application to Data from the Programme for International Student Assessment in Poland. Empirical Economics 48: 1287–1325.
32. Kato, Kengo, and Yuya Sasaki. 2018. Uniform confidence bands in deconvolution with unknown error distribution. Journal of Econometrics 207: 129–61.
33. Kato, Kengo, and Yuya Sasaki. 2019. Uniform confidence bands for nonparametric errors-in-variables regression. Journal of Econometrics 213: 516–55.
34. Koopmans, Tjalling. 1937. Linear Regression Analysis of Economic Time Series. Haarlem: De Erven F. Bohn N.V.
35. Li, Qi, and Jeff Racine. 2003. Nonparametric estimation of distributions with categorical and continuous data. Journal of Multivariate Analysis 86: 266–92.
36. Lounici, Karim, and Richard Nickl. 2011. Global uniform risk bounds for wavelet deconvolution estimators. The Annals of Statistics 39: 201–31.
37. Lusardi, Annamaria, and Olivia S. Mitchell. 2014. The Economic Importance of Financial Literacy. Journal of Economic Literature 52: 5–44.
38. McCaffrey, Daniel F., J. R. Lockwood, and Claude M. Setodji. 2013. Inverse Probability Weighting with Error-Prone Covariates. Biometrika 100: 671–80.
39. McKenzie, David, and Christopher Woodruff. 2013. What Are We Learning from Business Training and Entrepreneurship Evaluations around the Developing World? World Bank Research Observer 29: 48–82.
40. McMurry, Timothy L., and Dimitris N. Politis. 2004. Nonparametric Regression with Infinite Order Flat-Top Kernels. Journal of Nonparametric Statistics 16: 549–62.
41. Meister, Alexander. 2009. Deconvolution Problems in Nonparametric Statistics. Berlin: Springer.
42. Morgan, Peter J., and Long Q. Trinh. 2019. Determinants and Impacts of Financial Literacy in Cambodia and Viet Nam. Journal of Risk and Financial Management 12: 19.
43. Neyman, Jerzy S. 1923. On the Application of Probability Theory to Agricultural Experiments. Essays on Principles. Section 9. Annals of Agricultural Sciences 10: 1–51.
44. Reiersøl, Olav. 1950. Identifiability of a Linear Relation Between Variables Which Are Subject to Error. Econometrica 18: 375–89.
45. Rosenbaum, Paul R., and Donald B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
46. Rosenbaum, Paul R. 2010. Design of Observational Studies. Springer Series in Statistics. New York: Springer Publishing Company.
47. Roy, A. D. 1951. Some Thoughts on the Distribution of Earnings. Oxford Economic Papers 3: 135–46.
48. Rubin, Donald B. 1974. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 66: 688–701.
49. Rubin, Donald B. 1986. Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations. Journal of Business & Economic Statistics 4: 87–94.
50. Rudolph, Kara E., and Elizabeth A. Stuart. 2018. Using Sensitivity Analyses for Unobserved Confounding to Address Covariate Measurement Error in Propensity Score Methods. American Journal of Epidemiology 187: 604–13.
51. Schennach, Susanne M. 2016. Recent advances in the measurement error literature. Annual Review of Economics 8: 314–77.
52. Schennach, Susanne M. 2019. Convolution without independence. Journal of Econometrics 211: 308–18.
53. Stefanski, Leonard A., and Raymond J. Carroll. 1990. Deconvolving kernel density estimators. Statistics 21: 169–84.
54. VanderWeele, Tyler J., and Onyebuchi A. Arah. 2011. Bias Formulas for Sensitivity Analysis of Unmeasured Confounding for General Outcomes, Treatments, and Confounders. Epidemiology 22: 42–52.
55. Van Es, Bert, Shota Gugushvili, and Peter Spreij. 2008. Deconvolution for an atomic distribution. Electronic Journal of Statistics 2: 265–97.
56. Webb-Vargas, Yenny, Kara E. Rudolph, David Lenis, Peter Murakami, and Elizabeth A. Stuart. 2017. An Imputation-Based Solution to Using Mismeasured Covariates in Propensity Score Analysis. Statistical Methods in Medical Research 26: 1824–37.
57. Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. Cambridge: MIT Press.
1.
See Imbens and Wooldridge (2009) and Abadie and Cattaneo (2018) for excellent surveys.
2.
We assume that the Stable Unit Treatment Value Assumption (SUTVA), under which the potential outcomes of individuals do not depend on the treatment assignment of others, holds (Neyman 1923; Rubin 1986).
3.
As $\epsilon$ and $\epsilon_r$ are independent and identically distributed under Assumption 7, we have $E[e^{\mathrm{i}t(W-W_r)}]=E[e^{\mathrm{i}t(\epsilon-\epsilon_r)}]=|f_{\epsilon}^{ft}(t)|^{2}$. As $f_{\epsilon}$ is symmetric around zero under Assumption 7, $f_{\epsilon}^{ft}(t)>0$, which implies $f_{\epsilon}^{ft}(t)=\bigl(E[e^{\mathrm{i}t(W-W_r)}]\bigr)^{1/2}=\bigl(E[\cos\{t(W-W_r)\}]\bigr)^{1/2}$. Thus, $\hat f_{\epsilon}^{ft}(t)$ is obtained by plugging in the sample analogue of $E[\cos\{t(W-W_r)\}]$; a small illustrative sketch of this plug-in appears after these notes.
4.
5.
Note that we do not analyze one outcome included in Drexler et al. (2014): savings amount is excluded from our analysis because the first-stage propensity score estimation, ignoring measurement error, did not converge.
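As flagged in note 3, the following is a minimal sketch of ours (not the authors' code) of the plug-in estimator of $f_{\epsilon}^{ft}$. It assumes, as one common implementation consistent with the note, that each observation provides two error-contaminated measurements of the same covariate, and it uses a Laplace error only to check the output; all names are illustrative.

```python
# Sketch (ours) of the plug-in estimator of |f_eps^ft| built from E[cos{t(W - W_r)}],
# assuming W1 = X + eps_1 and W2 = X + eps_2 are two measurements of the same covariate.
import numpy as np

def f_eps_ft_hat(t_grid, W1, W2):
    """Estimate f_eps^ft(t) on a grid of t, truncating small negative averages at 0."""
    diff = W1 - W2                                   # = eps_1 - eps_2, free of X
    cos_avg = np.cos(np.outer(t_grid, diff)).mean(axis=1)
    return np.sqrt(np.clip(cos_avg, 0.0, None))      # symmetric error => real, nonnegative ft

# check against the Laplace characteristic function 1 / (1 + scale^2 t^2)
rng = np.random.default_rng(1)
n, scale = 5000, 0.5
X = rng.normal(size=n)
W1 = X + rng.laplace(scale=scale, size=n)
W2 = X + rng.laplace(scale=scale, size=n)
t_grid = np.array([0.5, 1.0, 2.0])
print(f_eps_ft_hat(t_grid, W1, W2))                  # approximately 1 / (1 + 0.25 * t^2)
print(1.0 / (1.0 + scale ** 2 * t_grid ** 2))
```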
Table 1. DGP1.

Known Error Distribution

| | $\tilde{\tau}$: OS, n = 250 | $\tilde{\tau}$: OS, n = 500 | $\tilde{\tau}$: SS, n = 250 | $\tilde{\tau}$: SS, n = 500 | $\tilde{\tau}_{\mathrm{treat}}$: OS, n = 250 | $\tilde{\tau}_{\mathrm{treat}}$: OS, n = 500 | $\tilde{\tau}_{\mathrm{treat}}$: SS, n = 250 | $\tilde{\tau}_{\mathrm{treat}}$: SS, n = 500 |
|---|---|---|---|---|---|---|---|---|
| Bias | 0.056 | −0.217 | −0.316 | −0.292 | 0.131 | −0.226 | −0.322 | −0.297 |
| SD | 0.371 | 0.104 | 0.126 | 0.096 | 0.559 | 0.109 | 0.128 | 0.098 |
| RMSE | 0.375 | 0.241 | 0.340 | 0.307 | 0.575 | 0.251 | 0.346 | 0.313 |

Unknown Error Distribution

| | $\hat{\tau}$: OS, n = 250 | $\hat{\tau}$: OS, n = 500 | $\hat{\tau}$: SS, n = 250 | $\hat{\tau}$: SS, n = 500 | $\hat{\tau}_{\mathrm{treat}}$: OS, n = 250 | $\hat{\tau}_{\mathrm{treat}}$: OS, n = 500 | $\hat{\tau}_{\mathrm{treat}}$: SS, n = 250 | $\hat{\tau}_{\mathrm{treat}}$: SS, n = 500 |
|---|---|---|---|---|---|---|---|---|
| Bias | 0.094 | −0.216 | −0.315 | −0.292 | 0.185 | −0.224 | −0.321 | −0.297 |
| SD | 0.451 | 0.103 | 0.127 | 0.096 | 0.690 | 0.108 | 0.129 | 0.098 |
| RMSE | 0.461 | 0.239 | 0.340 | 0.307 | 0.714 | 0.249 | 0.346 | 0.313 |
Table 2. DGP2.

Known Error Distribution

| | $\tilde{\tau}$: OS, n = 250 | $\tilde{\tau}$: OS, n = 500 | $\tilde{\tau}$: SS, n = 250 | $\tilde{\tau}$: SS, n = 500 | $\tilde{\tau}_{\mathrm{treat}}$: OS, n = 250 | $\tilde{\tau}_{\mathrm{treat}}$: OS, n = 500 | $\tilde{\tau}_{\mathrm{treat}}$: SS, n = 250 | $\tilde{\tau}_{\mathrm{treat}}$: SS, n = 500 |
|---|---|---|---|---|---|---|---|---|
| Bias | −0.082 | −0.121 | −0.170 | −0.152 | −0.086 | −0.138 | −0.178 | −0.160 |
| SD | 0.273 | 0.103 | 0.125 | 0.096 | 0.400 | 0.107 | 0.127 | 0.098 |
| RMSE | 0.285 | 0.159 | 0.211 | 0.180 | 0.409 | 0.175 | 0.219 | 0.188 |

Unknown Error Distribution

| | $\hat{\tau}$: OS, n = 250 | $\hat{\tau}$: OS, n = 500 | $\hat{\tau}$: SS, n = 250 | $\hat{\tau}$: SS, n = 500 | $\hat{\tau}_{\mathrm{treat}}$: OS, n = 250 | $\hat{\tau}_{\mathrm{treat}}$: OS, n = 500 | $\hat{\tau}_{\mathrm{treat}}$: SS, n = 250 | $\hat{\tau}_{\mathrm{treat}}$: SS, n = 500 |
|---|---|---|---|---|---|---|---|---|
| Bias | −0.077 | −0.120 | −0.169 | −0.152 | −0.064 | −0.137 | −0.177 | −0.160 |
| SD | 0.333 | 0.103 | 0.126 | 0.095 | 0.523 | 0.108 | 0.128 | 0.098 |
| RMSE | 0.342 | 0.158 | 0.211 | 0.180 | 0.527 | 0.174 | 0.219 | 0.188 |
Table 3. DGP3.

Known Error Distribution

| | $\tilde{\tau}$: OS, n = 250 | $\tilde{\tau}$: OS, n = 500 | $\tilde{\tau}$: SS, n = 250 | $\tilde{\tau}$: SS, n = 500 | $\tilde{\tau}_{\mathrm{treat}}$: OS, n = 250 | $\tilde{\tau}_{\mathrm{treat}}$: OS, n = 500 | $\tilde{\tau}_{\mathrm{treat}}$: SS, n = 250 | $\tilde{\tau}_{\mathrm{treat}}$: SS, n = 500 |
|---|---|---|---|---|---|---|---|---|
| Bias | −0.023 | −0.159 | −0.240 | −0.218 | 0.002 | −0.165 | −0.244 | −0.221 |
| SD | 0.287 | 0.102 | 0.124 | 0.095 | 0.416 | 0.107 | 0.126 | 0.097 |
| RMSE | 0.288 | 0.189 | 0.270 | 0.237 | 0.417 | 0.197 | 0.275 | 0.242 |

Unknown Error Distribution

| | $\hat{\tau}$: OS, n = 250 | $\hat{\tau}$: OS, n = 500 | $\hat{\tau}$: SS, n = 250 | $\hat{\tau}$: SS, n = 500 | $\hat{\tau}_{\mathrm{treat}}$: OS, n = 250 | $\hat{\tau}_{\mathrm{treat}}$: OS, n = 500 | $\hat{\tau}_{\mathrm{treat}}$: SS, n = 250 | $\hat{\tau}_{\mathrm{treat}}$: SS, n = 500 |
|---|---|---|---|---|---|---|---|---|
| Bias | −0.008 | −0.158 | −0.239 | −0.217 | 0.021 | −0.163 | −0.243 | −0.221 |
| SD | 0.337 | 0.102 | 0.125 | 0.094 | 0.538 | 0.106 | 0.127 | 0.097 |
| RMSE | 0.338 | 0.188 | 0.269 | 0.237 | 0.539 | 0.195 | 0.274 | 0.241 |
Table 4. Impact of Training on Business Practices and Performance.

| Dependent Variable | Standard Accounting: OLS | Standard Accounting: IPW | Standard Accounting: IPW-ME | Rule-of-Thumb: OLS | Rule-of-Thumb: IPW | Rule-of-Thumb: IPW-ME |
|---|---|---|---|---|---|---|
| Business and Personal Financial Practices | | | | | | |
| Separate Business and Personal Cash | 0.00 | 0.00 | −0.05 | 0.08 | 0.08 | 0.08 |
| | (0.03) | (0.03) | {0.02} | (0.03) | (0.03) | {0.10} |
| | | | {{0.14}} | | | {{0.24}} |
| Observations | 524 | | | 532 | | |
| Keep Accounting Records | 0.04 | 0.04 | 0.05 | 0.12 | 0.12 | 0.08 |
| | (0.05) | (0.05) | {0.10} | (0.03) | (0.03) | {0.09} |
| | | | {{0.25}} | | | {{0.21}} |
| Observations | 524 | | | 533 | | |
| Separate Business and Personal Accounting | 0.04 | 0.04 | 0.00 | 0.12 | 0.12 | 0.11 |
| | (0.05) | (0.05) | {0.08} | (0.03) | (0.03) | {0.14} |
| | | | {{0.24}} | | | {{0.25}} |
| Observations | 521 | | | 532 | | |
| Set Aside Cash for Business Purposes | 0.07 | 0.07 | 0.08 | 0.12 | 0.12 | 0.13 |
| | (0.03) | (0.03) | {0.14} | (0.04) | (0.04) | {0.14} |
| | | | {{0.19}} | | | {{0.23}} |
| Observations | 524 | | | 532 | | |
| Calculate Revenues Formally | 0.01 | 0.01 | −0.03 | 0.06 | 0.06 | 0.07 |
| | (0.04) | (0.04) | {0.04} | (0.03) | (0.03) | {0.15} |
| | | | {{0.16}} | | | {{0.23}} |
| Observations | 524 | | | 533 | | |
| Business Practices Index | 0.07 | 0.07 | −0.15 | 0.14 | 0.14 | 0.13 |
| | (0.06) | (0.06) | {−0.17} | (0.04) | (0.04) | {0.18} |
| | | | {{−0.14}} | | | {{0.15}} |
| Observations | 525 | | | 534 | | |
| Any Savings | 0.01 | 0.01 | −0.03 | 0.08 | 0.08 | 0.05 |
| | (0.05) | (0.05) | {0.01} | (0.04) | (0.04) | {0.04} |
| | | | {{0.19}} | | | {{0.10}} |
| Observations | 529 | | | 540 | | |
Notes: The sample includes only those individuals with their own business who were either exposed to the treatment in the column heading or to neither treatment. The number of observations appears beneath the results for each model. Standard errors for OLS and IPW are in parentheses and are clustered at the barrio level. IPW-ME reports only point estimates. The IPW-ME estimates in the first row for each outcome assume the variance of the measurement error is 1/6 of the variance of the observed covariate; estimates in { } assume the variance of the measurement error is 1/3 of the variance of the observed covariate; estimates in {{ }} assume the variance of the measurement error is 2/3 of the variance of the observed covariate. IPW and IPW-ME estimates are of the average treatment effect.
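As a small illustration of the sensitivity design described in the notes, the sketch below (ours; the variable names and the placeholder covariate are assumptions) shows how each assumed variance ratio maps into the measurement-error standard deviation supplied to the known-error-distribution estimator.

```python
# Sketch (ours): map the assumed ratio Var(error) / Var(observed covariate)
# into the error standard deviation used by the known-error-distribution estimator.
import numpy as np

def error_sd_from_ratio(W, ratio):
    """Implied measurement-error SD when Var(eps) = ratio * Var(W)."""
    return float(np.sqrt(ratio * np.var(W)))

W = np.random.default_rng(0).normal(loc=9.0, scale=1.2, size=500)  # placeholder covariate
for ratio in (1 / 6, 1 / 3, 2 / 3):
    print(f"ratio = {ratio:.3f}, implied error SD = {error_sd_from_ratio(W, ratio):.3f}")
```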
Table 5. Impact of Training on Business Practices and Performance.

| Dependent Variable | Standard Accounting: OLS | Standard Accounting: IPW | Standard Accounting: IPW-ME | Rule-of-Thumb: OLS | Rule-of-Thumb: IPW | Rule-of-Thumb: IPW-ME |
|---|---|---|---|---|---|---|
| Objective Reporting Quality | | | | | | |
| Any Reporting Errors | −0.04 | −0.04 | −0.09 | −0.09 | −0.09 | −0.15 |
| | (0.04) | (0.04) | {−0.07} | (0.03) | (0.03) | {−0.15} |
| | | | {{−0.04}} | | | {{−0.21}} |
| Observations | 496 | | | 508 | | |
| Raw Profit Calculation Difference (RD$), weekly | 918 | 914 | 1123 | 1094 | 1086 | 857 |
| | (746) | (726) | {1158} | (551) | (538) | {925} |
| | | | {{262}} | | | {{690}} |
| Observations | 273 | | | 289 | | |
| Absolute Value Profit Calculation Difference (RD$), weekly | −324 | −368 | −633 | −642 | −641 | −840 |
| | (643) | (622) | {−595} | (471) | (460) | {−803} |
| | | | {{−98}} | | | {{−919}} |
| Observations | 273 | | | 289 | | |
| Business Performance | | | | | | |
| Total Number of Employees | 0.08 | 0.08 | 0.37 | −0.04 | −0.04 | 0.11 |
| | (0.09) | (0.09) | {−0.02} | (0.09) | (0.09) | {−0.01} |
| | | | {{0.61}} | | | {{0.98}} |
| Observations | 523 | | | 533 | | |
| Revenue Index | −0.02 | −0.02 | −0.02 | 0.10 | 0.10 | 0.04 |
| | (0.05) | (0.05) | {−0.03} | (0.05) | (0.05) | {0.04} |
| | | | {{−0.09}} | | | {{0.03}} |
| Observations | 511 | | | 518 | | |
| Sales, Average Week (RD$) | −649 | −686 | −543 | 604 | 665 | 220 |
| | (810) | (791) | {−619} | (942) | (941) | {−480} |
| | | | {{1663}} | | | {{130}} |
| Observations | 367 | | | 386 | | |
| Sales, Bad Week (RD$) | −672 | −696 | −386 | 1168 | 1111 | 389 |
| | (513) | (497) | {−291} | (538) | (533) | {641} |
| | | | {{116}} | | | {{−35}} |
| Observations | 359 | | | 373 | | |
Notes: The sample includes only those individuals with their own business who were either exposed to the treatment in the column heading or to neither treatment. The number of observations appears beneath the results for each model. Standard errors for OLS and IPW are in parentheses and are clustered at the barrio level. IPW-ME reports only point estimates. The IPW-ME estimates in the first row for each outcome assume the variance of the measurement error is 1/6 of the variance of the observed covariate; estimates in { } assume the variance of the measurement error is 1/3 of the variance of the observed covariate; estimates in {{ }} assume the variance of the measurement error is 2/3 of the variance of the observed covariate. IPW and IPW-ME estimates are of the average treatment effect.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
