1. Introduction
A frequently encountered complication when estimating the effect of a potentially endogenous treatment based on instrumental variable (IV) methods is attrition, sample selection, or non-response bias in the outcome. To account for this problem, the missing at random (MAR) assumption (e.g., Rubin (1976)), for instance, requires outcome attrition to depend only on observable variables. Alternatively, Frangakis and Rubin (1999) propose a latent ignorability (LI) restriction, which assumes attrition to be independent of the outcome conditional on the instrument and the treatment compliance type considered in Angrist et al. (1996). A compliance type is defined in terms of how the treatment (e.g., participation in a training) depends on or complies with the value of the instrument (e.g., random assignment to a training), such that the population generally consists of compliers (whose training participation always corresponds to the random assignment) and non-compliers. In the IV framework, we may even combine the MAR and LI assumptions to impose independence between attrition and outcomes conditional on the compliance type, the instrument, and further observed covariates.
We argue that LI is nevertheless quite restrictive, as attrition is not allowed to be related to unobservables affecting the outcome in a very general way. In fact, LI appears hard to justify in quite standard IV models with non-response and should therefore be cautiously scrutinized in applications. As an example, consider Barnard et al. (2003), who assess a randomized voucher program for private schooling with noncompliance (where the IV is the randomization and the treatment is private schooling) and attrition in the test score outcomes, because some children did not take the test. Unobservables such as ability or motivation likely affect both test taking and test scores. LI (combined with MAR) requires that conditional on the compliance type (i.e., private schooling as a function of voucher receipt), voucher assignment, and observed covariates, test taking is not related to ability or motivation (and thus, test scores). Among compliers (who attend private school only when randomized in), those taking the test must thus have the same distribution of ability and motivation as those abstaining. However, even within compliers, heterogeneity in ability and motivation may be sufficiently high to selectively affect test taking such that LI fails.
As a second example, Mealli et al. (2004) as well as Mattei and Mealli (2007) consider a randomized trial on teaching breast self-examination (BSE), either based on mailed information (standard treatment) or on attendance in a course (new treatment), to investigate the impact on BSE practice (as a method of breast cancer prevention). However, a substantial share of women assigned to the course did not participate (noncompliance) and, furthermore, not all study subjects answered the follow-up survey on BSE practice (non-response). Unobservables likely affecting both survey response and BSE practice are interest in breast cancer prevention and risk awareness about breast cancer. Even within the subpopulation of compliers, differences in interest and risk awareness could systematically affect response behavior so that LI is violated, even conditional on observed covariates.
LI, alone or in combination with MAR, is invoked in a range of further studies in the fields of medicine, (bio-)statistics, political science, and economics.
O’Malley and Normand (
2005) for instance suggest a maximum likelihood-based estimator and apply it to compare the relative effectiveness of two medical treatments among adults with refractory schizophrenia under treatment non-compliance and outcome attrition.
Chen et al. (
2015) use an LI approach to verify the robustness of their finding that high-calcium milk powder effectively reduces bone loss at the lumbar spine as well as height loss among treatment compliers consisting of postmenopausal women.
Esterling et al. (
2011) suggest a parametric estimator incorporating LI and apply it to measure the effect of participating in a deliberative session with U.S. politicians (about federal immigration and border control policies) on political impressions, e.g., whether public officials care about citizens’ opinions.
Frölich and Huber (
2014) extend the LI framework to multiple outcome periods with increasing attrition across periods and evaluate the effect of a program aiming at increasing college achievement on students’ grade point average in the first and second year. Adapting the LI approach to mediation analysis,
Yamamoto (
2013) disentangles the total treatment effect among compliers into its direct impact on the outcome and an indirect causal mechanism operating via an intermediate variable (or mediator), whose endogeneity is tackled by an LI assumption.
The remainder of this paper is organized as follows.
Section 2 formally discusses the strong behavioral implications of LI in standard IV models with non-response.
Section 3 provides an empirical illustration using the Job Corps experimental study, in which the estimated program effect under LI is compared to alternative assumptions about outcome attrition.
Section 4 concludes.
2. IV Models with Nonresponse
Assume the following parametric IV model with nonresponse:
$$Y = \alpha_0 + \alpha_1 D + U, \qquad D = I\{\beta_0 + \beta_1 Z + V \geq 0\}, \qquad R = I\{\gamma_0 + \gamma_1 D + W \geq 0\}. \tag{1}$$
$Y$ is the outcome of interest, $D$ is the binary (and potentially endogenous) treatment, and $R$ is the response indicator. Note that $I\{\cdot\}$ is the indicator function that is equal to one if its argument is satisfied and zero otherwise. $Y$ is only observed if $R=1$ and unknown if $R=0$, implying non-response, sample selection, or attrition. $Z$ is a randomly assigned instrument affecting $D$ (but not directly $Y$ or $R$) and assumed to be binary, e.g., the randomization indicator in an experiment. $U$, $V$, $W$ denote arbitrarily associated unobservables, and $\alpha_0$, $\alpha_1$, $\beta_0$, $\beta_1$, $\gamma_0$, $\gamma_1$ are coefficients.
Angrist et al. (1996) define four compliance types, denoted by $T$, based on how the potential treatment status depends on the instrument: An individual is a complier (defier) if her potential treatment state is one (zero) in the presence and zero (one) in the absence of the instrument and an always-taker (never-taker) if the potential treatment is always (never) one, independent of the instrument. Assume that $\beta_1$ is positive (a symmetric case could be made for a negative $\beta_1$). Then, an individual is a complier if $-\beta_0 - \beta_1 \leq V < -\beta_0$, an always-taker if $V \geq -\beta_0$, and a never-taker if $V < -\beta_0 - \beta_1$. Defiers do not exist due to the positive sign of $\beta_1$.
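To make the setup concrete, the following minimal sketch simulates a threshold-crossing IV model with nonresponse of the kind described above and classifies the compliance types from the potential treatment states. The linear outcome equation, all coefficient values (`a0`, `a1`, `b0`, `b1`, `g0`, `g1`), the equicorrelated normal unobservables, and the function name are illustrative assumptions rather than part of the original analysis.

```python
import numpy as np


def simulate_iv_model(n, a0=1.0, a1=0.5, b0=-0.5, b1=1.0, g0=0.2, g1=0.3,
                      rho=0.5, seed=0):
    """Simulate outcome, treatment, and response from a threshold-crossing IV model.

    All coefficient values and the error distribution are assumptions
    chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Unobservables (U, V, W): unit variances, pairwise correlation rho.
    cov = np.full((3, 3), rho) + (1.0 - rho) * np.eye(3)
    u, v, w = rng.multivariate_normal(np.zeros(3), cov, size=n).T
    z = rng.integers(0, 2, size=n)                 # randomly assigned instrument
    d = (b0 + b1 * z + v >= 0).astype(int)         # treatment
    r = (g0 + g1 * d + w >= 0).astype(int)         # response indicator
    y = np.where(r == 1, a0 + a1 * d + u, np.nan)  # outcome observed only if R = 1
    # Potential treatments under Z = 1 and Z = 0 define the compliance type.
    d1 = b0 + b1 + v >= 0
    d0 = b0 + v >= 0
    t = np.where(d1 & d0, "always-taker",
                 np.where(d1, "complier",
                          np.where(d0, "defier", "never-taker")))
    return {"y": y, "d": d, "r": r, "z": z, "t": t}


sim = simulate_iv_model(10_000)
types, counts = np.unique(sim["t"], return_counts=True)
print(dict(zip(types, counts / counts.sum())))  # shares of compliance types
```

With a positive instrument coefficient (`b1 > 0`), no observation is classified as a defier, and the treatment of compliers coincides with their instrument value.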
We now impose the following latent ignorability (LI) assumption, see Frangakis and Rubin (1999), and critically assess it in the light of our standard IV model with attrition:
Assumption 1 (latent ignorability). $Y \perp R \mid Z, T$ (where '⊥' denotes independence).
This is equivalent to $Y \perp R \mid D, Z, T$, as $Z$ and $T$ perfectly determine $D$. Furthermore, we assume that the error term $U$ is continuous, such that $Y$ is continuous. Finally, for the moment we also impose that $U = V = W$, such that the same unobservable (e.g., motivation) affects the outcome (e.g., test score), treatment (e.g., private schooling), and response (e.g., test taking).
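Note that Assumption 1 implies that the distribution of $U$ among compliers (denoted by $T=c$) is the same across response states given the instrument:
$$\begin{aligned} E[f(U) \mid Z=1, T=c, R=1] &= E[f(U) \mid Z=1, T=c, R=0] \\ \Leftrightarrow \; E[f(U) \mid Z=1, \, -\beta_0-\beta_1 \leq U < -\beta_0, \, U \geq -\gamma_0-\gamma_1] &= E[f(U) \mid Z=1, \, -\beta_0-\beta_1 \leq U < -\beta_0, \, U < -\gamma_0-\gamma_1], \end{aligned} \tag{2}$$
where $f(\cdot)$ denotes an arbitrary function with a finite expectation and the second line follows from the parametric model in (1): with $U=V=W$, compliers satisfy $-\beta_0-\beta_1 \leq U < -\beta_0$, and since $D=1$ for compliers with $Z=1$, responding ($R=1$) is equivalent to $U \geq -\gamma_0-\gamma_1$. Obviously, the joint satisfaction of $U=V=W$ and (2) is impossible in this context, as the distribution of $U$ conditional on $R=1$ and $R=0$, respectively, is non-overlapping. An analogous impossibility result holds for $Z=0$, which is also implied by Assumption 1.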
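The non-overlap argument can be checked numerically. The sketch below assumes a single standard normal unobservable that enters the treatment and response equations alike, with hypothetical coefficient values (`b0`, `b1`, `g0`, `g1`); it extracts the unobservable among compliers assigned to the instrument and splits it by response status.

```python
import numpy as np


def complier_u_by_response(n=100_000, b0=-0.5, b1=1.0, g0=-0.2, g1=0.3, seed=1):
    """Return U among compliers with Z = 1, split into respondents and nonrespondents.

    Assumes one common unobservable (U = V = W); coefficients are illustrative.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)                      # single unobservable
    z = rng.integers(0, 2, size=n)                  # randomized instrument
    d = (b0 + b1 * z + u >= 0).astype(int)          # treatment
    r = (g0 + g1 * d + u >= 0).astype(int)          # response
    complier = (b0 + b1 + u >= 0) & (b0 + u < 0)    # D(1) = 1 and D(0) = 0
    keep = complier & (z == 1)
    return u[keep & (r == 1)], u[keep & (r == 0)]


u_resp, u_nonresp = complier_u_by_response()
# Largest U among nonrespondents vs. smallest U among respondents.
print(u_nonresp.max(), u_resp.min())
```

Because response is a threshold rule in the same unobservable, every responding complier has a larger value of the unobservable than every non-responding complier, so the two conditional distributions cannot coincide.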
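Imposing $U=V=W$ seems too extreme for most applications and was chosen for illustrative purposes. However, even if the unobserved terms in the various equations are not the same, but non-negligibly correlated as commonly assumed in IV models, identification may seem questionable. Suppose, for instance, that $W = \delta V + \varepsilon$, where $\varepsilon$ is random noise and $\delta$ is a coefficient. Then, Assumption 1 and the model in (1) imply that
$$E[f(U) \mid Z=1, \, -\beta_0-\beta_1 \leq V < -\beta_0, \, \delta V + \varepsilon \geq -\gamma_0-\gamma_1] = E[f(U) \mid Z=1, \, -\beta_0-\beta_1 \leq V < -\beta_0, \, \delta V + \varepsilon < -\gamma_0-\gamma_1]. \tag{3}$$
If $U$ is associated with either $\varepsilon$, $V$, or both, the latter equality does not hold in general, but only if the association of $U$ with $\varepsilon$ and $V$ is of a very specific form, which raises concerns about Assumption 1.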
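A small simulation illustrates how correlated (rather than identical) unobservables can break the equality of complier means across response states. The sketch assumes that the response unobservable is a linear function of the treatment unobservable plus independent noise, and that the outcome unobservable is correlated with the treatment unobservable through a hypothetical parameter `rho_uv`; all names, coefficients, and distributions are assumptions for illustration.

```python
import numpy as np


def complier_mean_gap(rho_uv, delta=1.0, n=200_000, b0=-0.5, b1=1.0,
                      g0=-0.2, g1=0.3, seed=2):
    """Mean of U among responding minus non-responding compliers with Z = 1.

    rho_uv is the assumed correlation between the outcome unobservable U and
    the treatment unobservable V; W = delta * V + eps with independent eps.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    u = rho_uv * v + np.sqrt(1.0 - rho_uv**2) * rng.standard_normal(n)
    w = delta * v + eps
    z = rng.integers(0, 2, size=n)
    d = (b0 + b1 * z + v >= 0).astype(int)
    r = (g0 + g1 * d + w >= 0).astype(int)
    keep = (b0 + b1 + v >= 0) & (b0 + v < 0) & (z == 1)  # compliers with Z = 1
    return u[keep & (r == 1)].mean() - u[keep & (r == 0)].mean()


print(complier_mean_gap(rho_uv=0.8))  # U related to V: gap away from zero
print(complier_mean_gap(rho_uv=0.0))  # U unrelated to V and eps: gap near zero
```

In this design, conditioning on the compliance type only restricts the treatment unobservable to an interval; response still shifts its distribution within that interval, which carries over to the outcome unobservable whenever the two are associated.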
Finally, we consider an IV model that is more general in terms of functional form assumptions, where $Y$, $D$, and $R$ are given by nonparametric functions denoted by $\phi$, $\zeta$, and $\psi$, respectively:
$$Y = \phi(D, U), \qquad D = \zeta(Z, V), \qquad R = \psi(D, W). \tag{4}$$
Under this model, Assumption 1 implies that
$$E[f(U) \mid Z=1, T=c, \psi(1, W)=1] = E[f(U) \mid Z=1, T=c, \psi(1, W)=0]. \tag{5}$$
This can be satisfied in special cases, for instance if $U = \kappa \cdot I\{T=c\} + \eta$, with $\kappa$ denoting the (homogeneous) effect of being a complier and $\eta$ being random noise. Then, (5) simplifies to $E[f(\kappa+\eta) \mid \psi(1, W)=1] = E[f(\kappa+\eta) \mid \psi(1, W)=0]$, which holds because $\eta$ is independent of $W$. In general, identification requires that $T$ is a sufficient statistic to control for the endogeneity introduced by conditioning on $R$. This, however, implies that the association between $U$, $V$, and $W$ is quite specific; otherwise, Assumption 1 does not hold.
3. Empirical Illustration
As an illustration for treatment evaluation under LI and alternative assumptions about attrition, we consider the experimental evaluation of the U.S. Job Corps program (see for instance Schochet et al. (2001)), providing training and education for young disadvantaged individuals. We aim at estimating the effect of program participation ($D$) in the first or second year after randomization into Job Corps ($Z$) on log weekly wages of females in the third year ($Y$). Of the 4765 females in the experimental sample with observed treatment status, wages are only observed for 3682 individuals ($R=1$), while 1083 do not report to work.
Reconsidering the IV model of (4), we assume that in each of the functions determining $Y$, $D$, and $R$, a vector of observed covariates, denoted by $X$, may enter as additional explanatory variables. Similar to Frölich and Huber (2014), Section 2.2, we assume that (i) Assumption 1 holds conditional on $X$ (thus combining LI and MAR), (ii) $Z$ is excluded from the outcome equation, such that the instrument affects the outcome only through the treatment, (iii) $Z \perp (U, V, W) \mid X$, which is implied by random assignment, (iv) $\Pr(T=c \mid X) > 0$ and $\Pr(T=d \mid X) = 0$ (with $c$ and $d$ denoting compliers and defiers), so that compliers exist and defiers are ruled out, and (v) $0 < \Pr(Z=1 \mid X) < 1$, ensuring common support in the covariates across instrument states.
X (measured prior to randomization) includes education, ethnicity, age and its square, school and working status, and receipt of Aid to Families with Dependent Children (AFDC) and food stamps.
We compare semiparametric LATE estimation based on the latter assumptions (see Theorem 1 in
Frölich and Huber (
2014)) to (i) MAR-based LATE estimation as in Section 2.3 of
Frölich and Huber (
2014) (assumptions:
,
,
,
,
), (ii) the so-called Wald estimator among those with $R=1$, which ignores sample selection, and (iii) the method of
Fricke et al. (
2020), which tackles sample selection and treatment endogeneity by two distinct instruments. In the latter approach, which allows for non-ignorable selection related to
U in a more general way than LI, we use the number of kids younger than 6 in the household 2.5 years after random assignment as instrument for
R. We apply a semiparametric version of the estimator outlined in Equation (23) of
Fricke et al. (
2020) along with the weighting function in their expression (21).
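For reference, the Wald estimator among respondents mentioned above can be sketched as the ratio of the instrument's effect on the outcome to its effect on the treatment, computed on observations with observed outcomes only. The function names and array-based interface are assumptions for illustration; the sketch does not reproduce the semiparametric LI, MAR, or two-instrument estimators.

```python
import numpy as np


def wald_estimator(y, d, z):
    """Wald (ratio) IV estimate with a binary instrument z."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()  # effect of Z on Y
    first_stage = d[z == 1].mean() - d[z == 0].mean()   # effect of Z on D
    return reduced_form / first_stage


def wald_among_respondents(y, d, z, r):
    """Wald estimate using only observations with observed outcome (r == 1).

    This ignores sample selection: it is only informative about the complier
    effect if response is unrelated to the outcome in the relevant sense.
    """
    keep = np.asarray(r) == 1
    return wald_estimator(np.asarray(y)[keep], np.asarray(d)[keep],
                          np.asarray(z)[keep])
```

Dropping nonrespondents before forming the ratio is what makes this estimator vulnerable to selection bias, which is why it serves as a benchmark rather than a remedy.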
Table 1 provides descriptive statistics for the covariates, the treatment, and the instruments in the total sample and for working and not working females. Across the latter groups, education, aid receipt, previous job status, and Job Corps participation, for instance, differ importantly, pointing to non-random selection into employment. If such socioeconomic characteristics also affect the wage outcome, systematic differences in these variables across the employment states of females generally entail a bias in treatment effect estimation unless one controls for them.
Table 2 presents the effect estimates, standard errors, and p-values based on 1999 bootstrap replications using the quantile method. The effect under LI + MAR (based on Theorem 1 of Frölich and Huber (2014)) of 0.12 log points is virtually identical to the Wald estimator, which ignores sample selection bias, and both are statistically significantly different from zero. The MAR-based estimate is one third higher, but the difference is not statistically significant. The method of Fricke et al. (2020) based on two instruments (2 IVs) yields virtually the same effect as MAR and is neither statistically significantly different from any other estimator nor from zero at any conventional level.
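As a rough sketch of how bootstrap standard errors and quantile-based p-values of this kind can be obtained, the function below resamples observations with replacement and re-applies a user-supplied estimator. The text does not spell out the exact definition of the quantile method; the two-sided p-value used here (twice the smaller share of bootstrap estimates on either side of zero) is one common variant and an assumption, as are the function name and interface.

```python
import numpy as np


def bootstrap_quantile_pvalue(estimator, data, n_boot=1999, seed=0):
    """Nonparametric bootstrap for a scalar estimator.

    estimator: callable taking the arrays in `data` and returning a float.
    data: tuple of equal-length NumPy arrays, resampled jointly by row.
    Returns (point estimate, bootstrap standard error, quantile-based p-value).
    """
    rng = np.random.default_rng(seed)
    n = len(data[0])
    estimate = estimator(*data)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample rows with replacement
        draws[b] = estimator(*(a[idx] for a in data))
    se = draws.std(ddof=1)
    # Assumed variant: twice the smaller tail share of bootstrap estimates around zero.
    p_value = min(1.0, 2.0 * min((draws <= 0).mean(), (draws >= 0).mean()))
    return estimate, se, p_value
```

Any of the estimators compared in this section could in principle be passed as `estimator`, provided it accepts the resampled arrays in the same order as `data`.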
It seems important to understand the differences in the behavioral assumptions of the estimators. LI + MAR, for instance, assumes that given the covariates and program assignment, unobservables like ability and motivation do not jointly affect employment and wages among compliers. In contrast, the method of
Fricke et al. (
2020) does not rely on this restriction and allows for more general forms of sample selection, at the cost of also requiring a valid instrument for employment. In our illustration, the results persistently point to a positive wage effect and are therefore rather robust to the different assumptions considered. The fact that LI + MAR, MAR, the approach based on two IVs, and even the Wald estimator (which ignores the selection problem) all yield qualitatively similar estimates may give some confidence to our findings, as the latter are not sensitive to the kind of model imposed on the sample selection process. However, such an agreement among different methods controlling for sample selection or outcome attrition need not necessarily occur in other empirical contexts. For this reason, the plausibility of the alternative sets of assumptions needs to be thoroughly scrutinized in the evaluation problem at hand.